</div>

- Juan Carlos Basto Pineda (juan.basto.pineda@gmail.com)

**Adapted** by Juan Carlos Basto Pineda from the very nice lesson on Plotting and Programming in Python by Software Carpentry.

Licensed under CC-BY 4.0 2018–2021 by The Carpentries

Licensed under CC-BY 4.0 by Software Carpentry Foundation

I strongly recommend that you visit the full lesson later on.

Let's start with a quick view of some sampled Gross Domestic Product data.

In [ ]:

```
import pandas as pd
```

In [ ]:

```
data = pd.read_csv('data/gapminder_gdp_oceania.csv')
```

In [ ]:

```
data
```

Use `index_col` to specify that a column’s values should be used as row headings

In [ ]:

```
data = pd.read_csv('data/gapminder_gdp_oceania.csv', index_col='country')
data
```
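The effect of `index_col` can be sketched without the data file. The two-row CSV text below is invented for illustration (the numbers are not the real Gapminder values), and `io.StringIO` simply stands in for a file on disk.

```python
import io
import pandas as pd

# Hypothetical CSV content, made up for this sketch (not real Gapminder data).
csv_text = """country,gdpPercap_2002,gdpPercap_2007
Australia,30000.0,34000.0
New Zealand,23000.0,25000.0
"""

# Without index_col, rows get the default integer labels 0, 1, ...
default_index = pd.read_csv(io.StringIO(csv_text))

# With index_col, the 'country' column becomes the row labels
# and is no longer listed among the regular columns.
country_index = pd.read_csv(io.StringIO(csv_text), index_col='country')

print(default_index.index.tolist())
print(country_index.index.tolist())
print(country_index.columns.tolist())
```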

Use the `DataFrame.info()` method to find out more about a dataframe

In [ ]:

```
data.info()
```

The labels of rows are stored as `DataFrame.index` and the labels of columns as `DataFrame.columns`

In [ ]:

```
data.index
```

In [ ]:

```
data.columns
```

How would you get a new dataframe containing only the data for New Zealand after 2000, labeling the row with the index `NZ` and labeling the columns with just the year?

In [ ]:

```
# Your Answer Here
```

Use `DataFrame.T` to transpose a dataframe.

In [ ]:

```
print(data.T)
```

Use `DataFrame.describe()` to get summary statistics about data.

`DataFrame.describe()` gets the summary statistics of only the columns that have numerical data. All other columns are ignored, unless you use the argument `include='all'`.

In [ ]:

```
data.describe(include='all')
```
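The difference made by `include='all'` is easiest to see when a dataframe mixes text and numbers. The sketch below uses a small invented dataframe (`toy`, with hypothetical values) rather than the lesson data.

```python
import pandas as pd

# Hypothetical toy data: one text column and one numeric column.
toy = pd.DataFrame({
    'continent': ['Oceania', 'Oceania', 'Europe'],
    'gdp': [10.0, 20.0, 30.0],
})

# Default behaviour: only the numeric column 'gdp' is summarized.
numeric_summary = toy.describe()

# include='all': the text column is summarized too (count, unique, top, freq).
full_summary = toy.describe(include='all')

print(numeric_summary.columns.tolist())
print(full_summary.columns.tolist())
```

Statistics that do not apply to a column (for example the mean of a text column) show up as `NaN` in the full summary.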

- Read the data in `gapminder_gdp_americas.csv` into a variable called `americas` and display its summary statistics.

In [ ]:

```
# Your Answer Here
```

- Check the commands `americas.shape`, `americas.head()`, and `americas.tail()`. What do they do?

In [ ]:

```
# Your Answer Here
```

- Drop some rows to retain only those countries in Latin America that have universities participating in LACoNGA-Physics.
- Store the reduced dataframe under the name `gdp_LACoNGA.csv`, checking the help of the `pd.DataFrame.to_csv` command first. You can do so by typing `help(americas.to_csv)`.

In [ ]:

```
# Your Answer Here
```

Let's take a look at the data from European countries.

In [ ]:

```
data = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
```

In [ ]:

```
data.head(2)
```

What do you think `DataFrame.iloc` does, judging from the following commands?

In [ ]:

```
data.iloc[0]
```

In [ ]:

```
data.iloc[:,0]
```

In [ ]:

```
data.iloc[0,:]
```

In [ ]:

```
data.iloc[0,0]
```
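If you want to check your guess, here is a minimal sketch on an invented three-row dataframe (`toy`, with hypothetical values). It assumes nothing beyond positional indexing with integers, counted from 0.

```python
import pandas as pd

# Hypothetical toy data with text row labels.
toy = pd.DataFrame(
    {'y1952': [1.0, 2.0, 3.0], 'y1957': [4.0, 5.0, 6.0]},
    index=['A', 'B', 'C'],
)

first_row = toy.iloc[0]      # row at position 0, returned as a Series
first_col = toy.iloc[:, 0]   # column at position 0, returned as a Series
single = toy.iloc[0, 0]      # one value: row 0, column 0

print(first_row.tolist())
print(first_col.tolist())
print(single)
```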

We can access data by the labels as well. Remember that row labels are given by `data.index` and column labels by `data.columns`.

In [ ]:

```
data.index
```

In [ ]:

```
data.columns
```

What do you think `DataFrame.loc` does, judging from the following commands?

In [ ]:

```
data.loc["Albania", "gdpPercap_1952"]
```

Use `:` on its own to mean all columns or all rows.

The following line extracts a full row based on its index label.

In [ ]:

```
data.loc["Albania", :]
```

Note that you get the same result with `data.loc["Albania"]` (without a second index). Plain `data["Albania"]` does not work here: square brackets on a dataframe select columns, not rows, so it raises a `KeyError`.

In [ ]:

```
data.loc["Albania"]
```
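A quick way to see how row labels behave with `.loc` and with plain square brackets is to try both on a tiny dataframe. The values below are invented for the sketch and are not the real figures.

```python
import pandas as pd

# Hypothetical toy data indexed by country name.
toy = pd.DataFrame(
    {'gdpPercap_1952': [1600.0, 6100.0], 'gdpPercap_1957': [1900.0, 8800.0]},
    index=['Albania', 'Austria'],
)

# Both forms return the same row as a Series.
row_a = toy.loc['Albania', :]
row_b = toy.loc['Albania']
print(row_a.equals(row_b))

# Square brackets look up a *column* label, so a row label is not found.
try:
    toy['Albania']
    bracket_works = True
except KeyError:
    bracket_works = False
print(bracket_works)
```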

In case you want data from a given column, change the position of the `:`

In [ ]:

```
data.loc[:,'gdpPercap_2007']
```

Or you can access it just by using square brackets and the column name, or using a dot `.`, without the need for `.loc` or `.iloc`:

In [ ]:

```
data['gdpPercap_2007']
```

In [ ]:

```
data.gdpPercap_2007
```

- The same command you use to check a value or a range of values can be used to substitute data, just using `=` and providing a value or an array with the right shape.
- For instance, `data.loc["Albania", "gdpPercap_1952"] = 0` or `data.iloc[0:3,0] = [1,2,3]`
- To add a new column, simply assign the values to the desired column name: `data['new_column'] = [array of values with the right shape]`
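The assignments described above can be tried safely on a copy. This is only a sketch on an invented dataframe: `toy`, `edited`, and the `flag` column are hypothetical names, not part of the lesson data.

```python
import pandas as pd

# Hypothetical toy data.
toy = pd.DataFrame(
    {'y1952': [1.0, 2.0, 3.0], 'y1957': [4.0, 5.0, 6.0]},
    index=['A', 'B', 'C'],
)

edited = toy.copy()                    # work on a copy; the original stays untouched
edited.loc['A', 'y1952'] = 0           # replace a single value, selected by label
edited.iloc[0:2, 1] = [40.0, 50.0]     # replace a range of values, selected by position
edited['flag'] = [True, False, True]   # add a new column, one value per row

print(edited)
print(toy)
```

Without `.copy()`, `edited = toy` would only create a second name for the same dataframe, and every change would show up in `toy` as well.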

- Create a copy of the DataFrame
- Change the first value of the GDP for the first country to 5
- Replace all numbers in a column by 1's
- Add a new column with 0's

In [ ]:

```
# Your Answer Here
```

Select multiple columns or rows using `DataFrame.loc` and a named slice

In [ ]:

```
data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972']
```

All the statistical operators that work on entire dataframes work the same way on slices.

E.g., let's find the maximum of each row across certain columns.

In [ ]:

```
data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972'].max(axis = 1)
```

Other well-known functions like min, median, mean, std... are available, as pandas is built on top of numpy.

In [ ]:

```
data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972'].median()
```
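Two details of the cells above are worth checking on a small case: label slices include both endpoints, and the `axis` argument decides whether a statistic is computed per row or per column. The dataframe below is invented for illustration, with hypothetical values and shortened column names.

```python
import pandas as pd

# Hypothetical toy data.
toy = pd.DataFrame(
    {'y1962': [1.0, 5.0, 3.0], 'y1967': [4.0, 2.0, 6.0], 'y1972': [9.0, 8.0, 7.0]},
    index=['Italy', 'Norway', 'Poland'],
)

# Label slices include BOTH endpoints, unlike positional slices.
block = toy.loc['Italy':'Norway', 'y1962':'y1967']

row_max = block.max(axis=1)   # one value per row (country)
col_max = block.max()         # default axis=0: one value per column (year)

print(block.shape)
print(row_max.tolist())
print(col_max.tolist())
```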

In which of the following countries/years was the GDP larger than 10,000?

In [ ]:

```
# Use a subset of data to keep output readable.
subset = data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972']
print('Subset of data:\n', subset)
```

In [ ]:

```
# Which values were greater than 10000 ?
print('\nWhere are values large?\n', subset > 10000)
```

In [ ]:

```
mask = subset > 10000
subset[mask]
```

- Get the value where the mask is true, and `NaN` (Not a Number) where it is false.
- Useful because NaNs are ignored by operations like max, min, average, etc.
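The two points above can be verified on a small invented dataframe (`toy`, with hypothetical values): masked-out entries become `NaN`, and the column means are computed only from the entries that remain.

```python
import pandas as pd

# Hypothetical toy data.
toy = pd.DataFrame(
    {'y1962': [8000.0, 12000.0], 'y1967': [11000.0, 15000.0]},
    index=['Italy', 'Norway'],
)

mask = toy > 10000     # boolean dataframe with the same shape as toy
masked = toy[mask]     # values where the mask is True, NaN elsewhere

col_mean = masked.mean()     # NaN entries are skipped
col_count = masked.count()   # number of non-NaN entries per column

print(masked)
print(col_mean.tolist())
print(col_count.tolist())
```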

In [ ]:

```
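# True where a country's GDP per capita is above that year's mean across countries.
mask_higher = data > data.mean()
# Wealth score: fraction of years in which each country was above the mean.
wealth_score = mask_higher.aggregate('sum', axis=1) / len(data.columns)
data['ws'] = wealth_score
data.head()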
```

Use `DataFrame.groupby` to apply a math function to subsets of data according to a given parameter

What was the total contribution of the countries in each category of wealth score in each year?

In [ ]:

```
data.groupby(by='ws').sum()
```
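To see what the grouped sum does, it may help to try it where the totals can be checked by hand. The sketch below assumes a hypothetical dataframe `toy` with an invented `ws` column holding just two categories.

```python
import pandas as pd

# Hypothetical toy data: four countries, two years, and a two-valued score.
toy = pd.DataFrame(
    {'y2002': [1.0, 2.0, 3.0, 4.0],
     'y2007': [10.0, 20.0, 30.0, 40.0],
     'ws': [0.0, 0.0, 1.0, 1.0]},
    index=['A', 'B', 'C', 'D'],
)

# Rows sharing the same 'ws' value are pooled, then each column is summed.
totals = toy.groupby(by='ws').sum()
print(totals)
```

The result has one row per distinct value of `ws`, and the grouping column itself becomes the index of the result.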

Use `groupby` to classify and then apply a customized mathematical function with `GroupBy.apply`

In [ ]:

```
df = pd.DataFrame({'A': 'a a b'.split(), 'B': [1,2,3], 'C': [4,6, 5]})
df
```

In [ ]:

```
g = df.groupby('A')
```

In [ ]:

```
# Select the numeric columns explicitly so the text column 'A' is never divided.
g[['B', 'C']].apply(lambda x: x / x.sum())
```

In [ ]:

```
g.apply(lambda x: x.C.max() - x.B.min())
```

- Create a `groupby` object according to the wealth score

In [ ]:

```
# Your Answer Here
```

- For each group, calculate the mean GDP in 2007

In [ ]:

```
# Your Answer Here
```

- For each country, calculate the percentage of its contribution to the total GDP of its group in 2007

In [ ]:

```
# Your Answer Here
```

- For each group, calculate the total contribution to the GDP along all years

In [ ]:

```
# Your Answer Here
```