
</div>

Adapted by Juan Carlos Basto Pineda from the very nice lesson on Plotting and Programming in Python by Software Carpentry.
Licensed under CC-BY 4.0 2018–2021 by The Carpentries
Licensed under CC-BY 4.0 by Software Carpentry Foundation
I strongly recommend visiting the full lesson later on.
Let's start with a quick view of some sampled Gross Domestic Product data.
import pandas as pd
data = pd.read_csv('data/gapminder_gdp_oceania.csv')
data
Use index_col to specify that a column’s values should be used as row headings
data = pd.read_csv('data/gapminder_gdp_oceania.csv', index_col='country')
data
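To see the effect of index_col without needing the data files, here is a minimal sketch that reads a tiny CSV from a string. The country names and numbers are made up for illustration and are not Gapminder values; the variable names csv_text, plain and indexed are placeholders as well.

```python
import io
import pandas as pd

# Made-up values standing in for a CSV file; not the real Gapminder data.
csv_text = """country,gdpPercap_1952,gdpPercap_1957
Aland,100.0,110.0
Borduria,200.0,220.0
"""

# Without index_col, pandas numbers the rows 0, 1, ...
plain = pd.read_csv(io.StringIO(csv_text))

# With index_col, the 'country' column becomes the row labels
# and is no longer one of the data columns.
indexed = pd.read_csv(io.StringIO(csv_text), index_col='country')

print(plain.index.tolist())
print(indexed.index.tolist())
print(indexed.columns.tolist())
```

Having the country as the row label is what later makes label-based selection by country name possible.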
Use the DataFrame.info() method to find out more about a dataframe
data.info()
Get the labels of rows as DataFrame.index and the labels of columns as DataFrame.columns
data.index
data.columns
How would you get a new dataframe containing only the data for New Zealand after 2000, labeling the row with the index NZ and the columns with just the year?
# Your Answer Here
Use DataFrame.T to transpose a dataframe.
print(data.T)
Use DataFrame.describe() to get summary statistics about data.
By default this returns summary statistics only for the columns that hold numerical data. All other columns are ignored unless you pass the argument include='all'.
data.describe(include='all')
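The difference is easier to see on a dataframe that mixes text and numbers. The following is a small sketch on made-up data (the names toy, region and gdp are placeholders, not columns of the Gapminder files).

```python
import pandas as pd

# Made-up data with one text column and one numeric column.
toy = pd.DataFrame({
    'region': ['north', 'south', 'north'],
    'gdp': [1.0, 2.0, 3.0],
})

# Default: only the numeric column is summarized.
numeric_only = toy.describe()

# include='all': the text column is summarized too
# (count, unique, top, freq), with NaN where a statistic does not apply.
everything = toy.describe(include='all')

print(numeric_only)
print(everything)
```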
Read the data in gapminder_gdp_americas.csv into a variable called americas and display its summary statistics.
# Your Answer Here
 
americas.shape, americas.head(), americas.tail(): what do they do?
# Your Answer Here
gdp_LACoNGA.csv, checking the help of the pd.DataFrame.to_csv command first. You can do this by typing help(americas.to_csv).
# Your Answer Here
Let's take a look at the data from European countries
data = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
data.head(2)
After checking the following commands, what do you think DataFrame.iloc does?
data.iloc[0]
data.iloc[:,0]
data.iloc[0,:]
data.iloc[0,0]
We can access data by the labels as well. Remember that row labels are given by data.index and column labels by data.columns
data.index
data.columns
After checking the following commands, what do you think DataFrame.loc does?
data.loc["Albania", "gdpPercap_1952"]
Use : on its own to mean all columns or all rows.
The following line extracts a full row based on its index label:
data.loc["Albania", :]
Note that you get the same result from data.loc["Albania"] (without a second index). Plain square brackets, as in data["Albania"], do not work here: they look up column labels, so a row label raises a KeyError.
data.loc["Albania"]
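It is worth checking which of these forms really select a row. The sketch below uses a made-up two-row dataframe (toy, Aland and Borduria are placeholders, not Gapminder data) and shows that .loc with a row label works with or without the trailing :, while plain square brackets are a column lookup and fail for a row label.

```python
import pandas as pd

# Made-up data; the row labels play the role of country names.
toy = pd.DataFrame(
    {'gdpPercap_1952': [10.0, 20.0], 'gdpPercap_1957': [11.0, 22.0]},
    index=['Aland', 'Borduria'],
)

# Both .loc forms return the same row as a Series.
row_a = toy.loc['Aland', :]
row_b = toy.loc['Aland']
same_row = row_a.equals(row_b)

# Plain brackets search the column labels, so a row label is not found.
try:
    toy['Aland']
    bracket_failed = False
except KeyError:
    bracket_failed = True

print(same_row, bracket_failed)
```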
In case you want data from a given column, change the position of the :
data.loc[:,'gdpPercap_2007']
Or you can access a column using just square brackets and the column name, or using a dot (.), without the need for .loc or .iloc:
data['gdpPercap_2007']
data.gdpPercap_2007
Assign new values using = and providing a value or an array with the right shape
data.loc["Albania", "gdpPercap_1952"] = 0
data.iloc[0:3,0] = [1,2,3]
data['new_column'] = [array of values with the right shape]
# Your Answer Here
Select multiple columns or rows using DataFrame.loc and a named slice
data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972']
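One detail worth noting: unlike ordinary Python slices, label slices with .loc include both endpoints. The sketch below uses a made-up dataframe (toy and the y1962-style column names are placeholders, not part of the Gapminder files) to compare a label slice with the equivalent positional slice.

```python
import pandas as pd

# Made-up data with short labels to keep the output readable.
toy = pd.DataFrame(
    {'y1962': [1, 2, 3, 4], 'y1967': [5, 6, 7, 8], 'y1972': [9, 10, 11, 12]},
    index=['w', 'x', 'y', 'z'],
)

# Label slice: both 'x' and 'y' are included, and both column endpoints too.
by_label = toy.loc['x':'y', 'y1962':'y1967']

# Positional slice: the stop position is excluded, as in plain Python,
# so rows 1:3 and columns 0:2 are needed to get the same block.
by_position = toy.iloc[1:3, 0:2]

print(by_label)
print(by_label.equals(by_position))
```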
All the statistical operators that work on entire dataframes work the same way on slices.
For example, let's find the maximum of each row across the selected columns (axis = 1):
data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972'].max(axis = 1)
Other well-known functions such as min, median, mean, and std are also available, as pandas is built on top of numpy.
data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972'].median()
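The axis argument decides the direction of the calculation. As a quick sketch on made-up numbers (toy and its labels are placeholders): with no argument the statistic is computed down each column, while axis=1 computes it across each row.

```python
import pandas as pd

# Made-up data: two rows (countries) and two columns (years).
toy = pd.DataFrame(
    {'y1962': [1.0, 4.0], 'y1967': [3.0, 2.0]},
    index=['Aland', 'Borduria'],
)

# Default (axis=0): one result per column.
per_column = toy.max()

# axis=1: one result per row.
per_row = toy.max(axis=1)

print(per_column)
print(per_row)
```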
In which of the following countries and years was the GDP per capita larger than 10,000?
# Use a subset of data to keep output readable.
subset = data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972']
print('Subset of data:\n', subset)
# Which values were greater than 10000 ?
print('\nWhere are values large?\n', subset > 10000)
mask = subset > 10000
subset[mask]
Get the value where the mask is true, and NaN (Not a Number) where it is false.
mask_higher = data > data.mean()
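# Sum the Boolean mask itself (True counts as 1) to get the fraction of years
# in which each country was above that year's mean.
wealth_score = mask_higher.aggregate('sum', axis=1) / len(data.columns)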
data['ws'] = wealth_score
data.head()
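It helps to separate two things a mask can be used for. The sketch below, on made-up numbers (toy, y1, y2 and the row labels are placeholders), shows that indexing with a mask leaves NaN where the mask is false and that sums skip those NaN values, whereas summing the mask itself counts how many values passed the test.

```python
import pandas as pd

# Made-up data: column means are 5.0 for both y1 and y2.
toy = pd.DataFrame(
    {'y1': [1.0, 6.0, 8.0], 'y2': [2.0, 3.0, 10.0]},
    index=['Aland', 'Borduria', 'Cascadia'],
)

mask = toy > toy.mean()

# Indexing with the mask keeps values where it is True, NaN elsewhere.
masked = toy[mask]

# Sums ignore NaN, so this adds up only the above-mean values.
masked_sum = masked.sum(axis=1)

# Summing the mask counts True values; dividing by the number of
# columns gives the fraction of columns above the mean.
fraction_above = mask.sum(axis=1) / len(toy.columns)

print(masked)
print(masked_sum)
print(fraction_above)
```

The fraction takes only a few distinct values, which is what makes it usable as a category for grouping.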
Use DataFrame.groupby to apply a math function to subsets of data according to a given parameter
What was the total contribution of the countries in each category of wealth score in each year?
data.groupby(by='ws').sum()
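As a small illustration of what grouping by a column does, here is a sketch on made-up data (toy, y1 and the row labels are placeholders; only the column name ws is taken from above). Rows sharing the same ws value are pooled, and the sum is taken within each pool.

```python
import pandas as pd

# Made-up data: two rows share the score 0.5, one has 0.0.
toy = pd.DataFrame(
    {'y1': [1.0, 2.0, 3.0], 'ws': [0.0, 0.5, 0.5]},
    index=['Aland', 'Borduria', 'Cascadia'],
)

# One output row per distinct value of 'ws'; 'ws' becomes the index.
totals = toy.groupby(by='ws').sum()

print(totals)
```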
Use groupby to classify and then apply a customized mathematical function with GroupBy.apply
df = pd.DataFrame({'A': 'a a b'.split(), 'B': [1,2,3], 'C': [4,6, 5]})
df
g = df.groupby('A')
# Select the numeric columns explicitly so the text column 'A' is not divided.
g[['B', 'C']].apply(lambda x: x / x.sum())
g.apply(lambda x: x.C.max() - x.B.min())
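To make the idea behind GroupBy.apply more concrete, the sketch below repeats the last calculation by hand: split the dataframe into groups, apply the function to each group, and combine the results. This only illustrates the concept and is not how pandas implements it internally; the names manual and via_apply are placeholders.

```python
import pandas as pd

df = pd.DataFrame({'A': 'a a b'.split(), 'B': [1, 2, 3], 'C': [4, 6, 5]})

# Split: iterating over a groupby yields (key, sub-dataframe) pairs.
# Apply: compute max of C minus min of B within each group.
results = {}
for key, group in df.groupby('A'):
    results[key] = group['C'].max() - group['B'].min()

# Combine: collect the per-group results into one Series.
manual = pd.Series(results)

# The same calculation in one step, selecting the numeric columns first.
via_apply = df.groupby('A')[['B', 'C']].apply(lambda x: x.C.max() - x.B.min())

print(manual)
print(via_apply)
```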
groupby object according to the wealth score
# Your Answer Here
# Your Answer Here
# Your Answer Here
# Your Answer Here