</div>

# BASICS

### Contributors:

• Juan Carlos Basto Pineda (juan.basto.pineda@gmail.com)

Adapted by Juan Carlos Basto Pineda from the very nice lesson on Plotting and Programming in Python by Software Carpentry.
Licensed under CC-BY 4.0 2018–2021 by The Carpentries
Licensed under CC-BY 4.0 by Software Carpentry Foundation

I strongly recommend that you visit the full lesson later on.

## Reading Tabular Data into DataFrames

Let's start with a quick view of some sampled Gross Domestic Product data.
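As a minimal sketch of the reading step, the example below assumes a file laid out like the Gapminder files used in the Software Carpentry lesson: a `country` column followed by `gdpPercap_<year>` columns. An in-memory string with invented values stands in for the real file, so the snippet runs anywhere.

```python
import io

import pandas as pd

# Tiny stand-in for a Gapminder-style CSV file; the numbers are made up.
csv_text = """country,gdpPercap_2002,gdpPercap_2007
Australia,30000.0,34000.0
New Zealand,23000.0,25000.0
"""

# index_col="country" uses the country names as row labels
# instead of the default 0, 1, 2, ... integer index.
data = pd.read_csv(io.StringIO(csv_text), index_col="country")
print(data)
```

With a real file you would pass its path to `pd.read_csv` instead of the `io.StringIO` object.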

### DataFrame.index and DataFrame.columns can be redefined as necessary

Check the following commands
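Commands of this kind might look like the sketch below. The DataFrame and its values are invented for illustration, and the short labels `AUS` and `NZ` are just an example of relabeling.

```python
import pandas as pd

# Invented values; only the layout mimics the Gapminder data.
data = pd.DataFrame(
    {"gdpPercap_2002": [30000.0, 23000.0], "gdpPercap_2007": [34000.0, 25000.0]},
    index=["Australia", "New Zealand"],
)

print(data.index)    # current row labels
print(data.columns)  # current column labels

# Assigning to the attributes relabels the DataFrame in place.
# The new list must have as many entries as there are rows (or columns).
data.index = ["AUS", "NZ"]
data.columns = data.columns.str.replace("gdpPercap_", "")
print(data)
```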

### You can get rid of rows or columns you don't need with DataFrame.drop

How would you get a new DataFrame containing only the data for New Zealand after 2000, with the row labeled NZ and the columns labeled with just the year?
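Before trying it yourself, here is a small sketch of how `DataFrame.drop` behaves, using an invented DataFrame. It shows the two basic forms (dropping rows and dropping columns) and leaves the relabeling part of the question to you.

```python
import pandas as pd

# Invented values; only the layout mimics the Gapminder data.
data = pd.DataFrame(
    {
        "gdpPercap_1997": [27000.0, 21000.0],
        "gdpPercap_2002": [30000.0, 23000.0],
        "gdpPercap_2007": [34000.0, 25000.0],
    },
    index=["Australia", "New Zealand"],
)

# drop returns a new DataFrame; the original is left untouched.
no_australia = data.drop("Australia")           # rows are dropped by default
recent = data.drop(columns=["gdpPercap_1997"])  # columns need the keyword

print(no_australia)
print(recent)
```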

### Use DataFrame.describe() to get summary statistics about data.

DataFrame.describe() summarizes only the columns that hold numerical data. All other columns are ignored unless you pass the argument include='all'.
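The difference between the two calls can be seen in a short sketch. The DataFrame below is invented: one text column and one numerical column.

```python
import pandas as pd

# Invented values: one text column, one numerical column.
df = pd.DataFrame(
    {
        "continent": ["Americas", "Americas", "Americas"],
        "gdpPercap_2007": [1000.0, 2000.0, 3000.0],
    },
    index=["Argentina", "Chile", "Colombia"],
)

numeric_summary = df.describe()           # only gdpPercap_2007 is summarized
full_summary = df.describe(include="all") # continent gets count/unique/top/freq

print(numeric_summary)
print(full_summary)
```

In the second table, statistics that make no sense for a given column (such as the mean of a text column) show up as NaN.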

• Read the data in gapminder_gdp_americas.csv into a variable called americas and display its summary statistics.
• Check americas.shape, americas.head(), and americas.tail(). What do they do?
• Drop some rows to retain only those Latin American countries that have universities participating in LACoNGA-Physics.
• Store the reduced DataFrame in a file named gdp_LACoNGA.csv, checking the help of the pd.DataFrame.to_csv command first. You can display it by typing help(americas.to_csv)

## Accessing and modifying data

Let's take a look at the data from European countries

After checking the following commands, what do you think DataFrame.iloc does?
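A few commands of this kind are sketched below on an invented three-country DataFrame. The values are placeholders, not real Gapminder numbers.

```python
import pandas as pd

# Invented values; only the layout mimics the Gapminder data.
data = pd.DataFrame(
    {
        "gdpPercap_1952": [1600.0, 6100.0, 8300.0],
        "gdpPercap_1957": [1900.0, 8800.0, 9700.0],
    },
    index=["Albania", "Austria", "Belgium"],
)

first_value = data.iloc[0, 0]       # row position 0, column position 0
first_two_rows = data.iloc[0:2, :]  # positions 0 and 1; the end (2) is excluded
last_row = data.iloc[-1]            # negative positions count from the end

print(first_value)
print(first_two_rows)
print(last_row)
```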

We can access data by the labels as well. Remember that row labels are given by data.index and column labels by data.columns

After checking the following commands, what do you think DataFrame.loc does?
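The label-based counterpart might look like this sketch, again on an invented DataFrame. Pay attention to how the slice behaves compared with ordinary Python slicing.

```python
import pandas as pd

# Invented values; only the layout mimics the Gapminder data.
data = pd.DataFrame(
    {
        "gdpPercap_1952": [1600.0, 6100.0, 8300.0],
        "gdpPercap_1957": [1900.0, 8800.0, 9700.0],
    },
    index=["Albania", "Austria", "Belgium"],
)

value = data.loc["Albania", "gdpPercap_1952"]  # single cell, by labels

# Label slices include BOTH ends, unlike position slices.
subset = data.loc["Albania":"Austria", "gdpPercap_1952":"gdpPercap_1957"]

print(value)
print(subset)
```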

### Use : on its own to mean all columns or all rows.

The following line serves to extract a full row based on its index

Note that you would get the same result with data.loc["Albania"] (without a second index). Plain square brackets do not work here: data["Albania"] looks for a column named Albania and raises a KeyError.
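As a quick sketch of row extraction with `.loc`, using invented values:

```python
import pandas as pd

# Invented values; only the layout mimics the Gapminder data.
data = pd.DataFrame(
    {
        "gdpPercap_1952": [1600.0, 6100.0, 8300.0],
        "gdpPercap_1957": [1900.0, 8800.0, 9700.0],
    },
    index=["Albania", "Austria", "Belgium"],
)

row = data.loc["Albania", :]    # ":" means "all columns"
same_row = data.loc["Albania"]  # the column index can be omitted

print(row)
print(row.equals(same_row))
```

Both forms return a Series whose index is the column labels of the DataFrame.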

In case you want data from a given column, change the position of the :

Or you can access a column directly with square brackets and the column name, or with a dot followed by the column name, without the need for .loc or .iloc:
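The three ways of getting a column can be compared in a short sketch (invented values again). Plain square brackets look up column labels, which is why the column name works there.

```python
import pandas as pd

# Invented values; only the layout mimics the Gapminder data.
data = pd.DataFrame(
    {
        "gdpPercap_1952": [1600.0, 6100.0, 8300.0],
        "gdpPercap_1957": [1900.0, 8800.0, 9700.0],
    },
    index=["Albania", "Austria", "Belgium"],
)

col_loc = data.loc[:, "gdpPercap_1952"]  # ":" means "all rows"
col_brackets = data["gdpPercap_1952"]    # square brackets select columns
col_dot = data.gdpPercap_1952            # works when the name is a valid identifier

print(col_loc.equals(col_brackets) and col_loc.equals(col_dot))
```

The dot form is convenient for typing, but it only works for column names that are valid Python identifiers and do not clash with DataFrame attributes or methods.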

### Modifying data

• The same expression you use to check a value or a range of values can be used to replace data: put it on the left of = and provide a value or an array with the right shape
• For instance, data.loc["Albania", "gdpPercap_1952"] = 0
• data.iloc[0:3,0] = [1,2,3]
• To add a new column, simply index with the desired name and assign the values:
• data['new_column'] = [array of values with the right shape]
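The assignments listed above can be tried on a small invented DataFrame. The values assigned to `new_column` are arbitrary.

```python
import pandas as pd

# Invented values; only the layout mimics the Gapminder data.
data = pd.DataFrame(
    {
        "gdpPercap_1952": [1600.0, 6100.0, 8300.0],
        "gdpPercap_1957": [1900.0, 8800.0, 9700.0],
    },
    index=["Albania", "Austria", "Belgium"],
)

# Replace a single cell, selected by labels.
data.loc["Albania", "gdpPercap_1952"] = 0

# Replace the first three rows of the first column, selected by position.
# This overwrites the 0 written on the previous line.
data.iloc[0:3, 0] = [1, 2, 3]

# Add a new column; the list needs one value per row.
data["new_column"] = [10, 20, 30]

print(data)
```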

### Exercise:

• Create a copy of the DataFrame
• Change the first value of the GDP for the first country to 5
• Replace all numbers in a column by 1's
• Add a new column with 0's

### Result of slicing can be used in further operations.

All the statistical operators that work on entire DataFrames work the same way on slices.
For example, let's find the maximum of certain columns. Other well-known functions like min, median, mean, std... are also available, as pandas is built on top of numpy.
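A minimal sketch of a statistic computed on a slice, with invented values:

```python
import pandas as pd

# Invented values; only the layout mimics the Gapminder data.
data = pd.DataFrame(
    {
        "gdpPercap_1952": [1600.0, 6100.0, 8300.0],
        "gdpPercap_1957": [1900.0, 8800.0, 9700.0],
    },
    index=["Albania", "Austria", "Belgium"],
)

subset = data.loc["Albania":"Belgium", "gdpPercap_1952":"gdpPercap_1957"]

col_max = subset.max()  # one maximum per column
col_min = subset.min()  # one minimum per column

print(col_max)
print(col_min)
```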

### Use comparisons to select data based on value.

In which of the following countries and years was the GDP larger than 10,000?
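A comparison applied to a DataFrame returns a DataFrame of the same shape filled with True and False. The sketch below uses invented values chosen so that some cells fall on each side of the threshold.

```python
import pandas as pd

# Invented values; only the layout mimics the Gapminder data.
data = pd.DataFrame(
    {
        "gdpPercap_1997": [3000.0, 9000.0, 29000.0],
        "gdpPercap_2007": [6000.0, 15000.0, 36000.0],
    },
    index=["Albania", "Poland", "Austria"],
)

# Element-wise comparison: True where the value exceeds 10000.
mask = data > 10000
print(mask)
```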

### The concept of Boolean masks.

• Indexing a DataFrame with a Boolean mask returns the value where the mask is True, and NaN (Not a Number) where it is False.
• This is useful because NaNs are ignored by operations like max, min, mean, etc.
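Both points can be checked with a small sketch on invented values:

```python
import pandas as pd

# Invented values; only the layout mimics the Gapminder data.
data = pd.DataFrame(
    {
        "gdpPercap_1997": [3000.0, 9000.0, 29000.0],
        "gdpPercap_2007": [6000.0, 15000.0, 36000.0],
    },
    index=["Albania", "Poland", "Austria"],
)

mask = data > 10000

# Values are kept where the mask is True and replaced by NaN elsewhere.
masked = data[mask]
print(masked)

# NaNs are skipped, so these statistics only use the values above 10000.
print(masked.min())
print(masked.mean())
```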

### Aggregate method to apply a mathematical function along rows or columns

What do you think the following commands are doing?
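Commands of this kind might look like the sketch below, which assumes the section refers to `DataFrame.agg` (also spelled `DataFrame.aggregate`). The values are invented.

```python
import pandas as pd

# Invented values; only the layout mimics the Gapminder data.
data = pd.DataFrame(
    {
        "gdpPercap_1997": [3000.0, 9000.0, 29000.0],
        "gdpPercap_2007": [6000.0, 15000.0, 36000.0],
    },
    index=["Albania", "Poland", "Austria"],
)

col_totals = data.agg("sum")          # default axis=0: one result per column
row_totals = data.agg("sum", axis=1)  # axis=1: one result per row
extremes = data.agg(["min", "max"])   # several functions at once

print(col_totals)
print(row_totals)
print(extremes)
```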

### DataFrame.groupby to apply a math function to subsets of data according to a given parameter

What was the total contribution of the countries in each category of wealth score in each year?
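One way to approach this question is sketched below. It assumes the DataFrame already has a `wealth_score` column. Here the scores are simply assigned by hand to invented data, so both the column contents and the GDP values are hypothetical.

```python
import pandas as pd

# Invented values; only the layout mimics the Gapminder data.
data = pd.DataFrame(
    {
        "gdpPercap_1997": [3000.0, 9000.0, 29000.0, 27000.0],
        "gdpPercap_2007": [6000.0, 15000.0, 36000.0, 33000.0],
    },
    index=["Albania", "Poland", "Austria", "Belgium"],
)

# Hypothetical wealth scores, assigned by hand for this sketch.
data["wealth_score"] = [0.0, 0.5, 1.0, 1.0]

# Group the rows by score, then sum every year column within each group.
totals = data.groupby("wealth_score").sum()
print(totals)
```

The result has one row per distinct wealth score and one column per year.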

### Exercise

• Create a groupby object according to the wealth score
• For each group, calculate the mean GDP in 2007
• For each country, calculate its percentage contribution to the total GDP of its group in 2007
• For each group, calculate the total contribution to the GDP across all years