Because row order matters for shift-based methods, it is important to ensure the rows are sorted in the proper order (usually ascending by date) before applying them.
# sorting by year for good measure:
gdp_df.sort_values(by=["year"], ascending=True, inplace=True)
gdp_df.head()
| | year | gdp |
|---|---|---|
| 0 | 1990 | 100 |
| 1 | 1991 | 105 |
| 2 | 1992 | 110 |
| 3 | 1993 | 115 |
| 4 | 1994 | 110 |
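As a rough sketch of why the sort matters, the following starts from a deliberately shuffled copy of the example data and sorts it without inplace, then resets the index so that row label 0 is the earliest year. The shuffled_df and sorted_df names, and the shuffled row order, are made up for illustration:

```python
import pandas as pd

# hypothetical out-of-order copy of the example data:
shuffled_df = pd.DataFrame({
    "year": [1992, 1990, 1994, 1991, 1993],
    "gdp": [110, 100, 110, 105, 115],
})

# sort chronologically, then rebuild the index so row labels follow the new order:
sorted_df = shuffled_df.sort_values(by="year", ascending=True).reset_index(drop=True)

print(sorted_df)
```

Resetting the index is optional, but it keeps label-based lookups such as .loc[0, ...] pointing at the first chronological row.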
We can use the shift method to reference the corresponding value in a row above or below, specifying how many rows away via the periods parameter.
We use positive numbers to reference rows above, and negative numbers to reference rows below:
gdp_df["gdp"].shift(periods=1) # 1 or -1 depending on order
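The manual growth calculation can be sketched as follows, using the shifted values to compute the change from the previous period. The dataframe is rebuilt here from the values shown above so the snippet runs on its own, and the intermediate columns are named gdp_prev and gdp_change:

```python
import pandas as pd

gdp_df = pd.DataFrame({
    "year": [1990, 1991, 1992, 1993, 1994],
    "gdp": [100, 105, 110, 115, 110],
})

# value from the row above (the previous year):
gdp_df["gdp_prev"] = gdp_df["gdp"].shift(periods=1)

# absolute change, then change relative to the previous value:
gdp_df["gdp_change"] = gdp_df["gdp"] - gdp_df["gdp_prev"]
gdp_df["gdp_pct_change"] = gdp_df["gdp_change"] / gdp_df["gdp_prev"]

print(gdp_df)
```

The first row has no previous row to reference, so its shifted value (and everything derived from it) is null.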
Although we can perform this growth calculation ourselves, pandas provides a dedicated pct_change method for this purpose, which lets us skip the intermediate steps:
# equivalent, leveraging the pct_change method:
gdp_df["gdp_pct_change"] = gdp_df["gdp"].pct_change(periods=1)
gdp_df[["year", "gdp", "gdp_pct_change"]]
| | year | gdp | gdp_pct_change |
|---|---|---|---|
| 0 | 1990 | 100 | NaN |
| 1 | 1991 | 105 | 0.050000 |
| 2 | 1992 | 110 | 0.047619 |
| 3 | 1993 | 115 | 0.045455 |
| 4 | 1994 | 110 | -0.043478 |
11.2 Cumulative Growth
Alright, we have studied how to calculate growth from one period to another, but what about calculating cumulative growth over the entire time period?
To calculate cumulative growth for a particular period, we can use the cumprod method (or sometimes the product method, depending on the use case). When calculating the cumulative product, each value gets multiplied by all the values that precede it, in succession.
Before we calculate a product, to make the multiplication work, we’ll first need to express the growth numbers relative to 1, instead of 0. We’ll also need to fill in the initial null value, so the first period ends up as 1 (representing 100%).
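To see the mechanics in isolation, here is a minimal sketch on a standalone Series of growth rates taken from the example data. The variable names growth_rates, growth_factors, and cumulative are assumptions for illustration, not part of the walkthrough:

```python
import pandas as pd

# period-over-period growth rates, with the first period already set to zero:
growth_rates = pd.Series([0.0, 5 / 100, 5 / 105, 5 / 110, -5 / 115])

# express growth relative to 1 instead of 0:
growth_factors = growth_rates + 1

# running product: each value is multiplied by all the values before it
cumulative = growth_factors.cumprod()
print(cumulative)

# a single overall figure: the product of all the factors
print(growth_factors.product())
```

The last cumulative value, roughly 1.10, corresponds to GDP moving from 100 to 110 over the whole period, and it matches what the product method returns for the full series.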
Let’s break this down one step at a time, to illustrate each method, before putting them all together at the end.
First, we overwrite the initial null value (which arises because the first row has no previous row) with zero:
gdp_df.loc[0, "gdp_pct_change"] = 0
gdp_df
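| | year | gdp | gdp_prev | gdp_change | gdp_pct_change |
|---|---|---|---|---|---|
| 0 | 1990 | 100 | NaN | NaN | 0.000000 |
| 1 | 1991 | 105 | 100.0 | 5.0 | 0.050000 |
| 2 | 1992 | 110 | 105.0 | 5.0 | 0.047619 |
| 3 | 1993 | 115 | 110.0 | 5.0 | 0.045455 |
| 4 | 1994 | 110 | 115.0 | -5.0 | -0.043478 |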
Then we express growth relative to one instead of zero (so we can calculate cumulative product later):