Shift-based Methods

So far we have been performing operations using values within the same row, or by operating on all values in a given column.

But what if we want to compare a value in one row to corresponding values in the rows below or above?

We can use "shift-based" methods for this purpose.

Let’s consider this simple example time series dataset:

from pandas import DataFrame

gdp_df = DataFrame([
    {"year": 1990, "gdp": 100},
    {"year": 1991, "gdp": 105},
    {"year": 1992, "gdp": 110},
    {"year": 1993, "gdp": 115},
    {"year": 1994, "gdp": 110}

])
gdp_df.head()
year gdp
0 1990 100
1 1991 105
2 1992 110
3 1993 115
4 1994 110

Before performing shift-based methods, because row order matters, it is important to ensure the rows are sorted in the proper order (usually in ascending order by date).

# sorting by year for good measure:
gdp_df.sort_values(by=["year"], ascending=True, inplace=True)
gdp_df.head()
year gdp
0 1990 100
1 1991 105
2 1992 110
3 1993 115
4 1994 110

We can use the dataframe’s shift method to reference a corresponding value in another row above or below by specifying the number of rows above or below via the periods parameter.

We use positive numbers to reference rows above, and negative numbers to reference cells below:

gdp_df["gdp"].shift(periods=1) # 1 or -1 depending on order
0      NaN
1    100.0
2    105.0
3    110.0
4    115.0
Name: gdp, dtype: float64
gdp_df["gdp"].shift(periods=-1) # 1 or -1 depending on order
0    105.0
1    110.0
2    115.0
3    110.0
4      NaN
Name: gdp, dtype: float64

Growth and Percent Change

By comparing current against previous values, this allows us to perform an ad-hoc calculation of percent growth from one period to the next.

Here we illustrate the methods one step at a time.

First we create a new column of previous values, using the shift method:

# creating a new column using previous values:
gdp_df["gdp_prev"] = gdp_df["gdp"].shift(periods=1)
gdp_df
year gdp gdp_prev
0 1990 100 NaN
1 1991 105 100.0
2 1992 110 105.0
3 1993 115 110.0
4 1994 110 115.0

Next we calculate change from one period to the next, by subtracting the previous values from the current values:

# calculating change:
gdp_df["gdp_change"] = gdp_df["gdp"] - gdp_df["gdp_prev"]
gdp_df
year gdp gdp_prev gdp_change
0 1990 100 NaN NaN
1 1991 105 100.0 5.0
2 1992 110 105.0 5.0
3 1993 115 110.0 5.0
4 1994 110 115.0 -5.0

Finally we calculate percent change by dividing the change over the previous value:

# calculating percent change:
#gdp_df["gdp_growth"] = (gdp_df["gdp"] - gdp_df["gdp_prev"]) / gdp_df["gdp_prev"]

gdp_df["gdp_pct_change"] = gdp_df["gdp_change"] / gdp_df["gdp_prev"]
gdp_df
year gdp gdp_prev gdp_change gdp_pct_change
0 1990 100 NaN NaN NaN
1 1991 105 100.0 5.0 0.050000
2 1992 110 105.0 5.0 0.047619
3 1993 115 110.0 5.0 0.045455
4 1994 110 115.0 -5.0 -0.043478

The pct_change Method

Even though we are able to perform this growth calculation ourselves, we should know the dataframe has a dedicated pct_change method for this purpose, which allows us to skip the intermediate steps:

# equivalent, leveraging the pct_change method:
gdp_df["gdp_pct_change"] = gdp_df["gdp"].pct_change(periods=1)

gdp_df[["year", "gdp", "gdp_pct_change"]]
year gdp gdp_pct_change
0 1990 100 NaN
1 1991 105 0.050000
2 1992 110 0.047619
3 1993 115 0.045455
4 1994 110 -0.043478

Cumulative Growth

Alright, we have studied how to calculate growth from one period to another, but what about calculating cumulative growth over the entire time period?

To calculate cumulative growth for a particular period, we can use the(cumprod method (or sometimes the product method, depending on the use case). When calculating the cumulative product, each value gets multiplied by the values that follow, in succession.

Before we calculate a product, to make the multiplication work, we’ll first need to express the growth numbers relative to 1, instead of 0. We’ll also need to fill in the initial null value with a 1, so the first period represents 100%.

Let’s break this down one step at a time, to illustrate each method, before putting them all together at the end.

First we overwrite the initial null value that results from there being no previous row for the first row:

gdp_df.loc[0, "gdp_pct_change"] = 0

gdp_df
year gdp gdp_prev gdp_change gdp_pct_change
0 1990 100 NaN NaN 0.000000
1 1991 105 100.0 5.0 0.050000
2 1992 110 105.0 5.0 0.047619
3 1993 115 110.0 5.0 0.045455
4 1994 110 115.0 -5.0 -0.043478

Then we express growth relative to one instead of zero (so we can calculate cumulative product later):

gdp_df["gdp_pct_change"] = gdp_df["gdp_pct_change"] + 1

Finally, we calculate the cumulative product of the growth:

#gdp_df["cumulative_growth"] = (gdp_df["gdp_growth"] + 1).cumprod()

gdp_df["gdp_cumulative_growth"] = gdp_df["gdp_pct_change"].cumprod()

gdp_df
year gdp gdp_prev gdp_change gdp_pct_change gdp_cumulative_growth
0 1990 100 NaN NaN 1.000000 1.00
1 1991 105 100.0 5.0 1.050000 1.05
2 1992 110 105.0 5.0 1.047619 1.10
3 1993 115 110.0 5.0 1.045455 1.15
4 1994 110 115.0 -5.0 0.956522 1.10

We can see how pandas makes it easy to calculate growth and cumulative growth.