Shift-based Methods

So far we have been performing operations using values within the same row, or by operating on all values in a given column.

But what if we want to compare a value in one row to corresponding values in the rows below or above?

We can use "shift-based" methods for this purpose.

Let’s consider this simple example time series dataset:

from pandas import DataFrame

gdp_df = DataFrame([
    {"year": 1990, "gdp": 100},
    {"year": 1991, "gdp": 105},
    {"year": 1992, "gdp": 110},
    {"year": 1993, "gdp": 115},
    {"year": 1994, "gdp": 110}

])
gdp_df.head()

	year	gdp
0	1990	100
1	1991	105
2	1992	110
3	1993	115
4	1994	110

Before performing shift-based methods, because row order matters, it is important to ensure the rows are sorted in the proper order (usually in ascending order by date).

# sorting by year for good measure:
gdp_df.sort_values(by=["year"], ascending=True, inplace=True)
gdp_df.head()

	year	gdp
0	1990	100
1	1991	105
2	1992	110
3	1993	115
4	1994	110

We can use the dataframe’s shift method to reference a corresponding value in another row above or below by specifying the number of rows above or below via the periods parameter.

We use positive numbers to reference rows above, and negative numbers to reference cells below:

gdp_df["gdp"].shift(periods=1) # 1 or -1 depending on order

0      NaN
1    100.0
2    105.0
3    110.0
4    115.0
Name: gdp, dtype: float64

gdp_df["gdp"].shift(periods=-1) # 1 or -1 depending on order

0    105.0
1    110.0
2    115.0
3    110.0
4      NaN
Name: gdp, dtype: float64

Growth and Percent Change

By comparing current against previous values, this allows us to perform an ad-hoc calculation of percent growth from one period to the next.

Here we illustrate the methods one step at a time.

First we create a new column of previous values, using the shift method:

# creating a new column using previous values:
gdp_df["gdp_prev"] = gdp_df["gdp"].shift(periods=1)
gdp_df

	year	gdp	gdp_prev
0	1990	100	NaN
1	1991	105	100.0
2	1992	110	105.0
3	1993	115	110.0
4	1994	110	115.0

Next we calculate change from one period to the next, by subtracting the previous values from the current values:

# calculating change:
gdp_df["gdp_change"] = gdp_df["gdp"] - gdp_df["gdp_prev"]
gdp_df

	year	gdp	gdp_prev	gdp_change
0	1990	100	NaN	NaN
1	1991	105	100.0	5.0
2	1992	110	105.0	5.0
3	1993	115	110.0	5.0
4	1994	110	115.0	-5.0

Finally we calculate percent change by dividing the change over the previous value:

# calculating percent change:
#gdp_df["gdp_growth"] = (gdp_df["gdp"] - gdp_df["gdp_prev"]) / gdp_df["gdp_prev"]

gdp_df["gdp_pct_change"] = gdp_df["gdp_change"] / gdp_df["gdp_prev"]
gdp_df

	year	gdp	gdp_prev	gdp_change	gdp_pct_change
0	1990	100	NaN	NaN	NaN
1	1991	105	100.0	5.0	0.050000
2	1992	110	105.0	5.0	0.047619
3	1993	115	110.0	5.0	0.045455
4	1994	110	115.0	-5.0	-0.043478

The `pct_change` Method

Even though we are able to perform this growth calculation ourselves, we should know the dataframe has a dedicated pct_change method for this purpose, which allows us to skip the intermediate steps:

# equivalent, leveraging the pct_change method:
gdp_df["gdp_pct_change"] = gdp_df["gdp"].pct_change(periods=1)

gdp_df[["year", "gdp", "gdp_pct_change"]]

	year	gdp	gdp_pct_change
0	1990	100	NaN
1	1991	105	0.050000
2	1992	110	0.047619
3	1993	115	0.045455
4	1994	110	-0.043478

Cumulative Growth

Alright, we have studied how to calculate growth from one period to another, but what about calculating cumulative growth over the entire time period?

To calculate cumulative growth for a particular period, we can use the(cumprod method (or sometimes the product method, depending on the use case). When calculating the cumulative product, each value gets multiplied by the values that follow, in succession.

Before we calculate a product, to make the multiplication work, we’ll first need to express the growth numbers relative to 1, instead of 0. We’ll also need to fill in the initial null value with a 1, so the first period represents 100%.

Let’s break this down one step at a time, to illustrate each method, before putting them all together at the end.

First we overwrite the initial null value that results from there being no previous row for the first row:

gdp_df.loc[0, "gdp_pct_change"] = 0

gdp_df

	year	gdp	gdp_prev	gdp_change	gdp_pct_change
0	1990	100	NaN	NaN	0.000000
1	1991	105	100.0	5.0	0.050000
2	1992	110	105.0	5.0	0.047619
3	1993	115	110.0	5.0	0.045455
4	1994	110	115.0	-5.0	-0.043478

Then we express growth relative to one instead of zero (so we can calculate cumulative product later):

gdp_df["gdp_pct_change"] = gdp_df["gdp_pct_change"] + 1

Finally, we calculate the cumulative product of the growth:

#gdp_df["cumulative_growth"] = (gdp_df["gdp_growth"] + 1).cumprod()

gdp_df["gdp_cumulative_growth"] = gdp_df["gdp_pct_change"].cumprod()

gdp_df

	year	gdp	gdp_prev	gdp_change	gdp_pct_change	gdp_cumulative_growth
0	1990	100	NaN	NaN	1.000000	1.00
1	1991	105	100.0	5.0	1.050000	1.05
2	1992	110	105.0	5.0	1.047619	1.10
3	1993	115	110.0	5.0	1.045455	1.15
4	1994	110	115.0	-5.0	0.956522	1.10

We can see how pandas makes it easy to calculate growth and cumulative growth.

Shift-based Methods

Growth and Percent Change

The pct_change Method

Cumulative Growth

The `pct_change` Method