Grouping and Pivoting

Let’s study how to create pivot table -style reports.

In this sales dataset, you will see we have a row per date per product sold on that date.

from pandas import read_csv, to_datetime

sales_df = read_csv(f"https://raw.githubusercontent.com/prof-rossetti/python-for-finance/main/docs/data/monthly-sales.csv")
sales_df["date"] = to_datetime(sales_df["date"])
sales_df.head()

	date	product	unit price	units sold	sales price
0	2024-03-01	Button-Down Shirt	65.05	2	130.10
1	2024-03-01	Vintage Logo Tee	15.95	1	15.95
2	2024-03-01	Sticker Pack	4.50	1	4.50
3	2024-03-02	Super Soft Hoodie	75.00	2	150.00
4	2024-03-02	Button-Down Shirt	65.05	7	455.35

len(sales_df)

products = sales_df["product"].unique()
print(products)

['Button-Down Shirt' 'Vintage Logo Tee' 'Sticker Pack' 'Super Soft Hoodie'
 'Baseball Cap' 'Khaki Pants' 'Brown Boots']

days = sales_df["date"].unique()
print(len(days))
print(days.min().date(), "...", days.max().date())

31
2024-03-01 ... 2024-03-31

We know we can calculate total sales using a Series aggregation:

sales_df["sales price"].sum().round(2)

12000.71

But how can we calculate the total sales for each product?

Grouping

Enter the groupby method for group-based aggregations.

Calculating total sales per product, to identify the top selling products:

sales_by_product = sales_df.groupby("product")["sales price"].sum()
sales_by_product.sort_values(ascending=False)

product
Button-Down Shirt    6960.35
Super Soft Hoodie    1875.00
Khaki Pants          1602.00
Vintage Logo Tee      941.05
Brown Boots           250.00
Sticker Pack          216.00
Baseball Cap          156.31
Name: sales price, dtype: float64

Calculating total sales per day:

sales_by_date = sales_df.groupby("date")["sales price"].sum()
sales_by_date.sort_values(ascending=False).head()

date
2024-03-11    1067.48
2024-03-03     902.63
2024-03-10     765.20
2024-03-18     739.25
2024-03-02     675.53
Name: sales price, dtype: float64

Calculating total sales per day of week:

sales_df["weekday"] = sales_df["date"].dt.strftime("%A")

sales_by_weekday = sales_df.groupby("weekday")["sales price"].sum()
sales_by_weekday.sort_values(ascending=False)

weekday
Sunday       3428.91
Saturday     2312.84
Monday       2275.41
Thursday     1244.70
Wednesday     991.50
Tuesday       988.10
Friday        759.25
Name: sales price, dtype: float64

Pivot Tables

We can alternatively use the pivot_table function to perform more fine-grained grouping and aggregation.

Parameters of the pivot_table function:

The index parameter specifies the rows we want to wind up with (i.e. “row per what?”).
The values parameter specifies what columns we would like to aggregate.
The aggfunc parameter specifies how to aggregate those columns. We can pass our own aggregation function(s) or get them from the numpy package. We can aggregate different columns differently.

Pivoting by date:

from pandas import pivot_table
#import numpy as np

dates_pivot = pivot_table(sales_df,
    index=["date"],
    values=["sales price", "units sold"],
    aggfunc={
        "sales price": "sum", #np.sum,
        "units sold": "sum", #np.sum,
    } # designate the agg function to be used for each original column. can use our own custom functions here as well
)

dates_pivot.rename(columns={"sales price": "sales_total", "units sold": "units_sold"}, inplace=True)
dates_pivot.sort_values(by=["sales_total"], ascending=False, inplace=True)
dates_pivot.head()

	sales_total	units_sold
date
2024-03-11	1067.48	20
2024-03-03	902.63	15
2024-03-10	765.20	15
2024-03-18	739.25	18
2024-03-02	675.53	13

Pivoting by product:

products_pivot = pivot_table(sales_df,
    index=["product"],
    values=["sales price", "units sold"],
    aggfunc={
        "sales price": "sum", #np.sum,
        "units sold": "sum", #np.sum,
    }
)

products_pivot.rename(columns={"sales price": "sales_total", "units sold": "units_sold"}, inplace=True)
products_pivot.sort_values(by=["sales_total"], ascending=False, inplace=True)
products_pivot.head()

	sales_total	units_sold
product
Button-Down Shirt	6960.35	107
Super Soft Hoodie	1875.00	25
Khaki Pants	1602.00	18
Vintage Logo Tee	941.05	59
Brown Boots	250.00	2

Pivot by weekday:

weekdays_pivot = pivot_table(sales_df,
    index=["weekday"],
    values=["sales price", "units sold"],
    aggfunc={
        "sales price": ["sum", "mean"], #np.mean,
        #"units sold": ["sum", "mean"] #np.mean
    }
)

weekdays_pivot.columns = ["sales_avg", "sales_total"]
weekdays_pivot.sort_values(by=["sales_avg"], ascending=False, inplace=True)
weekdays_pivot

	sales_avg	sales_total
weekday
Sunday	155.859545	3428.91
Monday	126.411667	2275.41
Saturday	115.642000	2312.84
Thursday	77.793750	1244.70
Tuesday	76.007692	988.10
Wednesday	70.821429	991.50
Friday	54.232143	759.25

These pivot tables are now suitable for charting as well:

import plotly.express as px

chart_df = dates_pivot.copy()
#chart_df["date"] = chart_df.index
chart_df.sort_values(by=["sales_total"], ascending=True, inplace=True)

px.bar(chart_df,  #x="date",
       y="sales_total",
    title="Sales by Day (March 2024)", height=350
)

chart_df = products_pivot.copy()
chart_df["product"] = chart_df.index
chart_df.sort_values(by=["sales_total"], ascending=True, inplace=True)

px.bar(chart_df,  y="product", x="sales_total", orientation="h",
    title="Top Selling Products (March 2024)", height=350
)

chart_df = weekdays_pivot.copy()
chart_df["weekday"] = chart_df.index
chart_df.sort_values(by=["sales_avg"], ascending=True, inplace=True)

px.bar(chart_df,  y="weekday", x="sales_avg", orientation="h",
    title="Average Sales per Weekday (March 2024)", height=350
)