In this chapter we will focus on index operations, including changing and resetting the index.
5.1 Setting the Index
Every DataFrame has a default index values, but we can set our own.
We might do this in practice when one of the columns in the dataset contains values which uniquely identify each row in the dataset.
For example, this products dataset has an “id” column that uniquely represents each row, so we could set that column as the index:
from pandas import read_csvrequest_url ="https://raw.githubusercontent.com/prof-rossetti/intro-to-python/main/data/products.csv"df = read_csv(request_url)df.head()
id
name
aisle
department
price
0
1
Chocolate Sandwich Cookies
cookies cakes
snacks
3.50
1
2
All-Seasons Salt
spices seasonings
pantry
4.99
2
3
Robust Golden Unsweetened Oolong Tea
tea
beverages
2.49
3
4
Smart Ones Classic Favorites Mini Rigatoni Wit...
frozen meals
frozen
6.99
4
5
Green Chile Anytime Sauce
marinades meat preparation
pantry
7.99
Setting the “id” column as the index:
# set the index and name it:df.index = df["id"]df.index.name ="id"# drop old column, as necessary and desired:df.drop(columns=["id"], inplace=True)df.head()
name
aisle
department
price
id
1
Chocolate Sandwich Cookies
cookies cakes
snacks
3.50
2
All-Seasons Salt
spices seasonings
pantry
4.99
3
Robust Golden Unsweetened Oolong Tea
tea
beverages
2.49
4
Smart Ones Classic Favorites Mini Rigatoni Wit...
frozen meals
frozen
6.99
5
Green Chile Anytime Sauce
marinades meat preparation
pantry
7.99
5.2 Resetting the Index
If you want to revert a DataFrame to use the default index values (auto-incrementing integers), we can use the reset method:
df.reset_index(inplace=True)df.head()
id
name
aisle
department
price
0
1
Chocolate Sandwich Cookies
cookies cakes
snacks
3.50
1
2
All-Seasons Salt
spices seasonings
pantry
4.99
2
3
Robust Golden Unsweetened Oolong Tea
tea
beverages
2.49
3
4
Smart Ones Classic Favorites Mini Rigatoni Wit...
frozen meals
frozen
6.99
4
5
Green Chile Anytime Sauce
marinades meat preparation
pantry
7.99
5.3 Resetting a Multi-Index
If our dataset has a multi-index (like a composite primary key comprised of multiple columns), we can convert the index components to their own columns, and reset the index to be simpler.
Consider this dataset of time-series stock prices from the yahooquery package, where we have a row per stock per date:
# moving multi-index to their own separate columns:df["symbol"] = df.index.get_level_values(0)df["date"] = df.index.get_level_values(1)# resetting the index:df.reset_index(drop=True, inplace=True)df.head()
open
high
low
volume
close
symbol
date
0
369.663798
371.680909
362.653379
25258600
366.707367
MSFT
2024-01-02
1
364.868310
369.070608
364.373922
23083500
366.440460
MSFT
2024-01-03
2
366.509673
368.912391
363.048956
20901500
363.810303
MSFT
2024-01-04
3
185.789422
187.070052
182.553128
82488700
184.290405
AAPL
2024-01-02
4
182.880742
184.528677
182.096477
58414500
182.910522
AAPL
2024-01-03
This gives us a flatter structure which may be easier to work with.