In this chapter we will focus on index operations, including changing and resetting the index.
5.1 Setting the Index
Every DataFrame has a default index values, but we can set our own.
We might do this in practice when one of the columns in the dataset contains values which uniquely identify each row in the dataset.
For example, this products dataset has an “id” column that uniquely represents each row, so we could set that column as the index:
from pandas import read_csvrequest_url ="https://raw.githubusercontent.com/prof-rossetti/intro-to-python/main/data/products.csv"df = read_csv(request_url)df.head()
id
name
aisle
department
price
0
1
Chocolate Sandwich Cookies
cookies cakes
snacks
3.50
1
2
All-Seasons Salt
spices seasonings
pantry
4.99
2
3
Robust Golden Unsweetened Oolong Tea
tea
beverages
2.49
3
4
Smart Ones Classic Favorites Mini Rigatoni Wit...
frozen meals
frozen
6.99
4
5
Green Chile Anytime Sauce
marinades meat preparation
pantry
7.99
Setting the “id” column as the index:
# set the index and name it:df.index = df["id"]df.index.name ="id"# drop old column, as necessary and desired:df.drop(columns=["id"], inplace=True)df.head()
name
aisle
department
price
id
1
Chocolate Sandwich Cookies
cookies cakes
snacks
3.50
2
All-Seasons Salt
spices seasonings
pantry
4.99
3
Robust Golden Unsweetened Oolong Tea
tea
beverages
2.49
4
Smart Ones Classic Favorites Mini Rigatoni Wit...
frozen meals
frozen
6.99
5
Green Chile Anytime Sauce
marinades meat preparation
pantry
7.99
5.2 Resetting the Index
If you want to revert a DataFrame to use the default index values (auto-incrementing integers), we can use the reset method:
df.reset_index(inplace=True)df.head()
id
name
aisle
department
price
0
1
Chocolate Sandwich Cookies
cookies cakes
snacks
3.50
1
2
All-Seasons Salt
spices seasonings
pantry
4.99
2
3
Robust Golden Unsweetened Oolong Tea
tea
beverages
2.49
3
4
Smart Ones Classic Favorites Mini Rigatoni Wit...
frozen meals
frozen
6.99
4
5
Green Chile Anytime Sauce
marinades meat preparation
pantry
7.99
5.3 Resetting a Multi-Index
If our dataset has a multi-index (like a composite primary key comprised of multiple columns), we can convert the index components to their own columns, and reset the index to be simpler.
Consider this dataset of time-series stock prices from the yahooquery package, where we have a row per stock per date:
# moving multi-index to their own separate columns:df["symbol"] = df.index.get_level_values(0)df["date"] = df.index.get_level_values(1)# resetting the index:df.reset_index(drop=True, inplace=True)df.head()
open
high
low
volume
close
symbol
date
0
371.827833
373.856753
364.776375
25258600
368.854095
MSFT
2024-01-02
1
367.004218
371.231116
366.506935
23083500
368.585571
MSFT
2024-01-03
2
368.655174
371.071958
365.174199
20901500
365.940002
MSFT
2024-01-04
3
186.237603
187.521323
182.993502
82488700
184.734970
AAPL
2024-01-02
4
183.321893
184.973804
182.535736
58414500
183.351746
AAPL
2024-01-03
This gives us a flatter structure which may be easier to work with.