In this chapter we will focus on index operations, including changing and resetting the index.
5.1 Setting the Index
Every DataFrame has a default index values, but we can set our own.
We might do this in practice when one of the columns in the dataset contains values which uniquely identify each row in the dataset.
For example, this products dataset has an “id” column that uniquely represents each row, so we could set that column as the index:
from pandas import read_csvrequest_url ="https://raw.githubusercontent.com/prof-rossetti/intro-to-python/main/data/products.csv"df = read_csv(request_url)df.head()
id
name
aisle
department
price
0
1
Chocolate Sandwich Cookies
cookies cakes
snacks
3.50
1
2
All-Seasons Salt
spices seasonings
pantry
4.99
2
3
Robust Golden Unsweetened Oolong Tea
tea
beverages
2.49
3
4
Smart Ones Classic Favorites Mini Rigatoni Wit...
frozen meals
frozen
6.99
4
5
Green Chile Anytime Sauce
marinades meat preparation
pantry
7.99
Setting the “id” column as the index:
# set the index and name it:df.index = df["id"]df.index.name ="id"# drop old column, as necessary and desired:df.drop(columns=["id"], inplace=True)df.head()
name
aisle
department
price
id
1
Chocolate Sandwich Cookies
cookies cakes
snacks
3.50
2
All-Seasons Salt
spices seasonings
pantry
4.99
3
Robust Golden Unsweetened Oolong Tea
tea
beverages
2.49
4
Smart Ones Classic Favorites Mini Rigatoni Wit...
frozen meals
frozen
6.99
5
Green Chile Anytime Sauce
marinades meat preparation
pantry
7.99
5.2 Resetting the Index
If you want to revert a DataFrame to use the default index values (auto-incrementing integers), we can use the reset method:
df.reset_index(inplace=True)df.head()
id
name
aisle
department
price
0
1
Chocolate Sandwich Cookies
cookies cakes
snacks
3.50
1
2
All-Seasons Salt
spices seasonings
pantry
4.99
2
3
Robust Golden Unsweetened Oolong Tea
tea
beverages
2.49
3
4
Smart Ones Classic Favorites Mini Rigatoni Wit...
frozen meals
frozen
6.99
4
5
Green Chile Anytime Sauce
marinades meat preparation
pantry
7.99
5.3 Resetting a Multi-Index
If our dataset has a multi-index (like a composite primary key comprised of multiple columns), we can convert the index components to their own columns, and reset the index to be simpler.
Consider this dataset of time-series stock prices from the yahooquery package, where we have a row per stock per date:
# moving multi-index to their own separate columns:df["symbol"] = df.index.get_level_values(0)df["date"] = df.index.get_level_values(1)# resetting the index:df.reset_index(drop=True, inplace=True)df.head()
open
high
low
volume
close
symbol
date
0
371.085077
373.109944
364.047705
25258600
368.117279
MSFT
2024-01-02
1
366.271049
370.489503
365.774760
23083500
367.849243
MSFT
2024-01-03
2
367.918763
370.330719
364.444741
20901500
365.209015
MSFT
2024-01-04
3
186.237603
187.521323
182.993502
82488700
184.734970
AAPL
2024-01-02
4
183.321893
184.973804
182.535736
58414500
183.351746
AAPL
2024-01-03
This gives us a flatter structure which may be easier to work with.