In this chapter we will focus on index operations, including changing and resetting the index.
5.1 Setting the Index
Every DataFrame has a default index values, but we can set our own.
We might do this in practice when one of the columns in the dataset contains values which uniquely identify each row in the dataset.
For example, this products dataset has an “id” column that uniquely represents each row, so we could set that column as the index:
from pandas import read_csvrequest_url ="https://raw.githubusercontent.com/prof-rossetti/intro-to-python/main/data/products.csv"df = read_csv(request_url)df.head()
id
name
aisle
department
price
0
1
Chocolate Sandwich Cookies
cookies cakes
snacks
3.50
1
2
All-Seasons Salt
spices seasonings
pantry
4.99
2
3
Robust Golden Unsweetened Oolong Tea
tea
beverages
2.49
3
4
Smart Ones Classic Favorites Mini Rigatoni Wit...
frozen meals
frozen
6.99
4
5
Green Chile Anytime Sauce
marinades meat preparation
pantry
7.99
Setting the “id” column as the index:
# set the index and name it:df.index = df["id"]df.index.name ="id"# drop old column, as necessary and desired:df.drop(columns=["id"], inplace=True)df.head()
name
aisle
department
price
id
1
Chocolate Sandwich Cookies
cookies cakes
snacks
3.50
2
All-Seasons Salt
spices seasonings
pantry
4.99
3
Robust Golden Unsweetened Oolong Tea
tea
beverages
2.49
4
Smart Ones Classic Favorites Mini Rigatoni Wit...
frozen meals
frozen
6.99
5
Green Chile Anytime Sauce
marinades meat preparation
pantry
7.99
5.2 Resetting the Index
If you want to revert a DataFrame to use the default index values (auto-incrementing integers), we can use the reset method:
df.reset_index(inplace=True)df.head()
id
name
aisle
department
price
0
1
Chocolate Sandwich Cookies
cookies cakes
snacks
3.50
1
2
All-Seasons Salt
spices seasonings
pantry
4.99
2
3
Robust Golden Unsweetened Oolong Tea
tea
beverages
2.49
3
4
Smart Ones Classic Favorites Mini Rigatoni Wit...
frozen meals
frozen
6.99
4
5
Green Chile Anytime Sauce
marinades meat preparation
pantry
7.99
5.3 Resetting a Multi-Index
If our dataset has a multi-index (like a composite primary key comprised of multiple columns), we can convert the index components to their own columns, and reset the index to be simpler.
Consider this dataset of time-series stock prices from the yahooquery package, where we have a row per stock per date:
# moving multi-index to their own separate columns:df["symbol"] = df.index.get_level_values(0)df["date"] = df.index.get_level_values(1)# resetting the index:df.reset_index(drop=True, inplace=True)df.head()
open
high
low
volume
close
symbol
date
0
369.057108
371.070910
362.058195
25258600
366.105530
MSFT
2024-01-02
1
364.269480
368.464882
363.775903
23083500
365.839050
MSFT
2024-01-03
2
365.908135
368.306911
362.453099
20901500
363.213196
MSFT
2024-01-04
3
185.578846
186.858024
182.346219
82488700
184.081528
AAPL
2024-01-02
4
182.673409
184.319476
181.890032
58414500
182.703156
AAPL
2024-01-03
This gives us a flatter structure which may be easier to work with.