5  Row Index Values

In this chapter we will focus on index operations, including changing and resetting the index.

5.1 Setting the Index

Every DataFrame has a default index values, but we can set our own.

We might do this in practice when one of the columns in the dataset contains values which uniquely identify each row in the dataset.

For example, this products dataset has an “id” column that uniquely represents each row, so we could set that column as the index:

from pandas import read_csv

request_url = "https://raw.githubusercontent.com/prof-rossetti/intro-to-python/main/data/products.csv"
df = read_csv(request_url)
df.head()
id name aisle department price
0 1 Chocolate Sandwich Cookies cookies cakes snacks 3.50
1 2 All-Seasons Salt spices seasonings pantry 4.99
2 3 Robust Golden Unsweetened Oolong Tea tea beverages 2.49
3 4 Smart Ones Classic Favorites Mini Rigatoni Wit... frozen meals frozen 6.99
4 5 Green Chile Anytime Sauce marinades meat preparation pantry 7.99

Setting the “id” column as the index:

# set the index and name it:
df.index = df["id"]
df.index.name = "id"

# drop old column, as necessary and desired:
df.drop(columns=["id"], inplace=True)

df.head()
name aisle department price
id
1 Chocolate Sandwich Cookies cookies cakes snacks 3.50
2 All-Seasons Salt spices seasonings pantry 4.99
3 Robust Golden Unsweetened Oolong Tea tea beverages 2.49
4 Smart Ones Classic Favorites Mini Rigatoni Wit... frozen meals frozen 6.99
5 Green Chile Anytime Sauce marinades meat preparation pantry 7.99

5.2 Resetting the Index

If you want to revert a DataFrame to use the default index values (auto-incrementing integers), we can use the reset method:

df.reset_index(inplace=True)
df.head()
id name aisle department price
0 1 Chocolate Sandwich Cookies cookies cakes snacks 3.50
1 2 All-Seasons Salt spices seasonings pantry 4.99
2 3 Robust Golden Unsweetened Oolong Tea tea beverages 2.49
3 4 Smart Ones Classic Favorites Mini Rigatoni Wit... frozen meals frozen 6.99
4 5 Green Chile Anytime Sauce marinades meat preparation pantry 7.99

5.3 Resetting a Multi-Index

If our dataset has a multi-index (like a composite primary key comprised of multiple columns), we can convert the index components to their own columns, and reset the index to be simpler.

Consider this dataset of time-series stock prices from the yahooquery package, where we have a row per stock per date:

from yahooquery import Ticker

symbols = ["MSFT", "AAPL", "GOOGL"]
ticker = Ticker(symbols)
df = ticker.history(start="2024-01-01", end="2024-01-05", adj_ohlc=True)
df
open high low volume close
symbol date
MSFT 2024-01-02 368.367727 370.377766 361.381887 25258600 365.421661
2024-01-03 363.588941 367.776505 363.096286 23083500 365.155579
2024-01-04 365.224542 367.618836 361.775960 20901500 362.534637
AAPL 2024-01-02 185.399081 186.677021 182.169586 82488700 183.903214
2024-01-03 182.496496 184.140970 181.713879 58414500 182.526215
2024-01-04 180.445890 181.377098 179.187783 71983600 180.208145
GOOGL 2024-01-02 137.511005 138.404250 135.456521 23711200 137.133850
2024-01-03 136.220745 138.582902 136.052022 24212100 137.878220
2024-01-04 137.381963 138.116419 135.327494 27137700 135.367188

We see the row index is a gnarly MultiIndex consisting of two values, representing both the “symbol” and the “date”:

df.index
MultiIndex([( 'MSFT', 2024-01-02),
            ( 'MSFT', 2024-01-03),
            ( 'MSFT', 2024-01-04),
            ( 'AAPL', 2024-01-02),
            ( 'AAPL', 2024-01-03),
            ( 'AAPL', 2024-01-04),
            ('GOOGL', 2024-01-02),
            ('GOOGL', 2024-01-03),
            ('GOOGL', 2024-01-04)],
           names=['symbol', 'date'])

Resetting the index:

# moving multi-index to their own separate columns:
df["symbol"] = df.index.get_level_values(0)
df["date"] = df.index.get_level_values(1)

# resetting the index:
df.reset_index(drop=True, inplace=True)
df.head()
open high low volume close symbol date
0 368.367727 370.377766 361.381887 25258600 365.421661 MSFT 2024-01-02
1 363.588941 367.776505 363.096286 23083500 365.155579 MSFT 2024-01-03
2 365.224542 367.618836 361.775960 20901500 362.534637 MSFT 2024-01-04
3 185.399081 186.677021 182.169586 82488700 183.903214 AAPL 2024-01-02
4 182.496496 184.140970 181.713879 58414500 182.526215 AAPL 2024-01-03

This gives us a flatter structure which may be easier to work with.