5  Row Index Values

In this chapter we will focus on index operations, including changing and resetting the index.

5.1 Setting the Index

Every DataFrame has a default index values, but we can set our own.

We might do this in practice when one of the columns in the dataset contains values which uniquely identify each row in the dataset.

For example, this products dataset has an “id” column that uniquely represents each row, so we could set that column as the index:

from pandas import read_csv

request_url = "https://raw.githubusercontent.com/prof-rossetti/intro-to-python/main/data/products.csv"
df = read_csv(request_url)
df.head()
id name aisle department price
0 1 Chocolate Sandwich Cookies cookies cakes snacks 3.50
1 2 All-Seasons Salt spices seasonings pantry 4.99
2 3 Robust Golden Unsweetened Oolong Tea tea beverages 2.49
3 4 Smart Ones Classic Favorites Mini Rigatoni Wit... frozen meals frozen 6.99
4 5 Green Chile Anytime Sauce marinades meat preparation pantry 7.99

Setting the “id” column as the index:

# set the index and name it:
df.index = df["id"]
df.index.name = "id"

# drop old column, as necessary and desired:
df.drop(columns=["id"], inplace=True)

df.head()
name aisle department price
id
1 Chocolate Sandwich Cookies cookies cakes snacks 3.50
2 All-Seasons Salt spices seasonings pantry 4.99
3 Robust Golden Unsweetened Oolong Tea tea beverages 2.49
4 Smart Ones Classic Favorites Mini Rigatoni Wit... frozen meals frozen 6.99
5 Green Chile Anytime Sauce marinades meat preparation pantry 7.99

5.2 Resetting the Index

If you want to revert a DataFrame to use the default index values (auto-incrementing integers), we can use the reset method:

df.reset_index(inplace=True)
df.head()
id name aisle department price
0 1 Chocolate Sandwich Cookies cookies cakes snacks 3.50
1 2 All-Seasons Salt spices seasonings pantry 4.99
2 3 Robust Golden Unsweetened Oolong Tea tea beverages 2.49
3 4 Smart Ones Classic Favorites Mini Rigatoni Wit... frozen meals frozen 6.99
4 5 Green Chile Anytime Sauce marinades meat preparation pantry 7.99

5.3 Resetting a Multi-Index

If our dataset has a multi-index (like a composite primary key comprised of multiple columns), we can convert the index components to their own columns, and reset the index to be simpler.

Consider this dataset of time-series stock prices from the yahooquery package, where we have a row per stock per date:

from yahooquery import Ticker

symbols = ["MSFT", "AAPL", "GOOGL"]
ticker = Ticker(symbols)
df = ticker.history(start="2024-01-01", end="2024-01-05", adj_ohlc=True)
df
open high low volume close
symbol date
MSFT 2024-01-02 369.663798 371.680909 362.653379 25258600 366.707367
2024-01-03 364.868310 369.070608 364.373922 23083500 366.440460
2024-01-04 366.509673 368.912391 363.048956 20901500 363.810303
AAPL 2024-01-02 185.789422 187.070052 182.553128 82488700 184.290405
2024-01-03 182.880742 184.528677 182.096477 58414500 182.910522
2024-01-04 180.825785 181.758954 179.565029 71983600 180.587540
GOOGL 2024-01-02 137.724022 138.618651 135.666356 23711200 137.346283
2024-01-03 136.431770 138.797586 136.262785 24212100 138.091812
2024-01-04 137.594786 138.330379 135.537134 27137700 135.576889

We see the row index is a gnarly MultiIndex consisting of two values, representing both the “symbol” and the “date”:

df.index
MultiIndex([( 'MSFT', 2024-01-02),
            ( 'MSFT', 2024-01-03),
            ( 'MSFT', 2024-01-04),
            ( 'AAPL', 2024-01-02),
            ( 'AAPL', 2024-01-03),
            ( 'AAPL', 2024-01-04),
            ('GOOGL', 2024-01-02),
            ('GOOGL', 2024-01-03),
            ('GOOGL', 2024-01-04)],
           names=['symbol', 'date'])

Resetting the index:

# moving multi-index to their own separate columns:
df["symbol"] = df.index.get_level_values(0)
df["date"] = df.index.get_level_values(1)

# resetting the index:
df.reset_index(drop=True, inplace=True)
df.head()
open high low volume close symbol date
0 369.663798 371.680909 362.653379 25258600 366.707367 MSFT 2024-01-02
1 364.868310 369.070608 364.373922 23083500 366.440460 MSFT 2024-01-03
2 366.509673 368.912391 363.048956 20901500 363.810303 MSFT 2024-01-04
3 185.789422 187.070052 182.553128 82488700 184.290405 AAPL 2024-01-02
4 182.880742 184.528677 182.096477 58414500 182.910522 AAPL 2024-01-03

This gives us a flatter structure which may be easier to work with.