5  Row Index Values

In this chapter we will focus on index operations, including changing and resetting the index.

5.1 Setting the Index

Every DataFrame has a default index values, but we can set our own.

We might do this in practice when one of the columns in the dataset contains values which uniquely identify each row in the dataset.

For example, this products dataset has an “id” column that uniquely represents each row, so we could set that column as the index:

from pandas import read_csv

request_url = "https://raw.githubusercontent.com/prof-rossetti/intro-to-python/main/data/products.csv"
df = read_csv(request_url)
df.head()
id name aisle department price
0 1 Chocolate Sandwich Cookies cookies cakes snacks 3.50
1 2 All-Seasons Salt spices seasonings pantry 4.99
2 3 Robust Golden Unsweetened Oolong Tea tea beverages 2.49
3 4 Smart Ones Classic Favorites Mini Rigatoni Wit... frozen meals frozen 6.99
4 5 Green Chile Anytime Sauce marinades meat preparation pantry 7.99

Setting the “id” column as the index:

# set the index and name it:
df.index = df["id"]
df.index.name = "id"

# drop old column, as necessary and desired:
df.drop(columns=["id"], inplace=True)

df.head()
name aisle department price
id
1 Chocolate Sandwich Cookies cookies cakes snacks 3.50
2 All-Seasons Salt spices seasonings pantry 4.99
3 Robust Golden Unsweetened Oolong Tea tea beverages 2.49
4 Smart Ones Classic Favorites Mini Rigatoni Wit... frozen meals frozen 6.99
5 Green Chile Anytime Sauce marinades meat preparation pantry 7.99

5.2 Resetting the Index

If you want to revert a DataFrame to use the default index values (auto-incrementing integers), we can use the reset method:

df.reset_index(inplace=True)
df.head()
id name aisle department price
0 1 Chocolate Sandwich Cookies cookies cakes snacks 3.50
1 2 All-Seasons Salt spices seasonings pantry 4.99
2 3 Robust Golden Unsweetened Oolong Tea tea beverages 2.49
3 4 Smart Ones Classic Favorites Mini Rigatoni Wit... frozen meals frozen 6.99
4 5 Green Chile Anytime Sauce marinades meat preparation pantry 7.99

5.3 Resetting a Multi-Index

If our dataset has a multi-index (like a composite primary key comprised of multiple columns), we can convert the index components to their own columns, and reset the index to be simpler.

Consider this dataset of time-series stock prices from the yahooquery package, where we have a row per stock per date:

from yahooquery import Ticker

symbols = ["MSFT", "AAPL", "GOOGL"]
ticker = Ticker(symbols)
df = ticker.history(start="2024-01-01", end="2024-01-05", adj_ohlc=True)
df
open high low volume close
symbol date
MSFT 2024-01-02 369.057108 371.070910 362.058195 25258600 366.105530
2024-01-03 364.269480 368.464882 363.775903 23083500 365.839050
2024-01-04 365.908135 368.306911 362.453099 20901500 363.213196
AAPL 2024-01-02 185.578846 186.858024 182.346219 82488700 184.081528
2024-01-03 182.673409 184.319476 181.890032 58414500 182.703156
2024-01-04 180.620803 181.552914 179.361476 71983600 180.382828
GOOGL 2024-01-02 137.600958 138.494788 135.545130 23711200 137.223557
2024-01-03 136.309870 138.673573 136.141037 24212100 137.968430
2024-01-04 137.471843 138.206780 135.416030 27137700 135.455750

We see the row index is a gnarly MultiIndex consisting of two values, representing both the “symbol” and the “date”:

df.index
MultiIndex([( 'MSFT', 2024-01-02),
            ( 'MSFT', 2024-01-03),
            ( 'MSFT', 2024-01-04),
            ( 'AAPL', 2024-01-02),
            ( 'AAPL', 2024-01-03),
            ( 'AAPL', 2024-01-04),
            ('GOOGL', 2024-01-02),
            ('GOOGL', 2024-01-03),
            ('GOOGL', 2024-01-04)],
           names=['symbol', 'date'])

Resetting the index:

# moving multi-index to their own separate columns:
df["symbol"] = df.index.get_level_values(0)
df["date"] = df.index.get_level_values(1)

# resetting the index:
df.reset_index(drop=True, inplace=True)
df.head()
open high low volume close symbol date
0 369.057108 371.070910 362.058195 25258600 366.105530 MSFT 2024-01-02
1 364.269480 368.464882 363.775903 23083500 365.839050 MSFT 2024-01-03
2 365.908135 368.306911 362.453099 20901500 363.213196 MSFT 2024-01-04
3 185.578846 186.858024 182.346219 82488700 184.081528 AAPL 2024-01-02
4 182.673409 184.319476 181.890032 58414500 182.703156 AAPL 2024-01-03

This gives us a flatter structure which may be easier to work with.