5  Row Index Values

In this chapter we will focus on index operations, including changing and resetting the index.

5.1 Setting the Index

Every DataFrame has a default index values, but we can set our own.

We might do this in practice when one of the columns in the dataset contains values which uniquely identify each row in the dataset.

For example, this products dataset has an “id” column that uniquely represents each row, so we could set that column as the index:

from pandas import read_csv

request_url = "https://raw.githubusercontent.com/prof-rossetti/intro-to-python/main/data/products.csv"
df = read_csv(request_url)
df.head()
id name aisle department price
0 1 Chocolate Sandwich Cookies cookies cakes snacks 3.50
1 2 All-Seasons Salt spices seasonings pantry 4.99
2 3 Robust Golden Unsweetened Oolong Tea tea beverages 2.49
3 4 Smart Ones Classic Favorites Mini Rigatoni Wit... frozen meals frozen 6.99
4 5 Green Chile Anytime Sauce marinades meat preparation pantry 7.99

Setting the “id” column as the index:

# set the index and name it:
df.index = df["id"]
df.index.name = "id"

# drop old column, as necessary and desired:
df.drop(columns=["id"], inplace=True)

df.head()
name aisle department price
id
1 Chocolate Sandwich Cookies cookies cakes snacks 3.50
2 All-Seasons Salt spices seasonings pantry 4.99
3 Robust Golden Unsweetened Oolong Tea tea beverages 2.49
4 Smart Ones Classic Favorites Mini Rigatoni Wit... frozen meals frozen 6.99
5 Green Chile Anytime Sauce marinades meat preparation pantry 7.99

5.2 Resetting the Index

If you want to revert a DataFrame to use the default index values (auto-incrementing integers), we can use the reset method:

df.reset_index(inplace=True)
df.head()
id name aisle department price
0 1 Chocolate Sandwich Cookies cookies cakes snacks 3.50
1 2 All-Seasons Salt spices seasonings pantry 4.99
2 3 Robust Golden Unsweetened Oolong Tea tea beverages 2.49
3 4 Smart Ones Classic Favorites Mini Rigatoni Wit... frozen meals frozen 6.99
4 5 Green Chile Anytime Sauce marinades meat preparation pantry 7.99

5.3 Resetting a Multi-Index

If our dataset has a multi-index (like a composite primary key comprised of multiple columns), we can convert the index components to their own columns, and reset the index to be simpler.

Consider this dataset of time-series stock prices from the yahooquery package, where we have a row per stock per date:

from yahooquery import Ticker

symbols = ["MSFT", "AAPL", "GOOGL"]
ticker = Ticker(symbols)
df = ticker.history(start="2024-01-01", end="2024-01-05", adj_ohlc=True)
df
open high low volume close
symbol date
MSFT 2024-01-02 371.827833 373.856753 364.776375 25258600 368.854095
2024-01-03 367.004218 371.231116 366.506935 23083500 368.585571
2024-01-04 368.655174 371.071958 365.174199 20901500 365.940002
AAPL 2024-01-02 186.237603 187.521323 182.993502 82488700 184.734970
2024-01-03 183.321893 184.973804 182.535736 58414500 183.351746
2024-01-04 181.261998 182.197418 179.998201 71983600 181.023178
GOOGL 2024-01-02 138.049561 138.946304 135.987031 23711200 137.670929
2024-01-03 136.754262 139.125670 136.584878 24212100 138.418228
2024-01-04 137.920020 138.657353 135.857505 27137700 135.897354

We see the row index is a gnarly MultiIndex consisting of two values, representing both the “symbol” and the “date”:

df.index
MultiIndex([( 'MSFT', 2024-01-02),
            ( 'MSFT', 2024-01-03),
            ( 'MSFT', 2024-01-04),
            ( 'AAPL', 2024-01-02),
            ( 'AAPL', 2024-01-03),
            ( 'AAPL', 2024-01-04),
            ('GOOGL', 2024-01-02),
            ('GOOGL', 2024-01-03),
            ('GOOGL', 2024-01-04)],
           names=['symbol', 'date'])

Resetting the index:

# moving multi-index to their own separate columns:
df["symbol"] = df.index.get_level_values(0)
df["date"] = df.index.get_level_values(1)

# resetting the index:
df.reset_index(drop=True, inplace=True)
df.head()
open high low volume close symbol date
0 371.827833 373.856753 364.776375 25258600 368.854095 MSFT 2024-01-02
1 367.004218 371.231116 366.506935 23083500 368.585571 MSFT 2024-01-03
2 368.655174 371.071958 365.174199 20901500 365.940002 MSFT 2024-01-04
3 186.237603 187.521323 182.993502 82488700 184.734970 AAPL 2024-01-02
4 183.321893 184.973804 182.535736 58414500 183.351746 AAPL 2024-01-03

This gives us a flatter structure which may be easier to work with.