5 Row Index Values

In this chapter we will focus on index operations, including changing and resetting the index.

5.1 Setting the Index

Every DataFrame has a default index values, but we can set our own.

We might do this in practice when one of the columns in the dataset contains values which uniquely identify each row in the dataset.

For example, this products dataset has an “id” column that uniquely represents each row, so we could set that column as the index:

from pandas import read_csv

request_url = "https://raw.githubusercontent.com/prof-rossetti/intro-to-python/main/data/products.csv"
df = read_csv(request_url)
df.head()

	id	name	aisle	department	price
0	1	Chocolate Sandwich Cookies	cookies cakes	snacks	3.50
1	2	All-Seasons Salt	spices seasonings	pantry	4.99
2	3	Robust Golden Unsweetened Oolong Tea	tea	beverages	2.49
3	4	Smart Ones Classic Favorites Mini Rigatoni Wit...	frozen meals	frozen	6.99
4	5	Green Chile Anytime Sauce	marinades meat preparation	pantry	7.99

Setting the “id” column as the index:

# set the index and name it:
df.index = df["id"]
df.index.name = "id"

# drop old column, as necessary and desired:
df.drop(columns=["id"], inplace=True)

df.head()

	name	aisle	department	price
id
1	Chocolate Sandwich Cookies	cookies cakes	snacks	3.50
2	All-Seasons Salt	spices seasonings	pantry	4.99
3	Robust Golden Unsweetened Oolong Tea	tea	beverages	2.49
4	Smart Ones Classic Favorites Mini Rigatoni Wit...	frozen meals	frozen	6.99
5	Green Chile Anytime Sauce	marinades meat preparation	pantry	7.99

5.2 Resetting the Index

If you want to revert a DataFrame to use the default index values (auto-incrementing integers), we can use the reset method:

df.reset_index(inplace=True)
df.head()

	id	name	aisle	department	price
0	1	Chocolate Sandwich Cookies	cookies cakes	snacks	3.50
1	2	All-Seasons Salt	spices seasonings	pantry	4.99
2	3	Robust Golden Unsweetened Oolong Tea	tea	beverages	2.49
3	4	Smart Ones Classic Favorites Mini Rigatoni Wit...	frozen meals	frozen	6.99
4	5	Green Chile Anytime Sauce	marinades meat preparation	pantry	7.99

5.3 Resetting a Multi-Index

If our dataset has a multi-index (like a composite primary key comprised of multiple columns), we can convert the index components to their own columns, and reset the index to be simpler.

Consider this dataset of time-series stock prices from the yahooquery package, where we have a row per stock per date:

from yahooquery import Ticker

symbols = ["MSFT", "AAPL", "GOOGL"]
ticker = Ticker(symbols)
df = ticker.history(start="2024-01-01", end="2024-01-05", adj_ohlc=True)
df

		open	high	low	volume	close
symbol	date
MSFT	2024-01-02	369.663798	371.680909	362.653379	25258600	366.707367
	2024-01-03	364.868310	369.070608	364.373922	23083500	366.440460
	2024-01-04	366.509673	368.912391	363.048956	20901500	363.810303
AAPL	2024-01-02	185.789422	187.070052	182.553128	82488700	184.290405
	2024-01-03	182.880742	184.528677	182.096477	58414500	182.910522
	2024-01-04	180.825785	181.758954	179.565029	71983600	180.587540
GOOGL	2024-01-02	137.724022	138.618651	135.666356	23711200	137.346283
	2024-01-03	136.431770	138.797586	136.262785	24212100	138.091812
	2024-01-04	137.594786	138.330379	135.537134	27137700	135.576889

We see the row index is a gnarly MultiIndex consisting of two values, representing both the “symbol” and the “date”:

df.index

MultiIndex([( 'MSFT', 2024-01-02),
            ( 'MSFT', 2024-01-03),
            ( 'MSFT', 2024-01-04),
            ( 'AAPL', 2024-01-02),
            ( 'AAPL', 2024-01-03),
            ( 'AAPL', 2024-01-04),
            ('GOOGL', 2024-01-02),
            ('GOOGL', 2024-01-03),
            ('GOOGL', 2024-01-04)],
           names=['symbol', 'date'])

Resetting the index:

# moving multi-index to their own separate columns:
df["symbol"] = df.index.get_level_values(0)
df["date"] = df.index.get_level_values(1)

# resetting the index:
df.reset_index(drop=True, inplace=True)
df.head()

	open	high	low	volume	close	symbol	date
0	369.663798	371.680909	362.653379	25258600	366.707367	MSFT	2024-01-02
1	364.868310	369.070608	364.373922	23083500	366.440460	MSFT	2024-01-03
2	366.509673	368.912391	363.048956	20901500	363.810303	MSFT	2024-01-04
3	185.789422	187.070052	182.553128	82488700	184.290405	AAPL	2024-01-02
4	182.880742	184.528677	182.096477	58414500	182.910522	AAPL	2024-01-03

This gives us a flatter structure which may be easier to work with.