4 Row Operations

In this chapter we will focus on row operations, including looping, sorting, and filtering rows.

For these examples, we will use our familiar dataset of products:

Code

from pandas import read_csv

request_url = "https://raw.githubusercontent.com/prof-rossetti/intro-to-python/main/data/products.csv"
df = read_csv(request_url)
df.head()

	id	name	aisle	department	price
0	1	Chocolate Sandwich Cookies	cookies cakes	snacks	3.50
1	2	All-Seasons Salt	spices seasonings	pantry	4.99
2	3	Robust Golden Unsweetened Oolong Tea	tea	beverages	2.49
3	4	Smart Ones Classic Favorites Mini Rigatoni Wit...	frozen meals	frozen	6.99
4	5	Green Chile Anytime Sauce	marinades meat preparation	pantry	7.99

4.1 `Series` as a Row

Remember, when we access a given row of values, we are working with a Series object:

first_row = df.iloc[0]
print(type(first_row))
print(first_row)

<class 'pandas.core.series.Series'>
id                                     1
name          Chocolate Sandwich Cookies
aisle                      cookies cakes
department                        snacks
price                                3.5
Name: 0, dtype: object

With a Series object as a row, we can use dictionary-like accessors to access the values based on their column names:

print(first_row["name"])
print(first_row["price"])

Chocolate Sandwich Cookies
3.5

And we can convert the Series object to a normal dictionary as desired:

first_row.to_dict()

{'id': 1,
 'name': 'Chocolate Sandwich Cookies',
 'aisle': 'cookies cakes',
 'department': 'snacks',
 'price': 3.5}

4.2 Row Iteration (Looping)

We can loop through each row using the iterrows method, involving a destructuring approach to reference the row index (i) as well as the row’s values (row). With this approach, each row is a Series object.

In this example, we are looping through just the first three rows, for simplicity:

#for i, row in df.iterrows():
for i, row in df.head(3).iterrows():
    print("------------")
    print("INDEX:", i)
    print(type(row))
    print(row)

------------
INDEX: 0
<class 'pandas.core.series.Series'>
id                                     1
name          Chocolate Sandwich Cookies
aisle                      cookies cakes
department                        snacks
price                                3.5
Name: 0, dtype: object
------------
INDEX: 1
<class 'pandas.core.series.Series'>
id                            2
name           All-Seasons Salt
aisle         spices seasonings
department               pantry
price                      4.99
Name: 1, dtype: object
------------
INDEX: 2
<class 'pandas.core.series.Series'>
id                                               3
name          Robust Golden Unsweetened Oolong Tea
aisle                                          tea
department                               beverages
price                                         2.49
Name: 2, dtype: object

4.3 Sorting Rows

We can use the DataFrame object’s sort_values method to sort rows on the basis of one or more given columns.

By default, sort_values is not mutating, unless we use the inplace parameter:

df.sort_values(by="name", ascending=True, inplace=True)
df.head()

	id	name	aisle	department	price
1	2	All-Seasons Salt	spices seasonings	pantry	4.99
11	12	Chocolate Fudge Layer Cake	frozen dessert	frozen	18.50
0	1	Chocolate Sandwich Cookies	cookies cakes	snacks	3.50
7	8	Cut Russet Potatoes Steam N' Mash	frozen produce	frozen	4.25
5	6	Dry Nose Oil	cold flu allergy	personal care	21.99

We can use the ascending parameter to specify the sort order:

df.sort_values(by="name", ascending=False, inplace=True)
df.head()

	id	name	aisle	department	price
9	10	Sparkling Orange Juice & Prickly Pear Beverage	water seltzer sparkling water	beverages	2.99
3	4	Smart Ones Classic Favorites Mini Rigatoni Wit...	frozen meals	frozen	6.99
12	13	Saline Nasal Mist	cold flu allergy	personal care	16.00
2	3	Robust Golden Unsweetened Oolong Tea	tea	beverages	2.49
16	17	Rendered Duck Fat	poultry counter	meat seafood	9.99

4.4 Filtering Rows

We can filter a DataFrame to get only the rows that match some given condition.

We first specify a “mask” condition that determines for each row, whether it meet the criteria or not (True or False). This first example mask is testing equality using the familiar == operator:

# this is the mask:
df["department"] == "beverages"

0     False
1     False
2      True
3     False
4     False
5     False
6      True
7     False
8     False
9      True
10     True
11    False
12    False
13    False
14    False
15    False
16    False
17    False
18    False
19     True
Name: department, dtype: bool

Filtering based on equality, using the mask inside a square bracket reference:

df[  df["department"] == "beverages" ]

	id	name	aisle	department	price
2	3	Robust Golden Unsweetened Oolong Tea	tea	beverages	2.49
6	7	Pure Coconut Water With Orange	juice nectars	beverages	3.50
9	10	Sparkling Orange Juice & Prickly Pear Beverage	water seltzer sparkling water	beverages	2.99
10	11	Peach Mango Juice	refrigerated	beverages	1.99
19	20	Pomegranate Cranberry & Aloe Vera Enrich Drink	juice nectars	beverages	4.25

This is an example of filtering based on equality, however there are other filter methods as well.

Filtering based on numeric comparisons, using comparison operators:

df[  df["price"] >= 10.00 ]

	id	name	aisle	department	price
5	6	Dry Nose Oil	cold flu allergy	personal care	21.99
11	12	Chocolate Fudge Layer Cake	frozen dessert	frozen	18.50
12	13	Saline Nasal Mist	cold flu allergy	personal care	16.00
14	15	Overnight Diapers Size 6	diapers wipes	babies	25.50
17	18	Pizza for One Suprema Frozen Pizza	frozen pizza	frozen	12.50

Filtering based on values between lower and upper bound, using between method:

df[  df["price"].between(10.00, 20.00) ]

	id	name	aisle	department	price
11	12	Chocolate Fudge Layer Cake	frozen dessert	frozen	18.5
12	13	Saline Nasal Mist	cold flu allergy	personal care	16.0
17	18	Pizza for One Suprema Frozen Pizza	frozen pizza	frozen	12.5

Filtering based on inclusion, using isin method:

selected_departments = ["beverages", "babies"]

df[  df["department"].isin(selected_departments) ]

	id	name	aisle	department	price
2	3	Robust Golden Unsweetened Oolong Tea	tea	beverages	2.49
6	7	Pure Coconut Water With Orange	juice nectars	beverages	3.50
9	10	Sparkling Orange Juice & Prickly Pear Beverage	water seltzer sparkling water	beverages	2.99
10	11	Peach Mango Juice	refrigerated	beverages	1.99
14	15	Overnight Diapers Size 6	diapers wipes	babies	25.50
19	20	Pomegranate Cranberry & Aloe Vera Enrich Drink	juice nectars	beverages	4.25

Filtering on substring match, using str.contains method:

df[  df["name"].str.contains("Cookie") ]

	id	name	aisle	department	price
0	1	Chocolate Sandwich Cookies	cookies cakes	snacks	3.5

4.1 Series as a Row

4.2 Row Iteration (Looping)

4.3 Sorting Rows

4.4 Filtering Rows

4.1 `Series` as a Row