4  Row Operations

In this chapter we will focus on row operations, including looping, sorting, and filtering rows.

For these examples, we will use our familiar dataset of products:

Code
from pandas import read_csv

request_url = "https://raw.githubusercontent.com/prof-rossetti/intro-to-python/main/data/products.csv"
df = read_csv(request_url)
df.head()
id name aisle department price
0 1 Chocolate Sandwich Cookies cookies cakes snacks 3.50
1 2 All-Seasons Salt spices seasonings pantry 4.99
2 3 Robust Golden Unsweetened Oolong Tea tea beverages 2.49
3 4 Smart Ones Classic Favorites Mini Rigatoni Wit... frozen meals frozen 6.99
4 5 Green Chile Anytime Sauce marinades meat preparation pantry 7.99

4.1 Series as a Row

Remember, when we access a given row of values, we are working with a Series object:

first_row = df.iloc[0]
print(type(first_row))
print(first_row)
<class 'pandas.core.series.Series'>
id                                     1
name          Chocolate Sandwich Cookies
aisle                      cookies cakes
department                        snacks
price                                3.5
Name: 0, dtype: object

With a Series object as a row, we can use dictionary-like accessors to access the values based on their column names:

print(first_row["name"])
print(first_row["price"])
Chocolate Sandwich Cookies
3.5

And we can convert the Series object to a normal dictionary as desired:

first_row.to_dict()
{'id': 1,
 'name': 'Chocolate Sandwich Cookies',
 'aisle': 'cookies cakes',
 'department': 'snacks',
 'price': 3.5}

4.2 Row Iteration (Looping)

We can loop through each row using the iterrows method, involving a destructuring approach to reference the row index (i) as well as the row’s values (row). With this approach, each row is a Series object.

In this example, we are looping through just the first three rows, for simplicity:

#for i, row in df.iterrows():
for i, row in df.head(3).iterrows():
    print("------------")
    print("INDEX:", i)
    print(type(row))
    print(row)
------------
INDEX: 0
<class 'pandas.core.series.Series'>
id                                     1
name          Chocolate Sandwich Cookies
aisle                      cookies cakes
department                        snacks
price                                3.5
Name: 0, dtype: object
------------
INDEX: 1
<class 'pandas.core.series.Series'>
id                            2
name           All-Seasons Salt
aisle         spices seasonings
department               pantry
price                      4.99
Name: 1, dtype: object
------------
INDEX: 2
<class 'pandas.core.series.Series'>
id                                               3
name          Robust Golden Unsweetened Oolong Tea
aisle                                          tea
department                               beverages
price                                         2.49
Name: 2, dtype: object

4.3 Sorting Rows

We can use the DataFrame object’s sort_values method to sort rows on the basis of one or more given columns.

By default, sort_values is not mutating, unless we use the inplace parameter:

df.sort_values(by="name", ascending=True, inplace=True)
df.head()
id name aisle department price
1 2 All-Seasons Salt spices seasonings pantry 4.99
11 12 Chocolate Fudge Layer Cake frozen dessert frozen 18.50
0 1 Chocolate Sandwich Cookies cookies cakes snacks 3.50
7 8 Cut Russet Potatoes Steam N' Mash frozen produce frozen 4.25
5 6 Dry Nose Oil cold flu allergy personal care 21.99

We can use the ascending parameter to specify the sort order:

df.sort_values(by="name", ascending=False, inplace=True)
df.head()
id name aisle department price
9 10 Sparkling Orange Juice & Prickly Pear Beverage water seltzer sparkling water beverages 2.99
3 4 Smart Ones Classic Favorites Mini Rigatoni Wit... frozen meals frozen 6.99
12 13 Saline Nasal Mist cold flu allergy personal care 16.00
2 3 Robust Golden Unsweetened Oolong Tea tea beverages 2.49
16 17 Rendered Duck Fat poultry counter meat seafood 9.99

4.4 Filtering Rows

We can filter a DataFrame to get only the rows that match some given condition.

We first specify a “mask” condition that determines for each row, whether it meet the criteria or not (True or False). This first example mask is testing equality using the familiar == operator:

# this is the mask:
df["department"] == "beverages"
0     False
1     False
2      True
3     False
4     False
5     False
6      True
7     False
8     False
9      True
10     True
11    False
12    False
13    False
14    False
15    False
16    False
17    False
18    False
19     True
Name: department, dtype: bool

Filtering based on equality, using the mask inside a square bracket reference:

df[  df["department"] == "beverages" ]
id name aisle department price
2 3 Robust Golden Unsweetened Oolong Tea tea beverages 2.49
6 7 Pure Coconut Water With Orange juice nectars beverages 3.50
9 10 Sparkling Orange Juice & Prickly Pear Beverage water seltzer sparkling water beverages 2.99
10 11 Peach Mango Juice refrigerated beverages 1.99
19 20 Pomegranate Cranberry & Aloe Vera Enrich Drink juice nectars beverages 4.25

This is an example of filtering based on equality, however there are other filter methods as well.

Filtering based on numeric comparisons, using comparison operators:

df[  df["price"] >= 10.00 ]
id name aisle department price
5 6 Dry Nose Oil cold flu allergy personal care 21.99
11 12 Chocolate Fudge Layer Cake frozen dessert frozen 18.50
12 13 Saline Nasal Mist cold flu allergy personal care 16.00
14 15 Overnight Diapers Size 6 diapers wipes babies 25.50
17 18 Pizza for One Suprema Frozen Pizza frozen pizza frozen 12.50

Filtering based on values between lower and upper bound, using between method:

df[  df["price"].between(10.00, 20.00) ]
id name aisle department price
11 12 Chocolate Fudge Layer Cake frozen dessert frozen 18.5
12 13 Saline Nasal Mist cold flu allergy personal care 16.0
17 18 Pizza for One Suprema Frozen Pizza frozen pizza frozen 12.5

Filtering based on inclusion, using isin method:

selected_departments = ["beverages", "babies"]

df[  df["department"].isin(selected_departments) ]
id name aisle department price
2 3 Robust Golden Unsweetened Oolong Tea tea beverages 2.49
6 7 Pure Coconut Water With Orange juice nectars beverages 3.50
9 10 Sparkling Orange Juice & Prickly Pear Beverage water seltzer sparkling water beverages 2.99
10 11 Peach Mango Juice refrigerated beverages 1.99
14 15 Overnight Diapers Size 6 diapers wipes babies 25.50
19 20 Pomegranate Cranberry & Aloe Vera Enrich Drink juice nectars beverages 4.25

Filtering on substring match, using str.contains method:

df[  df["name"].str.contains("Cookie") ]
id name aisle department price
0 1 Chocolate Sandwich Cookies cookies cakes snacks 3.5