In this chapter we will focus on row operations, including looping, sorting, and filtering rows.
For these examples, we will use our familiar dataset of products:
Code
from pandas import read_csv
request_url = "https://raw.githubusercontent.com/prof-rossetti/intro-to-python/main/data/products.csv"
df = read_csv(request_url)
df.head()
0
1
Chocolate Sandwich Cookies
cookies cakes
snacks
3.50
1
2
All-Seasons Salt
spices seasonings
pantry
4.99
2
3
Robust Golden Unsweetened Oolong Tea
tea
beverages
2.49
3
4
Smart Ones Classic Favorites Mini Rigatoni Wit...
frozen meals
frozen
6.99
4
5
Green Chile Anytime Sauce
marinades meat preparation
pantry
7.99
Series
as a Row
Remember, when we access a given row of values, we are working with a Series
object:
first_row = df.iloc[0 ]
print (type (first_row))
print (first_row)
<class 'pandas.core.series.Series'>
id 1
name Chocolate Sandwich Cookies
aisle cookies cakes
department snacks
price 3.5
Name: 0, dtype: object
With a Series
object as a row, we can use dictionary-like accessors to access the values based on their column names:
print (first_row["name" ])
print (first_row["price" ])
Chocolate Sandwich Cookies
3.5
And we can convert the Series
object to a normal dictionary as desired:
{'id': 1,
'name': 'Chocolate Sandwich Cookies',
'aisle': 'cookies cakes',
'department': 'snacks',
'price': 3.5}
Row Iteration (Looping)
We can loop through each row using the iterrows
method , involving a destructuring approach to reference the row index (i
) as well as the row’s values (row
). With this approach, each row is a Series
object.
In this example, we are looping through just the first three rows, for simplicity:
#for i, row in df.iterrows():
for i, row in df.head(3 ).iterrows():
print ("------------" )
print ("INDEX:" , i)
print (type (row))
print (row)
------------
INDEX: 0
<class 'pandas.core.series.Series'>
id 1
name Chocolate Sandwich Cookies
aisle cookies cakes
department snacks
price 3.5
Name: 0, dtype: object
------------
INDEX: 1
<class 'pandas.core.series.Series'>
id 2
name All-Seasons Salt
aisle spices seasonings
department pantry
price 4.99
Name: 1, dtype: object
------------
INDEX: 2
<class 'pandas.core.series.Series'>
id 3
name Robust Golden Unsweetened Oolong Tea
aisle tea
department beverages
price 2.49
Name: 2, dtype: object
Sorting Rows
We can use the DataFrame
object’s sort_values
method to sort rows on the basis of one or more given columns.
By default, sort_values
is not mutating, unless we use the inplace
parameter:
df.sort_values(by= "name" , ascending= True , inplace= True )
df.head()
1
2
All-Seasons Salt
spices seasonings
pantry
4.99
11
12
Chocolate Fudge Layer Cake
frozen dessert
frozen
18.50
0
1
Chocolate Sandwich Cookies
cookies cakes
snacks
3.50
7
8
Cut Russet Potatoes Steam N' Mash
frozen produce
frozen
4.25
5
6
Dry Nose Oil
cold flu allergy
personal care
21.99
We can use the ascending
parameter to specify the sort order:
df.sort_values(by= "name" , ascending= False , inplace= True )
df.head()
9
10
Sparkling Orange Juice & Prickly Pear Beverage
water seltzer sparkling water
beverages
2.99
3
4
Smart Ones Classic Favorites Mini Rigatoni Wit...
frozen meals
frozen
6.99
12
13
Saline Nasal Mist
cold flu allergy
personal care
16.00
2
3
Robust Golden Unsweetened Oolong Tea
tea
beverages
2.49
16
17
Rendered Duck Fat
poultry counter
meat seafood
9.99
Filtering Rows
We can filter a DataFrame
to get only the rows that match some given condition.
We first specify a “mask” condition that determines for each row, whether it meet the criteria or not (True
or False
). This first example mask is testing equality using the familiar ==
operator:
# this is the mask:
df["department" ] == "beverages"
0 False
1 False
2 True
3 False
4 False
5 False
6 True
7 False
8 False
9 True
10 True
11 False
12 False
13 False
14 False
15 False
16 False
17 False
18 False
19 True
Name: department, dtype: bool
Filtering based on equality, using the mask inside a square bracket reference:
df[ df["department" ] == "beverages" ]
2
3
Robust Golden Unsweetened Oolong Tea
tea
beverages
2.49
6
7
Pure Coconut Water With Orange
juice nectars
beverages
3.50
9
10
Sparkling Orange Juice & Prickly Pear Beverage
water seltzer sparkling water
beverages
2.99
10
11
Peach Mango Juice
refrigerated
beverages
1.99
19
20
Pomegranate Cranberry & Aloe Vera Enrich Drink
juice nectars
beverages
4.25
This is an example of filtering based on equality, however there are other filter methods as well.
Filtering based on numeric comparisons, using comparison operators:
df[ df["price" ] >= 10.00 ]
5
6
Dry Nose Oil
cold flu allergy
personal care
21.99
11
12
Chocolate Fudge Layer Cake
frozen dessert
frozen
18.50
12
13
Saline Nasal Mist
cold flu allergy
personal care
16.00
14
15
Overnight Diapers Size 6
diapers wipes
babies
25.50
17
18
Pizza for One Suprema Frozen Pizza
frozen pizza
frozen
12.50
Filtering based on values between lower and upper bound, using between
method:
df[ df["price" ].between(10.00 , 20.00 ) ]
11
12
Chocolate Fudge Layer Cake
frozen dessert
frozen
18.5
12
13
Saline Nasal Mist
cold flu allergy
personal care
16.0
17
18
Pizza for One Suprema Frozen Pizza
frozen pizza
frozen
12.5
Filtering based on inclusion, using isin
method:
selected_departments = ["beverages" , "babies" ]
df[ df["department" ].isin(selected_departments) ]
2
3
Robust Golden Unsweetened Oolong Tea
tea
beverages
2.49
6
7
Pure Coconut Water With Orange
juice nectars
beverages
3.50
9
10
Sparkling Orange Juice & Prickly Pear Beverage
water seltzer sparkling water
beverages
2.99
10
11
Peach Mango Juice
refrigerated
beverages
1.99
14
15
Overnight Diapers Size 6
diapers wipes
babies
25.50
19
20
Pomegranate Cranberry & Aloe Vera Enrich Drink
juice nectars
beverages
4.25
Filtering on substring match, using str.contains
method:
df[ df["name" ].str .contains("Cookie" ) ]
0
1
Chocolate Sandwich Cookies
cookies cakes
snacks
3.5