3  Column Operations

In this chapter we will focus on column operations, including dropping, renaming, and mapping columns.

For these examples, we will use a familiar dataset of products:

Code
from pandas import read_csv

request_url = "https://raw.githubusercontent.com/prof-rossetti/intro-to-python/main/data/products.csv"
df = read_csv(request_url)
df.head()
id name aisle department price
0 1 Chocolate Sandwich Cookies cookies cakes snacks 3.50
1 2 All-Seasons Salt spices seasonings pantry 4.99
2 3 Robust Golden Unsweetened Oolong Tea tea beverages 2.49
3 4 Smart Ones Classic Favorites Mini Rigatoni Wit... frozen meals frozen 6.99
4 5 Green Chile Anytime Sauce marinades meat preparation pantry 7.99

3.1 Series as a Column

Remember, when we access a given column of values, we are working with a Series object:

prices = df["price"]
print(type(prices))
prices
<class 'pandas.core.series.Series'>
0      3.50
1      4.99
2      2.49
3      6.99
4      7.99
5     21.99
6      3.50
7      4.25
8      6.50
9      2.99
10     1.99
11    18.50
12    16.00
13     4.99
14    25.50
15     4.50
16     9.99
17    12.50
18     3.99
19     4.25
Name: price, dtype: float64

With a Series object as a column, we can use list-like accessors to access one of the values based on its row index value:

prices[0]
np.float64(3.5)

And we can convert the Series object to a normal list as desired:

print(prices.tolist())
[3.5, 4.99, 2.49, 6.99, 7.99, 21.99, 3.5, 4.25, 6.5, 2.99, 1.99, 18.5, 16.0, 4.99, 25.5, 4.5, 9.99, 12.5, 3.99, 4.25]

3.1.1 Series Aggregations

One of the cool things about Series objects is they know how to calculate their own aggregations.

For example, minimum and maximum values:

print("MIN PRICE:", df["price"].min())
print("MAX PRICE:", df["price"].max())
MIN PRICE: 1.99
MAX PRICE: 25.5

Mean and median values:

print("MEAN PRICE:", df["price"].mean().round(2))
print("MEDIAN PRICE:", df["price"].median().round(2))
MEAN PRICE: 8.37
MEDIAN PRICE: 4.99

Percentiles, etc:

print("25TH PCT PRICE:", df["price"].quantile(.25).round(2))
print("75TH PCT PRICE:", df["price"].quantile(.75).round(2))
25TH PCT PRICE: 3.87
75TH PCT PRICE: 10.62

3.1.2 Value Counts

The value_counts method shows us a count of the number of rows for each value in a given column:

df["department"].value_counts()
department
beverages          5
frozen             4
pantry             2
snacks             2
personal care      2
dairy eggs         1
household          1
babies             1
meat seafood       1
dry goods pasta    1
Name: count, dtype: int64

We can use the normalize parameter to display our counts as percentage of the total number of rows:

df["department"].value_counts(normalize=True)
department
beverages          0.25
frozen             0.20
pantry             0.10
snacks             0.10
personal care      0.10
dairy eggs         0.05
household          0.05
babies             0.05
meat seafood       0.05
dry goods pasta    0.05
Name: proportion, dtype: float64

3.2 Renaming Columns

We can use the rename method to rename columns. This is not mutating unless we use the inplace parameter.

df.rename(columns={"department": "dept"}, inplace=True)
df.head()
id name aisle dept price
0 1 Chocolate Sandwich Cookies cookies cakes snacks 3.50
1 2 All-Seasons Salt spices seasonings pantry 4.99
2 3 Robust Golden Unsweetened Oolong Tea tea beverages 2.49
3 4 Smart Ones Classic Favorites Mini Rigatoni Wit... frozen meals frozen 6.99
4 5 Green Chile Anytime Sauce marinades meat preparation pantry 7.99

3.3 Dropping Columns

We can use the drop method to remove columns from the dataset. This is not mutating unless we use the inplace parameter.

df.drop(columns=["aisle"], inplace=True)
df.head()
id name dept price
0 1 Chocolate Sandwich Cookies snacks 3.50
1 2 All-Seasons Salt pantry 4.99
2 3 Robust Golden Unsweetened Oolong Tea beverages 2.49
3 4 Smart Ones Classic Favorites Mini Rigatoni Wit... frozen 6.99
4 5 Green Chile Anytime Sauce pantry 7.99

3.4 Creating New Columns

We can create a new column by specifying the name of that column and assigning a new value. For example, a column of constants:

df["inflation_factor"] = 1.5

df.head()
id name dept price inflation_factor
0 1 Chocolate Sandwich Cookies snacks 3.50 1.5
1 2 All-Seasons Salt pantry 4.99 1.5
2 3 Robust Golden Unsweetened Oolong Tea beverages 2.49 1.5
3 4 Smart Ones Classic Favorites Mini Rigatoni Wit... frozen 6.99 1.5
4 5 Green Chile Anytime Sauce pantry 7.99 1.5

In practice, instead of creating a new column of constants or placeholder values, we will generally create a new column of transformed values (see “Mapping Columns” section below).

3.5 Mapping Columns

We can transform the values in one given column, and store the results in another column

When mapping using a single scalar value, for example multiplying all values in a column by 1.5:

# PRICE COL * 1.5
df["inflated_price"] = (df["price"] * 1.5).round(2)

df[["id", "name", "price", "inflated_price"]].head()
id name price inflated_price
0 1 Chocolate Sandwich Cookies 3.50 5.25
1 2 All-Seasons Salt 4.99 7.48
2 3 Robust Golden Unsweetened Oolong Tea 2.49 3.74
3 4 Smart Ones Classic Favorites Mini Rigatoni Wit... 6.99 10.48
4 5 Green Chile Anytime Sauce 7.99 11.98

When mapping using two columns, this performs an element-wise operation where the first values in each Series are compared, then the second values, etc.

# PRICE COL * INFLATION FACTOR COL
df["inflated_price"] = (df["price"] * df["inflation_factor"]).round(2)

df[["id", "name", "price", "inflation_factor", "inflated_price"]].head()
id name price inflation_factor inflated_price
0 1 Chocolate Sandwich Cookies 3.50 1.5 5.25
1 2 All-Seasons Salt 4.99 1.5 7.48
2 3 Robust Golden Unsweetened Oolong Tea 2.49 1.5 3.74
3 4 Smart Ones Classic Favorites Mini Rigatoni Wit... 6.99 1.5 10.48
4 5 Green Chile Anytime Sauce 7.99 1.5 11.98

This essentially allows us to compare multiple values from the same row.

3.5.1 Applying a Mapping Dictionary

If we want to change certain values, we can use the map method in conjunction with a dictionary to provide the mapping of old values to new values:

department_abbrevs = {
  "beverages": "BV",
  "frozen": "FR",
  "pantry": "PN",
  "snacks": "SN",
  "personal care": "PC",
  "dairy eggs": "DE",
  "household": "HH",
  "babies": "BB",
  "meat seafood": "MS",
  "dry goods pasta": "DG",
} # DEFINE VALUE MAPPING

df["dept_abbrev"] = df["dept"].map(department_abbrevs) # APPLY VALUE MAPPING

df[["id", "name", "dept", "dept_abbrev"]].head()
id name dept dept_abbrev
0 1 Chocolate Sandwich Cookies snacks SN
1 2 All-Seasons Salt pantry PN
2 3 Robust Golden Unsweetened Oolong Tea beverages BV
3 4 Smart Ones Classic Favorites Mini Rigatoni Wit... frozen FR
4 5 Green Chile Anytime Sauce pantry PN

In this example, we are using the department_abbrevs dictionary to look up the corresponding abbreviation for each department name in the “dept” column. As a result we obtain a new column of transformed values (the abbreviations), which we are storing back in the original DataFrame in a new column called “dept_abbrev”.

3.5.2 Applying a Transformation Function

For more complex mapping logic, we can use the apply method to apply a transformation function.

When defining our own transformation function, the function should accept a parameter representing one of the values in the column over which we are applying the function. It should operate on that input value and return the transformed value.

# TRANSFORMATION FUNCTION:

def inflate_price(original_price):
    # only inflate if the price is greater than $5
    if original_price < 5.00:
        return original_price
    else:
        return original_price * 1.5

assert inflate_price(3.00) == 3.00
assert inflate_price(10.00) == 15.00

After defining the transformation function, we pass that function as a parameter to the apply method:

# APPLY TRANSFORMATION FUNCTION TO PRICE COL:
df["inflated_price"] = df["price"].apply(inflate_price).round(2)

df[["id", "name", "price", "inflated_price"]].head()
id name price inflated_price
0 1 Chocolate Sandwich Cookies 3.50 3.50
1 2 All-Seasons Salt 4.99 4.99
2 3 Robust Golden Unsweetened Oolong Tea 2.49 2.49
3 4 Smart Ones Classic Favorites Mini Rigatoni Wit... 6.99 10.48
4 5 Green Chile Anytime Sauce 7.99 11.98

In this example, we are applying the transformation function inflate_price to each value in the “price” column. As a result we obtain a new column of transformed values, which we are storing back in the original DataFrame in a new column called “inflated_price”.