We can preview the first few rows or last few rows using the head or tail methods, respectively.
Previewing the first few rows:
df.head()
    id  name                                               aisle                       department  price
0   1   Chocolate Sandwich Cookies                         cookies cakes               snacks      3.50
1   2   All-Seasons Salt                                   spices seasonings           pantry      4.99
2   3   Robust Golden Unsweetened Oolong Tea               tea                         beverages   2.49
3   4   Smart Ones Classic Favorites Mini Rigatoni Wit...  frozen meals                frozen      6.99
4   5   Green Chile Anytime Sauce                          marinades meat preparation  pantry      7.99
Previewing the last few rows:
df.tail()
    id  name                                              aisle                    department       price
15  16  Mint Chocolate Flavored Syrup                     ice cream toppings       snacks           4.50
16  17  Rendered Duck Fat                                 poultry counter          meat seafood     9.99
17  18  Pizza for One Suprema Frozen Pizza                frozen pizza             frozen           12.50
18  19  Gluten Free Quinoa Three Cheese & Mushroom Blend  grains rice dried goods  dry goods pasta  3.99
19  20  Pomegranate Cranberry & Aloe Vera Enrich Drink    juice nectars            beverages        4.25
By default, we see five rows, but we can customize the number of rows by passing an integer parameter to these methods, like head(3) or tail(3).
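To illustrate customizing the row count, here is a minimal sketch. It assumes a small stand-in DataFrame built from the five products in the first preview (only three of the columns, and not the full 20-row dataset).

```python
import pandas as pd

# Stand-in data (an assumption for illustration): the first five products
# from the preview above, with only three of the five columns.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "department": ["snacks", "pantry", "beverages", "frozen", "pantry"],
    "price": [3.50, 4.99, 2.49, 6.99, 7.99],
})

first_three = df.head(3)  # rows at positions 0, 1, 2
last_three = df.tail(3)   # rows at positions 2, 3, 4

print(first_three)
print(last_three)
```

Both calls return new DataFrame objects and leave df itself unchanged.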
2.2 Dataset Properties
2.2.1 Size and Shape
It’s easy to count the number of rows, using the familiar len function:
len(df)
20
Alternatively, we can access the shape property, which tells us the dataset size in terms of number of rows and columns:
df.shape
(20, 5)
Note
The shape is a tuple formatted as (n_rows, n_cols), where the first value represents the number of rows, and the second represents the number of columns.
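Because the shape is an ordinary tuple, it can be unpacked into separate variables. The sketch below assumes a small five-row stand-in DataFrame rather than the full dataset, so the numbers differ from the (20, 5) shown above.

```python
import pandas as pd

# Stand-in data (an assumption for illustration): five rows, three columns.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "department": ["snacks", "pantry", "beverages", "frozen", "pantry"],
    "price": [3.50, 4.99, 2.49, 6.99, 7.99],
})

# Unpack the (n_rows, n_cols) tuple into two variables.
n_rows, n_cols = df.shape

print(n_rows)             # 5
print(n_cols)             # 3
print(n_rows == len(df))  # True: len counts rows only
```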
2.2.2 Column Names
Every DataFrame object has a set of column names, which uniquely identify the columns in the dataset.
Accessing the column names, using the columns property:
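As a minimal sketch, assuming a small stand-in DataFrame with three of the dataset's columns, the columns property returns a pandas Index object, which can be converted to a plain list. The index property similarly exposes the row index.

```python
import pandas as pd

# Stand-in data (an assumption for illustration): five rows, three columns.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "department": ["snacks", "pantry", "beverages", "frozen", "pantry"],
    "price": [3.50, 4.99, 2.49, 6.99, 7.99],
})

column_names = df.columns.tolist()  # convert the Index object to a list
print(df.columns)    # Index(['id', 'department', 'price'], dtype='object')
print(column_names)  # ['id', 'department', 'price']

# The row index is exposed in the same way, through the index property.
print(df.index)      # RangeIndex(start=0, stop=5, step=1)
```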
By default, the row index is a set of auto-incrementing numbers starting at 0 (similar to the index values of a simple list).
However, it is possible to update the index values. For more information about working with the index, see Index Operations.
2.3 Accessing Data
2.3.1 Accessing Columns
We can access the values of one or more columns using a dictionary-like accessor.
To access a single column, we pass the string column name, and we get a pandas Series object back:
names = df["name"]  # SINGLE COLUMN NAME
print(type(names))
names.head()
<class 'pandas.core.series.Series'>
0 Chocolate Sandwich Cookies
1 All-Seasons Salt
2 Robust Golden Unsweetened Oolong Tea
3 Smart Ones Classic Favorites Mini Rigatoni Wit...
4 Green Chile Anytime Sauce
Name: name, dtype: object
To access multiple columns, we pass a list of string column names, and we get a DataFrame object back:
names_and_prices = df[["name", "price"]]  # LIST OF COLUMN NAMES
print(type(names_and_prices))
names_and_prices.head()
<class 'pandas.core.frame.DataFrame'>
   name                                               price
0  Chocolate Sandwich Cookies                         3.50
1  All-Seasons Salt                                   4.99
2  Robust Golden Unsweetened Oolong Tea               2.49
3  Smart Ones Classic Favorites Mini Rigatoni Wit...  6.99
4  Green Chile Anytime Sauce                          7.99
For more information about working with columns, see Column Operations.
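The return type depends on what goes inside the brackets, not on how many columns come back. As a small sketch, assuming a three-row stand-in DataFrame, passing a one-element list still produces a DataFrame rather than a Series:

```python
import pandas as pd

# Stand-in data (an assumption for illustration): three products from the preview.
df = pd.DataFrame({
    "name": ["Chocolate Sandwich Cookies", "All-Seasons Salt",
             "Robust Golden Unsweetened Oolong Tea"],
    "price": [3.50, 4.99, 2.49],
})

as_series = df["price"]   # string key -> Series
as_frame = df[["price"]]  # list key (even with one name) -> DataFrame

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (3, 1)
```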
2.3.2 Accessing Rows
To access a given row, we can use the iloc indexer with a list-like accessor, referencing the integer position of that row:
first_row = df.iloc[0]  # ACCESSING A ROW BY ITS INTEGER POSITION
print(type(first_row))
first_row
<class 'pandas.core.series.Series'>
id 1
name Chocolate Sandwich Cookies
aisle cookies cakes
department snacks
price 3.5
Name: 0, dtype: object
When we access a single row, we get a Series object back.
Note
When we use integer references like this with the iloc indexer, we are referencing the position of the row in the dataset, not the index value itself. With the default index the two coincide, so iloc[0] and the row labeled 0 are the same row. In the event the index values change, iloc[0] still returns the first row; to look up a row by its index value, use the loc indexer instead.
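The difference between position and index value only shows up once the index stops being the default 0, 1, 2, and so on. The sketch below assumes a three-row stand-in DataFrame and uses set_index to make the product ids the index. It then compares iloc (position-based) with loc (label-based).

```python
import pandas as pd

# Stand-in data (an assumption for illustration): three products from the preview.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Chocolate Sandwich Cookies", "All-Seasons Salt",
             "Robust Golden Unsweetened Oolong Tea"],
    "price": [3.50, 4.99, 2.49],
})

# Replace the default 0, 1, 2 index with the product ids 1, 2, 3.
by_id = df.set_index("id")

second_by_position = by_id.iloc[1]  # position 1 -> the second row
row_with_label_1 = by_id.loc[1]     # label 1 -> the row whose id is 1

print(second_by_position["name"])  # All-Seasons Salt
print(row_with_label_1["name"])    # Chocolate Sandwich Cookies
```

The same integer, 1, selects different rows depending on the indexer, so it helps to choose iloc or loc deliberately.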
We can access multiple rows in succession (for example the first three rows), using a list-slicing approach: