27 Fetching CSV Data

If the data we want to fetch is in CSV format, we can use the pandas package to fetch and process it.

Let’s consider this example "gradebook.csv" file we have hosted on the Internet:

student_id,final_grade
1,76.7
2,85.1
3,50.3
4,89.8
5,97.4
6,75.5
7,87.2
8,88.0
9,93.9
10,92.5

First we note the URL where the data resides. Then we pass that URL to the read_csv function from the pandas package, which issues an HTTP GET request and parses the response into a table:

from pandas import read_csv

# the URL of some CSV data we stored online:
request_url = "https://raw.githubusercontent.com/prof-rossetti/intro-software-dev-python-book/main/docs/data/gradebook.csv"

df = read_csv(request_url)
print(type(df))
df

<class 'pandas.core.frame.DataFrame'>

	student_id	final_grade
0	1	76.7
1	2	85.1
2	3	50.3
3	4	89.8
4	5	97.4
5	6	75.5
6	7	87.2
7	8	88.0
8	9	93.9
9	10	92.5

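The result is a spreadsheet-like object with rows and columns, of the pandas.DataFrame datatype. The unlabeled leftmost column (0 through 9) is the row index that pandas adds automatically; it is not part of the CSV file.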
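To get a feel for the DataFrame's structure, we can inspect a few of its attributes. The following is a minimal sketch that assumes the same ten rows shown above; it inlines them as text (the csv_text variable is introduced here for illustration only) and wraps them in a StringIO object so the example runs without a network connection:

```python
from io import StringIO
from pandas import read_csv

# assumption: the same CSV content as the hosted file, inlined for offline use
csv_text = """student_id,final_grade
1,76.7
2,85.1
3,50.3
4,89.8
5,97.4
6,75.5
7,87.2
8,88.0
9,93.9
10,92.5
"""

# read_csv accepts a file-like object as well as a URL
df = read_csv(StringIO(csv_text))

print(df.shape)          # tuple of (number of rows, number of columns)
print(list(df.columns))  # column names taken from the header row
print(df.dtypes)         # datatype pandas inferred for each column
print(df.head(3))        # the first three rows
```

The shape and column names are a quick way to check that the file was parsed as expected before working with the values.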

To work with the column of grades, we can access them by specifying the name of the column, which in this case is "final_grade":

grades_column = df["final_grade"]
print(type(grades_column))
grades_column

<class 'pandas.core.series.Series'>

0    76.7
1    85.1
2    50.3
3    89.8
4    97.4
5    75.5
6    87.2
7    88.0
8    93.9
9    92.5
Name: final_grade, dtype: float64

The resulting column of grades is a list-like object of the pandas.Series datatype.
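Because a Series is list-like, many familiar list operations have a counterpart. The sketch below rebuilds the grades as a standalone Series so it can run on its own; the threshold of 90 is an arbitrary value chosen for illustration, not something defined by the gradebook:

```python
from pandas import Series

# assumption: the same grade values as the "final_grade" column above
grades = Series(
    [76.7, 85.1, 50.3, 89.8, 97.4, 75.5, 87.2, 88.0, 93.9, 92.5],
    name="final_grade",
)

print(len(grades))      # number of values, like len() on a list
print(grades.iloc[0])   # first value, accessed by position

# convert to a plain Python list when needed
grades_list = grades.tolist()
print(grades_list[:3])

# filter with a boolean condition (hypothetical threshold of 90)
high_grades = grades[grades > 90]
print(high_grades.tolist())
```

Unlike a plain list, a Series supports this kind of condition-based filtering directly, without writing a loop.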

We can calculate the average grade using the Series aggregation methods, here the mean and the median:

print(grades_column.mean())
print(grades_column.median())

83.64
87.6
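As a sanity check, these aggregations can be compared against a manual calculation with plain Python. This is a sketch under the assumption that the grades match the values shown above; the manual_mean and manual_median names are introduced here for illustration:

```python
from statistics import median
from pandas import Series

# assumption: the same grade values as the "final_grade" column above
grades = Series([76.7, 85.1, 50.3, 89.8, 97.4, 75.5, 87.2, 88.0, 93.9, 92.5])

# manual equivalents using built-in functions and the statistics module
values = grades.tolist()
manual_mean = sum(values) / len(values)
manual_median = median(values)

# rounding avoids distraction from tiny floating-point differences
print(round(manual_mean, 2), round(grades.mean(), 2))
print(round(manual_median, 2), round(grades.median(), 2))

# other aggregation methods follow the same pattern
print(grades.min(), grades.max())
```

With an even number of values, the median is the average of the two middle values once the grades are sorted.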

The pandas package is a foundational component of the Python ecosystem, and provides many additional capabilities for processing tabular data. Although outside the scope of this book, working with tabular data is covered in more detail in the professor’s Applied Data Science in Python book.