from pandas import read_csv
# the URL of some CSV data we stored online:
= "https://raw.githubusercontent.com/prof-rossetti/intro-software-dev-python-book/main/docs/data/gradebook.csv"
request_url
= read_csv(request_url)
df print(type(df))
df
27 Fetching CSV Data
If the data we want to fetch is in CSV format, we can use the pandas
package to fetch and process it.
Let’s consider this example "students.csv" file we have hosted on the Internet:
student_id,final_grade
1,76.7
2,85.1
3,50.3
4,89.8
5,97.4
6,75.5
7,87.2
8,88.0
9,93.9
10,92.5
First we note the URL of where the data resides. Then we pass that as a parameter to the read_csv
function from the pandas
package, to issue an HTTP GET request:
The resulting data is a spreadsheet-like object, with rows and columns, called the pandas.DataFrame
datatype.
To work with the column of grades, we can access them by specifying the name of the column, which in this case is "final_grade"
:
= df["final_grade"]
grades_column print(type(grades_column))
grades_column
The resulting column of grades is a list-like object called the pandas.Series
datatype.
Calculating the average grade (using series aggregation methods):
print(grades_column.mean())
print(grades_column.median())
The pandas
package is a foundational component of the Python ecosystem, and provides many additional capabilities for processing tabular data. Although outside the scope of this book, working with tabular data is covered in more detail in the professor’s Applied Data Science in Python book.