import requests

# the URL of some XML data we stored online:
request_url = "https://raw.githubusercontent.com/prof-rossetti/intro-software-dev-python-book/main/docs/data/gradebook.xml"

response = requests.get(request_url)
print(type(response))
28 Fetching XML Data
If the data we want to fetch is in XML format, including in an RSS feed, we can use the requests
package to fetch it, and the beautifulsoup4
package to process it.
Let’s consider this example "gradebook.xml" file we have hosted on the Internet:
<GradeReport>
    <DownloadDate>2018-06-05</DownloadDate>
    <ProfessorId>123</ProfessorId>
    <Students>
        <Student>
            <StudentId>1</StudentId>
            <FinalGrade>76.7</FinalGrade>
        </Student>
        <Student>
            <StudentId>2</StudentId>
            <FinalGrade>85.1</FinalGrade>
        </Student>
        <Student>
            <StudentId>3</StudentId>
            <FinalGrade>50.3</FinalGrade>
        </Student>
        <Student>
            <StudentId>4</StudentId>
            <FinalGrade>89.8</FinalGrade>
        </Student>
        <Student>
            <StudentId>5</StudentId>
            <FinalGrade>97.4</FinalGrade>
        </Student>
        <Student>
            <StudentId>6</StudentId>
            <FinalGrade>75.5</FinalGrade>
        </Student>
        <Student>
            <StudentId>7</StudentId>
            <FinalGrade>87.2</FinalGrade>
        </Student>
        <Student>
            <StudentId>8</StudentId>
            <FinalGrade>88.0</FinalGrade>
        </Student>
        <Student>
            <StudentId>9</StudentId>
            <FinalGrade>93.9</FinalGrade>
        </Student>
        <Student>
            <StudentId>10</StudentId>
            <FinalGrade>92.5</FinalGrade>
        </Student>
    </Students>
</GradeReport>
First we note the URL where the data resides. Then we pass that URL as a parameter to the get
function from the requests
package, to issue an HTTP GET request (as usual). The request code at the start of this page does exactly this.
Then we pass the response text (an HTML or XML formatted string) to the BeautifulSoup
class constructor.
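from bs4 import BeautifulSoup

# name the parser explicitly to avoid a "no parser specified" warning;
# HTML parsers lowercase tag names (e.g. <Student> becomes <student>)
soup = BeautifulSoup(response.text, "html.parser")
type(soup)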
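To see what the constructor does without making a network request, here is a minimal sketch that parses a small inline string. The `xml_text` sample and the `sample_soup` and `professor_tag` names are illustrative assumptions, not part of the hosted file's code; the sample only mirrors the structure of the gradebook document.

```python
from bs4 import BeautifulSoup

# Hypothetical inline sample, mirroring the structure of the hosted file.
xml_text = """
<GradeReport>
    <ProfessorId>123</ProfessorId>
    <Students>
        <Student>
            <StudentId>1</StudentId>
            <FinalGrade>76.7</FinalGrade>
        </Student>
    </Students>
</GradeReport>
"""

# Parse the string with Python's built-in HTML parser.
sample_soup = BeautifulSoup(xml_text, "html.parser")

# The HTML parser lowercases tag names, so <ProfessorId> is stored as <professorid>.
professor_tag = sample_soup.find("professorid")
print(professor_tag.name)
print(professor_tag.text)
```

This lowercasing is why the searches in this chapter use names like "student" rather than "Student".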
28.1 Finding Elements
The resulting soup object represents the parsed document as a searchable tree. We can use the soup’s finder methods to search for specific data elements, called “tags”, based on their names or other attributes. If we want to return the first matching element, we use the find
method, whereas if we want to get all matching elements, we use the find_all
method.
For example, finding all the student tags in this structure (the HTML parser lowercases tag names, so we search for "student" rather than "Student"):
students = soup.find_all("student")
print(type(students))
print(len(students))
Examining the first item for reference:
print(type(students[0]))
students[0]
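The difference between find and find_all can be sketched on a small inline sample. The `xml_text` string and the variable names below are assumptions made for illustration, and "teacher" is simply a tag name that does not occur in the sample.

```python
from bs4 import BeautifulSoup

# Hypothetical two-student sample (not the full hosted file).
xml_text = """
<Students>
    <Student><StudentId>1</StudentId></Student>
    <Student><StudentId>2</StudentId></Student>
</Students>
"""
sample_soup = BeautifulSoup(xml_text, "html.parser")

first_student = sample_soup.find("student")      # first matching tag only
all_students = sample_soup.find_all("student")   # list-like collection of every match
missing = sample_soup.find("teacher")            # no match: find returns None
no_matches = sample_soup.find_all("teacher")     # no match: find_all returns an empty collection

print(first_student.studentid.text)
print(len(all_students))
print(missing)
print(len(no_matches))
```

Because find returns None when nothing matches, it is worth checking the result before reading attributes such as `.text` from it.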
Looping through all the items:
for student in students:
    print("-----------")
    print(type(student))
    student_id = student.studentid.text
    final_grade = student.finalgrade.text
    print(student_id, final_grade)
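The loop above prints text values; a common next step is to convert them into typed records. The following is a minimal sketch under the assumption that every student has both a StudentId and a FinalGrade. The inline `xml_text` sample and the `records` name are illustrative, not taken from the original code.

```python
from bs4 import BeautifulSoup

# Hypothetical two-student sample, using values from the listing above.
xml_text = """
<Students>
    <Student>
        <StudentId>1</StudentId>
        <FinalGrade>76.7</FinalGrade>
    </Student>
    <Student>
        <StudentId>2</StudentId>
        <FinalGrade>85.1</FinalGrade>
    </Student>
</Students>
"""
sample_soup = BeautifulSoup(xml_text, "html.parser")

records = []
for student in sample_soup.find_all("student"):
    # Tag text is always a string, so convert to numeric types explicitly.
    records.append({
        "student_id": int(student.studentid.text),
        "final_grade": float(student.finalgrade.text),
    })

print(records)
```

Keeping the records as a list of dictionaries makes them easy to sort, filter, or pass to other tools later.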
Calculating the average grade:
from statistics import mean, median

grades = [float(student.finalgrade.text) for student in students]
print(mean(grades))
print(median(grades))
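As a sanity check that needs no network access, the same statistics can be computed from the ten FinalGrade values copied by hand from the XML listing. The hard-coded list is an assumption that the hosted file matches the listing shown in this chapter.

```python
from statistics import mean, median

# FinalGrade values copied from the XML listing, in document order.
grades = [76.7, 85.1, 50.3, 89.8, 97.4, 75.5, 87.2, 88.0, 93.9, 92.5]

average_grade = mean(grades)
median_grade = median(grades)

# Round for display to hide floating-point noise.
print(round(average_grade, 2))  # 83.64
print(round(median_grade, 2))   # 87.6
```

With an even number of grades, the median is the average of the two middle values (87.2 and 88.0) once the list is sorted.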