If the data we want to fetch is in XML format (including RSS feeds), we can use the requests package to fetch it, and the beautifulsoup4 package to process it.
Let’s consider this example "students.xml" file we have hosted on the Internet:
First we note the URL where the data resides. Then we pass that as a parameter to the get function from the requests package, to issue an HTTP GET request (as usual):
```python
import requests

# the URL of some XML data we stored online:
request_url = "https://raw.githubusercontent.com/prof-rossetti/intro-software-dev-python-book/main/docs/data/gradebook.xml"

response = requests.get(request_url)
print(type(response))
```
<class 'requests.models.Response'>
Then we pass the response text (an HTML- or XML-formatted string) to the BeautifulSoup class constructor.
```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(response.text)
type(soup)
```
bs4.BeautifulSoup
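To see how the constructor works without depending on a live request, we can pass it an XML-formatted string directly. This is a minimal sketch: the gradebook data below is made up for illustration, and we explicitly choose the built-in `"html.parser"` (one of the parsers BeautifulSoup accepts) to avoid the warning the constructor issues when no parser is specified.

```python
from bs4 import BeautifulSoup

# a small XML string, standing in for response.text (hypothetical data):
xml_str = """
<gradebook>
  <student><name>Ana</name><grade>90</grade></student>
  <student><name>Ben</name><grade>85</grade></student>
</gradebook>
"""

# construct the soup from the string, specifying a parser explicitly:
soup = BeautifulSoup(xml_str, "html.parser")
print(type(soup))
```

The resulting object behaves the same whether the string came from a local variable, a file, or an HTTP response.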
28.1 Finding Elements
The resulting soup object is able to intelligently process the data. We can use the soup’s finder methods to search for specific data elements, called “tags”, based on their names or other attributes. The find method returns the first matching element, whereas the find_all method returns a list of all matching elements.
For example, we can find all the student tags in this structure:
```python
students = soup.find_all("student")
print(type(students))
print(len(students))
```
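Once we have the matching elements, we can call the same finder methods on each one to drill down to its child tags, and use get_text to extract the text content. The sketch below assumes a hypothetical gradebook structure (the student names and grades are made up, standing in for the hosted file):

```python
from bs4 import BeautifulSoup

# hypothetical XML, standing in for the hosted data:
xml_str = """
<gradebook>
  <student><name>Ana</name><grade>90</grade></student>
  <student><name>Ben</name><grade>85</grade></student>
</gradebook>
"""
soup = BeautifulSoup(xml_str, "html.parser")

students = soup.find_all("student")  # a list-like ResultSet of all matching tags
for student in students:
    name = student.find("name").get_text()    # text of the first matching child tag
    grade = student.find("grade").get_text()
    print(name, grade)
```

Because find_all returns a list-like object, the usual looping and indexing techniques apply.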