In this example we access SEC data directly, instead of going through another package.
The SEC website expects automated requests to identify themselves, so we pass a User-Agent header like the one below to obtain the data:
import requests
# h/t: https://stackoverflow.com/a/70386951/670433
# see: https://www.sec.gov/search-filings/edgar-search-assistance/accessing-edgar-data
request_url = "https://www.sec.gov/files/company_tickers.json"
headers = {"User-Agent": "MyAppName (myemail@example.com)"}
response = requests.get(request_url, headers=headers)
data = response.json()
print(type(data))
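The request above assumes the server always answers with valid JSON. A slightly more defensive variant might look like the sketch below; `build_headers` and `fetch_json` are hypothetical helper names introduced here for illustration, and the timeout value is an arbitrary assumption.

```python
import requests

SEC_TICKERS_URL = "https://www.sec.gov/files/company_tickers.json"


def build_headers(app_name: str, contact_email: str) -> dict:
    """Build a User-Agent header that identifies the requester (hypothetical helper)."""
    return {"User-Agent": f"{app_name} ({contact_email})"}


def fetch_json(url: str, headers: dict, timeout: float = 10.0):
    """Fetch a URL and return the parsed JSON, raising on HTTP errors (hypothetical helper)."""
    response = requests.get(url, headers=headers, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx responses instead of
    # failing later while trying to parse an error page as JSON.
    response.raise_for_status()
    return response.json()
```

With these helpers, the data could be fetched with `fetch_json(SEC_TICKERS_URL, build_headers("MyAppName", "myemail@example.com"))`.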
In this case, the company tickers data is a large dictionary whose keys are numeric strings. Here are the first five keys, followed by the value stored under key '0':
['0', '1', '2', '3', '4']
{'cik_str': 1045810, 'ticker': 'NVDA', 'title': 'NVIDIA CORP'}
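Output like the two lines above could be produced roughly as follows. This is a minimal sketch that uses a small stand-in dictionary (`sample_data`, a name assumed here) instead of the full response, so it runs without a network call.

```python
# Small stand-in for the parsed JSON, using entries shown in this section.
sample_data = {
    "0": {"cik_str": 1045810, "ticker": "NVDA", "title": "NVIDIA CORP"},
    "1": {"cik_str": 789019, "ticker": "MSFT", "title": "MICROSOFT CORP"},
}

# The keys are strings that happen to contain digits, not integers.
first_keys = list(sample_data.keys())[:5]
print(first_keys)

# Each value is a small dictionary describing one company.
first_entry = sample_data["0"]
print(first_entry)
```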
However, we can flatten this structure into a list of records:
records = []
for k, v in data.items():
    records.append(v)
records[0]
{'cik_str': 1045810, 'ticker': 'NVDA', 'title': 'NVIDIA CORP'}
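Since the keys are not used when building the records, the same list can be built more compactly. The sketch below compares the two approaches on a small stand-in dictionary (`sample_data` is an assumed name, not part of the original example).

```python
# Small stand-in for the parsed JSON, using entries shown in this section.
sample_data = {
    "0": {"cik_str": 1045810, "ticker": "NVDA", "title": "NVIDIA CORP"},
    "1": {"cik_str": 789019, "ticker": "MSFT", "title": "MICROSOFT CORP"},
}

# Explicit loop over the values only, since the keys are discarded.
records_loop = []
for value in sample_data.values():
    records_loop.append(value)

# Equivalent one-liner: dictionaries preserve insertion order,
# so the records come out in the same order as in the loop.
records = list(sample_data.values())

print(records == records_loop)
```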
And compile our own dataframe:
from pandas import DataFrame
df = DataFrame(records)
df.head()
|   | cik_str | ticker | title |
|---|---------|--------|-------|
| 0 | 1045810 | NVDA | NVIDIA CORP |
| 1 | 789019 | MSFT | MICROSOFT CORP |
| 2 | 320193 | AAPL | Apple Inc. |
| 3 | 1652044 | GOOGL | Alphabet Inc. |
| 4 | 1018724 | AMZN | AMAZON COM INC |
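A common next step is looking up a company's CIK by its ticker. The sketch below does this with plain dictionaries on a few of the records shown above; `build_ticker_index` and `padded_cik` are hypothetical helper names. The zero-padding to ten digits reflects the assumption that the CIK will be passed to EDGAR endpoints that expect that form.

```python
# A few records in the same shape as the ones shown above.
records = [
    {"cik_str": 1045810, "ticker": "NVDA", "title": "NVIDIA CORP"},
    {"cik_str": 789019, "ticker": "MSFT", "title": "MICROSOFT CORP"},
    {"cik_str": 320193, "ticker": "AAPL", "title": "Apple Inc."},
]


def build_ticker_index(records):
    """Map each ticker symbol to its full record (hypothetical helper)."""
    return {record["ticker"]: record for record in records}


def padded_cik(cik) -> str:
    """Left-pad a CIK with zeros to ten digits (hypothetical helper)."""
    return str(cik).zfill(10)


index = build_ticker_index(records)
print(index["AAPL"]["title"])
print(padded_cik(index["AAPL"]["cik_str"]))  # 0000320193
```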