Appendix E — SEC Datasets

In this example, we access SEC data directly from the SEC website rather than through third-party packages.

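The SEC does not require an API key, but it does ask automated clients to identify themselves with a User-Agent header (an application name and a contact email). Requests without one may be rejected, so we pass headers like the ones below to obtain the data: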

import requests

# h/t: https://stackoverflow.com/a/70386951/670433
# see: https://www.sec.gov/search-filings/edgar-search-assistance/accessing-edgar-data

request_url = "https://www.sec.gov/files/company_tickers.json"

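# Identify yourself with an application name and contact email (replace these placeholders)
headers = {"User-Agent": "MyAppName (myemail@example.com)"}

response = requests.get(request_url, headers=headers)
response.raise_for_status()  # fail loudly on an HTTP error instead of parsing an error page
data = response.json()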
print(type(data))
<class 'dict'>
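For environments where requests is not installed, the same call can be sketched with the standard library's urllib.request. This is a minimal sketch, not an official SEC client: the helper names build_sec_request and fetch_sec_json are hypothetical, and the User-Agent value is a placeholder to replace with your own application name and contact email.

```python
import json
import urllib.request

TICKERS_URL = "https://www.sec.gov/files/company_tickers.json"


def build_sec_request(url, app_name, email):
    """Hypothetical helper: build a GET request that declares a User-Agent."""
    user_agent = f"{app_name} ({email})"  # same placeholder format as above
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_sec_json(request, timeout=30):
    """Hypothetical helper: send the request and decode the JSON body.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage (performs a network call):
# req = build_sec_request(TICKERS_URL, "MyAppName", "myemail@example.com")
# data = fetch_sec_json(req)
```

Splitting request construction from the network call makes it easy to check the headers without contacting the SEC.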

For the company tickers dataset, the response is one large dictionary whose keys are row numbers stored as strings ("0", "1", ...) and whose values are company records:

list(data.keys())[0:5]
['0', '1', '2', '3', '4']
data["0"]
{'cik_str': 320193, 'ticker': 'AAPL', 'title': 'Apple Inc.'}
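Each record pairs a ticker with a numeric CIK (Central Index Key). A common next step is a ticker-to-CIK lookup. EDGAR URLs typically write the CIK as a 10-digit, zero-padded string, so the sketch below pads it with str.zfill. The helper name build_cik_index and the uppercase normalization of tickers are assumptions for illustration; the sample dictionary reuses the record shown above.

```python
def build_cik_index(data):
    """Hypothetical helper: map each ticker to its 10-digit, zero-padded CIK.

    `data` has the same shape as the company_tickers.json response:
    {"0": {"cik_str": ..., "ticker": ..., "title": ...}, "1": {...}, ...}
    """
    return {
        record["ticker"].upper(): str(record["cik_str"]).zfill(10)
        for record in data.values()
    }


sample = {"0": {"cik_str": 320193, "ticker": "AAPL", "title": "Apple Inc."}}
cik_index = build_cik_index(sample)
print(cik_index["AAPL"])  # 0000320193
```

Building the index once and reusing it avoids scanning every record for each lookup.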

The keys carry no information beyond each record's position, so we can discard them and keep just the values as a list of records:

# Keep only the values; the keys are just positions ("0", "1", ...)
records = list(data.values())
records[0]
{'cik_str': 320193, 'ticker': 'AAPL', 'title': 'Apple Inc.'}
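Discarding the keys is only safe if they really are positions. The sketch below assumes they should be exactly "0", "1", ..., "n-1" in insertion order (as in the output above) and checks that before flattening; the function name keys_are_positional is hypothetical.

```python
def keys_are_positional(data):
    """True if the keys are "0", "1", ..., str(len(data) - 1), in order.

    Checking order too means list(data.values()) lines up with the key numbers.
    """
    return list(data) == [str(i) for i in range(len(data))]


sample = {
    "0": {"cik_str": 320193, "ticker": "AAPL", "title": "Apple Inc."},
    "1": {"cik_str": 1045810, "ticker": "NVDA", "title": "NVIDIA CORP"},
}
if keys_are_positional(sample):
    sample_records = list(sample.values())
```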

From this list of records, we can build a pandas DataFrame:

from pandas import DataFrame

df = DataFrame(records)
df.head()
   cik_str  ticker           title
0   320193    AAPL      Apple Inc.
1  1045810    NVDA     NVIDIA CORP
2   789019    MSFT  MICROSOFT CORP
3  1652044   GOOGL   Alphabet Inc.
4  1018724    AMZN  AMAZON COM INC
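As an alternative to building the list of records by hand, pandas can read the nested dictionary directly with DataFrame.from_dict(..., orient="index"), which treats each outer key as a row label. The sketch below uses a two-row sample in place of the full download (named sample and sample_df so it does not overwrite data or df), and the cik_padded column is our own addition, not part of the SEC data.

```python
import pandas as pd

# Two-row sample in the same shape as the company_tickers.json response.
sample = {
    "0": {"cik_str": 320193, "ticker": "AAPL", "title": "Apple Inc."},
    "1": {"cik_str": 1045810, "ticker": "NVDA", "title": "NVIDIA CORP"},
}

# Each outer key ("0", "1", ...) becomes a row label; inner keys become columns.
sample_df = pd.DataFrame.from_dict(sample, orient="index")
sample_df = sample_df.reset_index(drop=True)  # replace string labels with a 0..n-1 index

# Hypothetical convenience column: the CIK as a 10-digit, zero-padded string.
sample_df["cik_padded"] = sample_df["cik_str"].astype(str).str.zfill(10)

# Look up a company by ticker.
apple = sample_df.loc[sample_df["ticker"] == "AAPL"].iloc[0]
print(apple["cik_padded"])  # 0000320193
```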