Appendix E — SEC Datasets

In this example, we access SEC data directly from the SEC website rather than through a third-party package.

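The SEC does not require an account or API key, but its fair-access guidelines ask automated clients to declare a User-Agent header that identifies who is making the request (for example, an application name and a contact email). Requests without such a header are typically rejected with an HTTP 403 error, so we pass a header like the one below: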

import requests

# h/t: https://stackoverflow.com/a/70386951/670433
# see: https://www.sec.gov/search-filings/edgar-search-assistance/accessing-edgar-data

request_url = "https://www.sec.gov/files/company_tickers.json"

# Replace with your own application name and contact email
headers = {"User-Agent": "MyAppName (myemail@example.com)"}

response = requests.get(request_url, headers=headers)
response.raise_for_status()  # raise on a 403 or other HTTP error instead of failing later in .json()
data = response.json()
print(type(data))

<class 'dict'>
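If you would rather avoid the requests dependency, the same request can be prepared with Python's standard-library urllib.request. This is a swapped-in alternative, not the approach used above, and the helper names build_sec_request and fetch_json (as well as the app name and email) are hypothetical placeholders. The sketch only constructs the request object, so the header can be checked before any network call is made.

```python
import json
import urllib.request

SEC_TICKERS_URL = "https://www.sec.gov/files/company_tickers.json"


def build_sec_request(url: str, app_name: str, email: str) -> urllib.request.Request:
    """Hypothetical helper: attach a User-Agent that identifies the requester."""
    user_agent = f"{app_name} ({email})"
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_json(request: urllib.request.Request, timeout: float = 30.0):
    """Hypothetical helper: send the request and parse the JSON body.

    urlopen raises urllib.error.HTTPError on a 403 or other error status.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


req = build_sec_request(SEC_TICKERS_URL, "MyAppName", "myemail@example.com")

# urllib stores header names via str.capitalize(), hence "User-agent"
print(req.get_header("User-agent"))  # MyAppName (myemail@example.com)

# data = fetch_json(req)  # uncomment to perform the actual download
```

Either way, the essential ingredient is the User-Agent header; the HTTP library itself is interchangeable.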

For the company tickers file, the response is one large dictionary whose keys are row numbers stored as strings ("0", "1", ...) and whose values are per-company records:

list(data.keys())[0:5]

['0', '1', '2', '3', '4']

data["0"]

{'cik_str': 789019, 'ticker': 'MSFT', 'title': 'MICROSOFT CORP'}
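Because the keys are strings, indexing with an integer fails. The snippet below illustrates this on a small hand-typed stand-in for the downloaded data, copied from the first records shown in this appendix (the variable name sample is ours, not part of the SEC file):

```python
# Stand-in for the downloaded `data`: the first three records shown in this appendix
sample = {
    "0": {"cik_str": 789019, "ticker": "MSFT", "title": "MICROSOFT CORP"},
    "1": {"cik_str": 1045810, "ticker": "NVDA", "title": "NVIDIA CORP"},
    "2": {"cik_str": 320193, "ticker": "AAPL", "title": "Apple Inc."},
}

print(sample["0"]["ticker"])  # a string key works: MSFT

try:
    sample[0]  # the integer key 0 does not exist
except KeyError as err:
    print("KeyError:", err)  # KeyError: 0

# To look up a row by position, convert the number to its string key
i = 2
print(sample[str(i)]["title"])  # Apple Inc.
```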

The string keys carry no information beyond row order, so we can discard them and keep just the list of records:

records = []
for k, v in data.items():
    records.append(v)
records[0]

{'cik_str': 789019, 'ticker': 'MSFT', 'title': 'MICROSOFT CORP'}
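The loop above only collects the dictionary's values, so list(data.values()) does the same in one line. Both rely on the keys appearing in row order, which holds here because the JSON parser keeps keys in file order and Python dicts preserve insertion order. If you prefer not to depend on that, the minimal sketch below sorts by the numeric value of each key; the shuffled sample dict is a hypothetical stand-in for data.

```python
# Hypothetical stand-in for `data`, with keys deliberately shuffled
sample = {
    "3": {"cik_str": 1018724, "ticker": "AMZN", "title": "AMAZON COM INC"},
    "0": {"cik_str": 789019, "ticker": "MSFT", "title": "MICROSOFT CORP"},
    "2": {"cik_str": 320193, "ticker": "AAPL", "title": "Apple Inc."},
    "1": {"cik_str": 1045810, "ticker": "NVDA", "title": "NVIDIA CORP"},
}

# Equivalent to the loop above: values in the dict's own order
records_as_is = list(sample.values())

# key=int sorts numerically, so in the full file "10" lands after "9", not after "1"
records_sorted = [sample[k] for k in sorted(sample, key=int)]

print([r["ticker"] for r in records_as_is])   # ['AMZN', 'MSFT', 'AAPL', 'NVDA']
print([r["ticker"] for r in records_sorted])  # ['MSFT', 'NVDA', 'AAPL', 'AMZN']
```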

From these records, we can build a pandas DataFrame:

from pandas import DataFrame

df = DataFrame(records)
df.head()

	cik_str	ticker	title
0	789019	MSFT	MICROSOFT CORP
1	1045810	NVDA	NVIDIA CORP
2	320193	AAPL	Apple Inc.
3	1018724	AMZN	AMAZON COM INC
4	1652044	GOOGL	Alphabet Inc.
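pandas can also build the frame directly from the nested dictionary with DataFrame.from_dict(..., orient="index"), which turns each outer key into a row label and each inner key into a column. The sketch below uses the five records shown above as a stand-in for data (the names sample and by_ticker are ours), resets the string row labels to match the output above, and then looks up a company's CIK by ticker. Note that, despite its name, cik_str holds integers in this file.

```python
from pandas import DataFrame

# Stand-in for the downloaded `data`: the five records shown in this appendix
sample = {
    "0": {"cik_str": 789019, "ticker": "MSFT", "title": "MICROSOFT CORP"},
    "1": {"cik_str": 1045810, "ticker": "NVDA", "title": "NVIDIA CORP"},
    "2": {"cik_str": 320193, "ticker": "AAPL", "title": "Apple Inc."},
    "3": {"cik_str": 1018724, "ticker": "AMZN", "title": "AMAZON COM INC"},
    "4": {"cik_str": 1652044, "ticker": "GOOGL", "title": "Alphabet Inc."},
}

# Outer keys ("0", "1", ...) become row labels; inner keys become columns
df = DataFrame.from_dict(sample, orient="index")

# Replace the string row labels with a default 0..n-1 integer index
df = df.reset_index(drop=True)
print(df.head())

# Index by ticker for quick lookups of a company's CIK
by_ticker = df.set_index("ticker")
print(by_ticker.loc["AAPL", "cik_str"])  # 320193
```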