from google.colab import drive
'/content/drive') drive.mount(
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
One benefit of Google Colab notebooks, over generic Jupyter notebooks, is the ability to easily integrate with other Google products and services.
These integrations take advantage of the fact that in order to access a Colab notebook, you first need to be logged in with your Google Account. They then use credentials of the logged in user to access Google products and services on your behalf.
A Google Drive integration will allow us to read and write files from Google Drive.
We will first need to “mount” the Google Drive to the Colab filesystem, so we can access Drive files within Colab. When we mount the drive, we choose the name of a local subdirectory within the Colab filesystem (for example, “content/drive”) in which we would like to access the files:
from google.colab import drive
'/content/drive') drive.mount(
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Now any files in your Google Drive are accessable from the notebook (for example, accessing a file called “daily-prices-nflx.csv” stored in the top level of the user’s Google Drive:
import os
= "/content/drive/MyDrive/daily-prices-nflx.csv"
csv_filepath print(os.path.isfile(csv_filepath))
True
from pandas import read_csv
= read_csv(csv_filepath)
df df.head()
timestamp | open | high | low | close | adjusted_close | volume | dividend_amount | split_coefficient | |
---|---|---|---|---|---|---|---|---|---|
0 | 2024-06-17 | 669.11 | 682.7099 | 665.1101 | 675.83 | 675.83 | 3631184 | 0.0 | 1.0 |
1 | 2024-06-14 | 655.05 | 675.5800 | 652.6400 | 669.38 | 669.38 | 4447116 | 0.0 | 1.0 |
2 | 2024-06-13 | 644.00 | 655.2500 | 642.3500 | 653.26 | 653.26 | 1863587 | 0.0 | 1.0 |
3 | 2024-06-12 | 652.21 | 655.7800 | 643.1100 | 650.06 | 650.06 | 2094381 | 0.0 | 1.0 |
4 | 2024-06-11 | 640.72 | 650.1900 | 640.5200 | 648.55 | 648.55 | 2167417 | 0.0 | 1.0 |
For a spreadsheet datastore integration, we can interface with Google Sheets directly.
We first need to create a Google Sheets document, for instance this example document.
Before proceeding, we observe the identifier of the Google Sheets document from the URL:
# document URL: https://docs.google.com/spreadsheets/d/1qbEagXHlGgFBiQWdUaL62V9f8MNW_t0QFmaQhoTiEjw/edit?usp=sharing
= "1qbEagXHlGgFBiQWdUaL62V9f8MNW_t0QFmaQhoTiEjw" DOCUMENT_ID
After noting the document identifier, we can use some boilerplate code to authenticate and access the document.
Interfacing with Google Sheets, using the gspread
package:
!pip list | grep gspread
gspread 6.0.2
gspread-dataframe 3.3.1
import gspread
= gspread.authorize(creds) client
Accessing the document:
= client.open_by_key(DOCUMENT_ID)
doc print("DOC:", type(doc), doc.title)
DOC: <class 'gspread.spreadsheet.Spreadsheet'> CSV Data Files (Python for Finance Book) - PUBLIC
Listing worksheets in the document:
doc.worksheets()
[<Worksheet 'daily-prices-nflx' id:0>,
<Worksheet 'daily-prices-nflx-copy' id:193822370>]
Accessing a specific sheet:
= doc.worksheet("daily-prices-nflx")
sheet print("SHEET:", sheet.title)
SHEET: daily-prices-nflx
Reading data from a sheet:
= sheet.get_all_records()
records print(sheet.title)
print("ROWS:", len(records))
0] records[
daily-prices-nflx
ROWS: 5555
{'timestamp': '2024-06-17',
'open': 669.11,
'high': 682.7099,
'low': 665.1101,
'close': 675.83,
'adjusted_close': 675.83,
'volume': 3631184,
'dividend_amount': 0,
'split_coefficient': 1}
Converting records to a dataframe, as desired:
from pandas import DataFrame
= DataFrame(records)
df df.head()
timestamp | open | high | low | close | adjusted_close | volume | dividend_amount | split_coefficient | |
---|---|---|---|---|---|---|---|---|---|
0 | 2024-06-17 | 669.11 | 682.7099 | 665.1101 | 675.83 | 675.83 | 3631184 | 0 | 1.0 |
1 | 2024-06-14 | 655.05 | 675.5800 | 652.6400 | 669.38 | 669.38 | 4447116 | 0 | 1.0 |
2 | 2024-06-13 | 644.00 | 655.2500 | 642.3500 | 653.26 | 653.26 | 1863587 | 0 | 1.0 |
3 | 2024-06-12 | 652.21 | 655.7800 | 643.1100 | 650.06 | 650.06 | 2094381 | 0 | 1.0 |
4 | 2024-06-11 | 640.72 | 650.1900 | 640.5200 | 648.55 | 648.55 | 2167417 | 0 | 1.0 |
Here are some helper functions, to help you more easily interface with the gspread
package. Feel free to copy and adapt, as desired.
from gspread.exceptions import WorksheetNotFound
def find_or_create_sheet(sheet_name, doc=doc):
"""Access a sheet within the document, or create if not exists.
Params:
sheet_name (str): name of the sheet to access or create
doc (gspread.Spreadsheet): Google Sheets document object
Returns:
gspread.Worksheet: sheet object
"""
try:
= doc.worksheet(sheet_name)
sheet except WorksheetNotFound:
print("CREATING NEW SHEET...")
= doc.add_worksheet(title=sheet_name, rows="10", cols="10")
sheet
return sheet
= "daily-prices-nflx-copy"
sheet_name = find_or_create_sheet(sheet_name)
sheet print("SHEET:", type(sheet), sheet.title)
SHEET: <class 'gspread.worksheet.Worksheet'> daily-prices-nflx-copy
def write_data_to_sheet(df, sheet):
"""Write dataframe contents to a given sheet.
Params:
df (pandas.DataFrame): dataframe you would like to write to sheet
sheet (gspread.Worksheet): sheet object you would like to write to
"""
= df.columns.tolist()
header_row = df.values.tolist()
rows assert len(header_row) == len(rows[0]) # same number of columns in all rows
= [header_row] + rows
all_rows
sheet.clear()
sheet.update(all_rows)
write_data_to_sheet(df, sheet)
Verifying the data got written:
= sheet.get_all_records()
records print(sheet.title)
print(len(records))
0] records[
daily-prices-nflx-copy
5555
{'timestamp': '2024-06-17',
'open': 669.11,
'high': 682.7099,
'low': 665.1101,
'close': 675.83,
'adjusted_close': 675.83,
'volume': 3631184,
'dividend_amount': 0,
'split_coefficient': 1}