from google.colab import drive
'/content/drive') drive.mount(
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
One benefit of Google Colab notebooks, over generic Jupyter notebooks, is the ability to easily integrate with other Google products and services.
These integrations take advantage of the fact that in order to access a Colab notebook, you first need to be logged in with your Google Account. They then use credentials of the logged in user to access Google products and services on your behalf.
A Google Drive integration will allow us to read and write files from Google Drive.
We will first need to “mount” the Google Drive to the Colab filesystem, so we can access Drive files within Colab. When we mount the drive, we choose the name of a local subdirectory within the Colab filesystem (for example, “content/drive”) in which we would like to access the files:
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Now any files in your Google Drive are accessable from the notebook (for example, accessing a file called “daily-prices-nflx.csv” stored in the top level of the user’s Google Drive:
import os
csv_filepath = "/content/drive/MyDrive/daily-prices-nflx.csv"
print(os.path.isfile(csv_filepath))
True
timestamp | open | high | low | close | adjusted_close | volume | dividend_amount | split_coefficient | |
---|---|---|---|---|---|---|---|---|---|
0 | 2024-06-17 | 669.11 | 682.7099 | 665.1101 | 675.83 | 675.83 | 3631184 | 0.0 | 1.0 |
1 | 2024-06-14 | 655.05 | 675.5800 | 652.6400 | 669.38 | 669.38 | 4447116 | 0.0 | 1.0 |
2 | 2024-06-13 | 644.00 | 655.2500 | 642.3500 | 653.26 | 653.26 | 1863587 | 0.0 | 1.0 |
3 | 2024-06-12 | 652.21 | 655.7800 | 643.1100 | 650.06 | 650.06 | 2094381 | 0.0 | 1.0 |
4 | 2024-06-11 | 640.72 | 650.1900 | 640.5200 | 648.55 | 648.55 | 2167417 | 0.0 | 1.0 |
For a spreadsheet datastore integration, we can interface with Google Sheets directly.
We first need to create a Google Sheets document, for instance this example document.
Before proceeding, we observe the identifier of the Google Sheets document from the URL:
After noting the document identifier, we can use some boilerplate code to authenticate and access the document.
Authorizing Google Colab to authenticate on behalf of the currently logged in user (assuming that user has access to the spreadsheet document you created):
Using credentials from the logged in user:
Interfacing with Google Sheets, using the gspread
package:
Accessing the document:
DOC: <class 'gspread.spreadsheet.Spreadsheet'> CSV Data Files (Python for Finance Book) - PUBLIC
Listing worksheets in the document:
[<Worksheet 'daily-prices-nflx' id:0>,
<Worksheet 'daily-prices-nflx-copy' id:193822370>]
Accessing a specific sheet:
Reading data from a sheet:
daily-prices-nflx
ROWS: 5555
{'timestamp': '2024-06-17',
'open': 669.11,
'high': 682.7099,
'low': 665.1101,
'close': 675.83,
'adjusted_close': 675.83,
'volume': 3631184,
'dividend_amount': 0,
'split_coefficient': 1}
Converting records to a dataframe, as desired:
timestamp | open | high | low | close | adjusted_close | volume | dividend_amount | split_coefficient | |
---|---|---|---|---|---|---|---|---|---|
0 | 2024-06-17 | 669.11 | 682.7099 | 665.1101 | 675.83 | 675.83 | 3631184 | 0 | 1.0 |
1 | 2024-06-14 | 655.05 | 675.5800 | 652.6400 | 669.38 | 669.38 | 4447116 | 0 | 1.0 |
2 | 2024-06-13 | 644.00 | 655.2500 | 642.3500 | 653.26 | 653.26 | 1863587 | 0 | 1.0 |
3 | 2024-06-12 | 652.21 | 655.7800 | 643.1100 | 650.06 | 650.06 | 2094381 | 0 | 1.0 |
4 | 2024-06-11 | 640.72 | 650.1900 | 640.5200 | 648.55 | 648.55 | 2167417 | 0 | 1.0 |
Here are some helper functions, to help you more easily interface with the gspread
package. Feel free to copy and adapt, as desired.
from gspread.exceptions import WorksheetNotFound
def find_or_create_sheet(sheet_name, doc=doc):
"""Access a sheet within the document, or create if not exists.
Params:
sheet_name (str): name of the sheet to access or create
doc (gspread.Spreadsheet): Google Sheets document object
Returns:
gspread.Worksheet: sheet object
"""
try:
sheet = doc.worksheet(sheet_name)
except WorksheetNotFound:
print("CREATING NEW SHEET...")
sheet = doc.add_worksheet(title=sheet_name, rows="10", cols="10")
return sheet
sheet_name = "daily-prices-nflx-copy"
sheet = find_or_create_sheet(sheet_name)
print("SHEET:", type(sheet), sheet.title)
SHEET: <class 'gspread.worksheet.Worksheet'> daily-prices-nflx-copy
def write_data_to_sheet(df, sheet):
"""Write dataframe contents to a given sheet.
Params:
df (pandas.DataFrame): dataframe you would like to write to sheet
sheet (gspread.Worksheet): sheet object you would like to write to
"""
header_row = df.columns.tolist()
rows = df.values.tolist()
assert len(header_row) == len(rows[0]) # same number of columns in all rows
all_rows = [header_row] + rows
sheet.clear()
sheet.update(all_rows)
write_data_to_sheet(df, sheet)
Verifying the data got written: