Advanced Integrations in Google Colab

One benefit of Google Colab notebooks, over generic Jupyter notebooks, is the ability to easily integrate with other Google products and services.

These integrations take advantage of the fact that in order to access a Colab notebook, you first need to be logged in with your Google Account. They then use credentials of the logged in user to access Google products and services on your behalf.

Google Drive

A Google Drive integration will allow us to read and write files from Google Drive.

We will first need to “mount” the Google Drive to the Colab filesystem, so we can access Drive files within Colab. When we mount the drive, we choose the name of a local subdirectory within the Colab filesystem (for example, “content/drive”) in which we would like to access the files:

from google.colab import drive

drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).

Now any files in your Google Drive are accessable from the notebook (for example, accessing a file called “daily-prices-nflx.csv” stored in the top level of the user’s Google Drive:

import os

csv_filepath = "/content/drive/MyDrive/daily-prices-nflx.csv"
print(os.path.isfile(csv_filepath))
True
from pandas import read_csv

df = read_csv(csv_filepath)
df.head()
timestamp open high low close adjusted_close volume dividend_amount split_coefficient
0 2024-06-17 669.11 682.7099 665.1101 675.83 675.83 3631184 0.0 1.0
1 2024-06-14 655.05 675.5800 652.6400 669.38 669.38 4447116 0.0 1.0
2 2024-06-13 644.00 655.2500 642.3500 653.26 653.26 1863587 0.0 1.0
3 2024-06-12 652.21 655.7800 643.1100 650.06 650.06 2094381 0.0 1.0
4 2024-06-11 640.72 650.1900 640.5200 648.55 648.55 2167417 0.0 1.0

Google Sheets

For a spreadsheet datastore integration, we can interface with Google Sheets directly.

We first need to create a Google Sheets document, for instance this example document.

Before proceeding, we observe the identifier of the Google Sheets document from the URL:

# document URL: https://docs.google.com/spreadsheets/d/1qbEagXHlGgFBiQWdUaL62V9f8MNW_t0QFmaQhoTiEjw/edit?usp=sharing
DOCUMENT_ID = "1qbEagXHlGgFBiQWdUaL62V9f8MNW_t0QFmaQhoTiEjw"

After noting the document identifier, we can use some boilerplate code to authenticate and access the document.

Authorization

Authorizing Google Colab to authenticate on behalf of the currently logged in user (assuming that user has access to the spreadsheet document you created):

from google.colab import auth

auth.authenticate_user()

Using credentials from the logged in user:

from google.auth import default

creds, _ = default()

Connecting to Google Sheets

Interfacing with Google Sheets, using the gspread package:

!pip list | grep gspread
gspread                          6.0.2
gspread-dataframe                3.3.1
import gspread

client = gspread.authorize(creds)

Accessing the document:

doc = client.open_by_key(DOCUMENT_ID)
print("DOC:", type(doc), doc.title)
DOC: <class 'gspread.spreadsheet.Spreadsheet'> CSV Data Files (Python for Finance Book) - PUBLIC

Listing worksheets in the document:

doc.worksheets()
[<Worksheet 'daily-prices-nflx' id:0>,
 <Worksheet 'daily-prices-nflx-copy' id:193822370>]

Accessing a specific sheet:

sheet = doc.worksheet("daily-prices-nflx")
print("SHEET:", sheet.title)
SHEET: daily-prices-nflx

Reading data from a sheet:

records = sheet.get_all_records()
print(sheet.title)
print("ROWS:", len(records))
records[0]
daily-prices-nflx
ROWS: 5555
{'timestamp': '2024-06-17',
 'open': 669.11,
 'high': 682.7099,
 'low': 665.1101,
 'close': 675.83,
 'adjusted_close': 675.83,
 'volume': 3631184,
 'dividend_amount': 0,
 'split_coefficient': 1}

Converting records to a dataframe, as desired:

from pandas import DataFrame

df = DataFrame(records)
df.head()
timestamp open high low close adjusted_close volume dividend_amount split_coefficient
0 2024-06-17 669.11 682.7099 665.1101 675.83 675.83 3631184 0 1.0
1 2024-06-14 655.05 675.5800 652.6400 669.38 669.38 4447116 0 1.0
2 2024-06-13 644.00 655.2500 642.3500 653.26 653.26 1863587 0 1.0
3 2024-06-12 652.21 655.7800 643.1100 650.06 650.06 2094381 0 1.0
4 2024-06-11 640.72 650.1900 640.5200 648.55 648.55 2167417 0 1.0

Helper Functions

Here are some helper functions, to help you more easily interface with the gspread package. Feel free to copy and adapt, as desired.

Find or Create Sheet

from gspread.exceptions import WorksheetNotFound

def find_or_create_sheet(sheet_name, doc=doc):
    """Access a sheet within the document, or create if not exists.

    Params:
        sheet_name (str): name of the sheet to access or create

        doc (gspread.Spreadsheet): Google Sheets document object

    Returns:
        gspread.Worksheet: sheet object
    """
    try:
        sheet = doc.worksheet(sheet_name)
    except WorksheetNotFound:
        print("CREATING NEW SHEET...")
        sheet = doc.add_worksheet(title=sheet_name, rows="10", cols="10")

    return sheet


sheet_name = "daily-prices-nflx-copy"
sheet = find_or_create_sheet(sheet_name)
print("SHEET:", type(sheet), sheet.title)
SHEET: <class 'gspread.worksheet.Worksheet'> daily-prices-nflx-copy

Writing Dataframes to Sheet

def write_data_to_sheet(df, sheet):
    """Write dataframe contents to a given sheet.

    Params:
        df (pandas.DataFrame): dataframe you would like to write to sheet

        sheet (gspread.Worksheet): sheet object you would like to write to
    """
    header_row = df.columns.tolist()
    rows = df.values.tolist()
    assert len(header_row) == len(rows[0]) # same number of columns in all rows
    all_rows = [header_row] + rows

    sheet.clear()
    sheet.update(all_rows)


write_data_to_sheet(df, sheet)

Verifying the data got written:

records = sheet.get_all_records()
print(sheet.title)
print(len(records))
records[0]
daily-prices-nflx-copy
5555
{'timestamp': '2024-06-17',
 'open': 669.11,
 'high': 682.7099,
 'low': 665.1101,
 'close': 675.83,
 'adjusted_close': 675.83,
 'volume': 3631184,
 'dividend_amount': 0,
 'split_coefficient': 1}