31  YouTube Integrations

31.1 Background

This notebook contains an abbreviated demonstration of some of the audio processing methods used by the professor to train machine learning models for music recommendation purposes (see Slides and/or “Extra Credit” Promotional Video).

In this notebook we will fetch and process audio data from YouTube.

You will have an opportunity to specify your own YouTube video URL of interest for this exercise.

One cool thing about Google Colab is that it supports a nice mix of inputs and rich outputs, such as the ability to display images and play audio:

%%capture
!wget "https://github.com/s2t2/ml-music-2023/raw/main/test/audio/pop.00032.wav"
from IPython.display import display, Audio, Image

display(Audio("pop.00032.wav", autoplay=False))

31.2 Fetching Video Info

We will use the pytubefix package to fetch video metadata and streams from YouTube.

Installing the pytubefix package:

%%capture
!pip install pytubefix
!pip list | grep pytubefix
pytubefix                          8.10.2

Specifying a YouTube Video URL:

# waits for you to input a video URL and press enter, uses a default video if you just press enter
video_url = input("Please input a YouTube Video URL: " ) or "https://www.youtube.com/watch?v=q6HiZIQoLSU"
video_url
Please input a YouTube Video URL: 
'https://www.youtube.com/watch?v=q6HiZIQoLSU'

Fetching video information / metadata:

# see: https://pytubefix.readthedocs.io/en/latest/
from pytubefix import YouTube as Video
from pytubefix.cli import on_progress

video = Video(video_url, on_progress_callback=on_progress)
print("VIDEO:", video.video_id, video.watch_url)
print("AUTHOR:", video.author)
print("TITLE:", video.title)
print("LENGTH:", video.length)
print("VIEWS:", video.views)
VIDEO: q6HiZIQoLSU https://youtube.com/watch?v=q6HiZIQoLSU
AUTHOR: MaggieRogersVEVO
TITLE: Maggie Rogers - The Knife (Live On Austin City Limits)
LENGTH: 272
VIEWS: 157544
from IPython.display import display, Image

image_url = video.thumbnail_url
display(Image(url=image_url, height=250))

31.3 Filtering Streams

Identifying audio streams available for this video:

len(video.streams)
24
# see: https://pytubefix.readthedocs.io/en/latest/user/streams.html#filtering-streams

audio_streams = video.streams.filter(only_audio=True, file_extension='mp4').order_by("abr").asc()
print(len(audio_streams))
audio_streams
2
[<Stream: itag="139" mime_type="audio/mp4" abr="48kbps" acodec="mp4a.40.5" progressive="False" type="audio">, <Stream: itag="140" mime_type="audio/mp4" abr="128kbps" acodec="mp4a.40.2" progressive="False" type="audio">]
stream = audio_streams.first()
stream
<Stream: itag="139" mime_type="audio/mp4" abr="48kbps" acodec="mp4a.40.5" progressive="False" type="audio">

31.4 Downloading Audio

Downloading the audio file into the Colab filesystem (see “Files” menu in left sidebar):

audio_filepath = stream.download(skip_existing=True)
audio_filepath
'/content/Maggie Rogers - The Knife (Live On Austin City Limits).m4a'

Using the “Files” menu in the left sidebar, you should now be able to right click on the audio file to download it onto your local machine and add to your music collection. 😸

Playing audio in Colab:

from IPython.display import display, Audio

display(Audio(audio_filepath, autoplay=False))

31.5 Cutting Tracks

Splitting the audio file into 30-second chunks (this step is required for processing the audio data for machine learning and music recommendation purposes).

The take-away is that audio is just some data that we can access and manipulate (for example splitting into different parts).

%%capture
!pip install librosa
!pip list | grep librosa
librosa                            0.10.2.post1
import warnings

warnings.filterwarnings('ignore')
import librosa

audio, sample_rate = librosa.load(audio_filepath)
print("AUDIO DATA:", type(audio), audio.shape) #> ~6M datapoints
print("SAMPLE RATE:", sample_rate) # 22050 datapoints per second
AUDIO DATA: <class 'numpy.ndarray'> (5998592,)
SAMPLE RATE: 22050
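As a sanity check, the number of samples divided by the sample rate gives the audio duration in seconds, which should match the video length reported earlier. A quick calculation using the values printed above:

```python
# number of samples and sample rate, from the librosa.load() output above
n_samples = 5_998_592
sample_rate = 22_050

duration_seconds = n_samples / sample_rate
print(round(duration_seconds))  # matches the video LENGTH of 272 seconds
```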
# AUDIO PROCESSING HELPER FUNCTIONS

import numpy as np

def split_into_batches(my_list, batch_size=10_000):
    """Splits a list into evenly sized batches."""
    # h/t: https://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks
    for i in range(0, len(my_list), batch_size):
        yield my_list[i : i + batch_size]


def make_tracks(audio: np.ndarray, sample_rate: int, track_length_seconds=30, discard_last=True):
    """Returns equal-sized tracks of the given duration.
        By default, discards the last track, because it will have a different duration.
    """
    track_length = track_length_seconds * sample_rate
    #print("TRACK LENGTH:", track_length) #> 661_500 for 30s with a sample rate of 22_050 per second

    all_tracks = list(split_into_batches(audio.tolist(), batch_size=track_length))
    #print(f"ALL TRACKS ({len(all_tracks)}):", [len(t) for t in all_tracks])

    if discard_last:
        return all_tracks[0:-1] # not including the last item in the list
    else:
        return all_tracks
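As a side note, converting the audio array to a Python list is not required: NumPy slicing achieves the same split more efficiently. A minimal sketch of that alternative (the function name `make_tracks_np` and the tiny fake signal are illustrative, not part of the original notebook):

```python
import numpy as np

def make_tracks_np(audio: np.ndarray, sample_rate: int, track_length_seconds=30, discard_last=True):
    """Slices the audio array directly into fixed-length tracks (no list conversion)."""
    track_length = track_length_seconds * sample_rate
    tracks = [audio[i : i + track_length] for i in range(0, len(audio), track_length)]
    if discard_last and tracks and len(tracks[-1]) < track_length:
        tracks = tracks[:-1]  # drop the shorter remainder track
    return tracks

# demo with a fake signal: 95 "seconds" at a tiny sample rate of 10 samples per second
fake_audio = np.zeros(950)
tracks = make_tracks_np(fake_audio, sample_rate=10, track_length_seconds=30)
print(len(tracks), [len(t) for t in tracks])  #> 3 [300, 300, 300]
```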
from IPython.display import Audio, display

tracks = make_tracks(audio=audio, sample_rate=sample_rate, track_length_seconds=30)
print("TRACKS:", len(tracks))

for i, track in enumerate(tracks):
    print("-----------")
    print(f"TRACK {i+1}...")
    print(len(track))

    audio_data = np.array(track)
    display(Audio(audio_data, autoplay=False, rate=sample_rate)) # rate only necessary when passing custom audio data
TRACKS: 9
-----------
TRACK 1...
661500
-----------
TRACK 2...
661500
-----------
TRACK 3...
661500
-----------
TRACK 4...
661500
-----------
TRACK 5...
661500
-----------
TRACK 6...
661500
-----------
TRACK 7...
661500
-----------
TRACK 8...
661500
-----------
TRACK 9...
661500

If you listen to each of these 30-second tracks, you will notice that each one picks up where the previous one left off. This demonstrates that we were able to segment the song programmatically.

There are more data science steps involved in creating a music recommendation system, but this demo should at least provide a good foundation for getting started with accessing and processing audio data from YouTube.