%%capture
!wget "https://github.com/s2t2/ml-music-2023/raw/main/test/audio/pop.00032.wav"
31 YouTube Integrations
31.1 Background
This notebook contains an abbreviated demonstration of some of the audio processing methods used by the professor to train machine learning models for music recommendation purposes (see Slides and/or “Extra Credit” Promotional Video).
In this notebook we will fetch and process audio data from YouTube.
You will have an opportunity to specify your own YouTube video URL of interest for this exercise.
One cool thing about Google Colab is that it supports a nice mix of interactive inputs and rich outputs, such as the ability to display images and play audio:
from IPython.display import display, Audio, Image
display(Audio("pop.00032.wav", autoplay=False))
31.2 Fetching Video Info
We will use the pytubefix
package to interface with the YouTube API.
Installing the pytubefix
package:
%%capture
!pip install pytubefix
!pip list | grep pytubefix
pytubefix 8.10.2
Specifying a YouTube Video URL:
# waits for you to input a video URL and press enter, uses a default video if you just press enter
video_url = input("Please input a YouTube Video URL: ") or "https://www.youtube.com/watch?v=q6HiZIQoLSU"
video_url
Please input a YouTube Video URL:
'https://www.youtube.com/watch?v=q6HiZIQoLSU'
Fetching video information / metadata:
# see: https://pytubefix.readthedocs.io/en/latest/
from pytubefix import YouTube as Video
from pytubefix.cli import on_progress
video = Video(video_url, on_progress_callback=on_progress)
print("VIDEO:", video.video_id, video.watch_url)
print("AUTHOR:", video.author)
print("TITLE:", video.title)
print("LENGTH:", video.length)
print("VIEWS:", video.views)
VIDEO: q6HiZIQoLSU https://youtube.com/watch?v=q6HiZIQoLSU
AUTHOR: MaggieRogersVEVO
TITLE: Maggie Rogers - The Knife (Live On Austin City Limits)
LENGTH: 272
VIEWS: 157544
from IPython.display import display, Image
image_url = video.thumbnail_url
display(Image(url=image_url, height=250))

31.3 Filtering Streams
Identifying audio streams available for this video:
len(video.streams)
24
# see: https://pytubefix.readthedocs.io/en/latest/user/streams.html#filtering-streams
audio_streams = video.streams.filter(only_audio=True, file_extension='mp4').order_by("abr").asc()
print(len(audio_streams))
audio_streams
2
[<Stream: itag="139" mime_type="audio/mp4" abr="48kbps" acodec="mp4a.40.5" progressive="False" type="audio">, <Stream: itag="140" mime_type="audio/mp4" abr="128kbps" acodec="mp4a.40.2" progressive="False" type="audio">]
stream = audio_streams.first()
stream
<Stream: itag="139" mime_type="audio/mp4" abr="48kbps" acodec="mp4a.40.5" progressive="False" type="audio">
31.4 Downloading Audio
Downloading the audio file into the Colab filesystem (see “Files” menu in left sidebar):
audio_filepath = stream.download(skip_existing=True)
audio_filepath
'/content/Maggie Rogers - The Knife (Live On Austin City Limits).m4a'
Using the “Files” menu in the left sidebar, you should now be able to right click on the audio file to download it onto your local machine and add to your music collection. 😸
Playing audio in Colab:
from IPython.display import display, Audio
display(Audio(audio_filepath, autoplay=False))
31.5 Cutting Tracks
Splitting the audio file into 30-second chunks (this step is required for processing the audio data for machine learning and music recommendation purposes).
The take-away is that audio is just some data that we can access and manipulate (for example splitting into different parts).
%%capture
!pip install librosa
!pip list | grep librosa
librosa 0.10.2.post1
import warnings
warnings.filterwarnings('ignore')
import librosa
audio, sample_rate = librosa.load(audio_filepath)
print("AUDIO DATA:", type(audio), audio.shape) #> ~6M datapoints
print("SAMPLE RATE:", sample_rate) # 22050 datapoints per second
AUDIO DATA: <class 'numpy.ndarray'> (5998592,)
SAMPLE RATE: 22050
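As a quick sanity check, dividing the number of samples by the sample rate recovers the duration in seconds, which matches the video length reported earlier (272 seconds):

```python
n_samples = 5_998_592    # from audio.shape above
sample_rate = 22_050     # datapoints per second

duration_seconds = n_samples / sample_rate
print(round(duration_seconds, 1))  #> 272.0 (matches video.length)
```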
# AUDIO PROCESSING HELPER FUNCTIONS
import numpy as np
def split_into_batches(my_list, batch_size=10_000):
    """Splits a list into evenly sized batches."""
    # h/t: https://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks
    for i in range(0, len(my_list), batch_size):
        yield my_list[i : i + batch_size]

def make_tracks(audio: np.ndarray, sample_rate: int, track_length_seconds=30, discard_last=True):
    """Returns equal-sized tracks of the given duration.
    By default, discards the last track, because it will have a different (shorter) duration.
    """
    track_length = track_length_seconds * sample_rate
    #print("TRACK LENGTH:", track_length) #> 661_500 for 30s with a sample rate of 22_050 per second
    all_tracks = list(split_into_batches(audio.tolist(), batch_size=track_length))
    #print(f"ALL TRACKS ({len(all_tracks)}):", [len(t) for t in all_tracks])
    if discard_last:
        return all_tracks[0:-1] # not including the last (partial) item in the list
    else:
        return all_tracks
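To see the batching logic in action before applying it to millions of audio samples, here it is applied to a small list (the generator is redefined here so the snippet stands alone; in the notebook you can call `split_into_batches` directly):

```python
def split_into_batches(my_list, batch_size=10_000):
    """Splits a list into evenly sized batches (the last batch may be shorter)."""
    for i in range(0, len(my_list), batch_size):
        yield my_list[i : i + batch_size]

batches = list(split_into_batches(list(range(10)), batch_size=4))
print(batches)  #> [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Note how the final batch holds only the leftover items, which is exactly why `make_tracks` discards the last (partial) track by default.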
from IPython.display import Audio, display
tracks = make_tracks(audio=audio, sample_rate=sample_rate, track_length_seconds=30)
print("TRACKS:", len(tracks))

for i, track in enumerate(tracks):
    print("-----------")
    print(f"TRACK {i+1}...")
    print(len(track))
    audio_data = np.array(track)
    display(Audio(audio_data, autoplay=False, rate=sample_rate)) # rate only necessary when passing custom audio data
TRACKS: 9
-----------
TRACK 1...
661500
-----------
TRACK 2...
661500
-----------
TRACK 3...
661500
-----------
TRACK 4...
661500
-----------
TRACK 5...
661500
-----------
TRACK 6...
661500
-----------
TRACK 7...
661500
-----------
TRACK 8...
661500
-----------
TRACK 9...
661500
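We can verify the track count arithmetically, using the numbers printed above. Nine full 30-second tracks fit in the audio, with roughly two seconds left over; that partial remainder is the track discarded by `make_tracks`:

```python
n_samples = 5_998_592              # total datapoints loaded by librosa
sample_rate = 22_050               # datapoints per second
track_length = 30 * sample_rate    # 661_500 samples per 30-second track

full_tracks = n_samples // track_length
leftover = n_samples % track_length
print(full_tracks)                 #> 9 full tracks
print(leftover / sample_rate)      # ~2 seconds of leftover audio (the discarded partial track)
```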
If you listen to each of these 30-second tracks, you will observe that each one starts where the previous one left off. This demonstrates that we were able to segment the song programmatically.
There are more data science related steps involved in creating a music recommendation system, but this demo should at least provide a good foundation for how to get started with accessing and processing data from YouTube.