%%capture
!wget "https://github.com/s2t2/ml-music-2023/raw/main/test/audio/pop.00032.wav"39 YouTube Integrations
39.1 Background
This notebook contains an abbreviated demonstration of some of the audio processing methods used by the prof to train machine learning models for music recommendation purposes (see Slides and/or “Extra Credit” Promotional Video).
In this notebook we will fetch and process audio data from YouTube.
You will have an opportunity to specify your own YouTube video URL of interest for this exercise.
One cool thing about Google Colab is that it supports rich inputs and outputs, such as the ability to display images and play audio:
from IPython.display import display, Audio, Image
display(Audio("pop.00032.wav", autoplay=False))39.2 Fetching Video Info
We will use the pytubefix package to fetch video metadata and download audio from YouTube.
Installing the pytubefix package:
%%capture
!pip install pytubefix
!pip list | grep pytubefix
pytubefix 8.10.2
Specifying a YouTube Video URL:
# waits for you to input a video URL and press enter, uses a default video if you just press enter
video_url = input("Please input a YouTube Video URL: " ) or "https://www.youtube.com/watch?v=q6HiZIQoLSU"
video_url
Please input a YouTube Video URL:
'https://www.youtube.com/watch?v=q6HiZIQoLSU'
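Because the URL comes from user input, it can help to sanity-check it before fetching anything. Here is a minimal sketch; the is_youtube_url helper below is hypothetical (not part of pytubefix):
# hypothetical helper to sanity-check the URL before fetching (not part of pytubefix)
from urllib.parse import urlparse

def is_youtube_url(url: str) -> bool:
    """Returns True if the URL looks like a YouTube video link."""
    parsed = urlparse(url)
    return parsed.netloc.endswith(("youtube.com", "youtu.be"))

assert is_youtube_url(video_url), f"Doesn't look like a YouTube URL: {video_url}"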
Fetching video information / metadata:
# see: https://pytubefix.readthedocs.io/en/latest/
from pytubefix import YouTube as Video
from pytubefix.cli import on_progress
video = Video(video_url, on_progress_callback=on_progress)
print("VIDEO:", video.video_id, video.watch_url)
print("AUTHOR:", video.author)
print("TITLE:", video.title)
print("LENGTH:", video.length)
print("VIEWS:", video.views)VIDEO: q6HiZIQoLSU https://youtube.com/watch?v=q6HiZIQoLSU
AUTHOR: MaggieRogersVEVO
TITLE: Maggie Rogers - The Knife (Live On Austin City Limits)
LENGTH: 272
VIEWS: 157544
from IPython.display import display, Image
image_url = video.thumbnail_url
display(Image(url=image_url, height=250))
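The video object exposes additional metadata as well. The attribute names below come from the pytube / pytubefix API and are worth double-checking against the pytubefix docs:
# a few more metadata attributes (names assumed from the pytube / pytubefix API)
print("PUBLISHED:", video.publish_date)
print("CHANNEL:", video.channel_url)
print("KEYWORDS:", video.keywords)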
39.3 Filtering Streams
Identifying audio streams available for this video:
len(video.streams)
24
# see: https://pytubefix.readthedocs.io/en/latest/user/streams.html#filtering-streams
audio_streams = video.streams.filter(only_audio=True, file_extension='mp4').order_by("abr").asc()
print(len(audio_streams))
audio_streams
2
[<Stream: itag="139" mime_type="audio/mp4" abr="48kbps" acodec="mp4a.40.5" progressive="False" type="audio">, <Stream: itag="140" mime_type="audio/mp4" abr="128kbps" acodec="mp4a.40.2" progressive="False" type="audio">]
stream = audio_streams.first()
stream<Stream: itag="139" mime_type="audio/mp4" abr="48kbps" acodec="mp4a.40.5" progressive="False" type="audio">
39.4 Downloading Audio
Downloading the audio file into the Colab filesystem (see the “Files” menu in the left sidebar):
audio_filepath = stream.download(skip_existing=True)
audio_filepath
'/content/Maggie Rogers - The Knife (Live On Austin City Limits).m4a'
Using the “Files” menu in the left sidebar, you should now be able to right-click on the audio file to download it onto your local machine and add it to your music collection. 😸
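If you want more control over where the file lands, download() also accepts an output directory and filename. A minimal sketch, with parameter names taken from the pytube / pytubefix docs (the folder and filename below are just examples):
# download into a specific folder with a predictable filename (example values)
custom_filepath = stream.download(
    output_path="/content/downloads",
    filename="my_song.m4a",
    skip_existing=True # don't re-download if the file is already there
)
print(custom_filepath)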
Playing audio in Colab:
from IPython.display import display, Audio
display(Audio(audio_filepath, autoplay=False))
39.5 Cutting Tracks
Splitting the audio file into 30-second chunks (this step is required when preparing the audio data for machine learning and music recommendation purposes).
The take-away is that audio is just data that we can access and manipulate (for example, by splitting it into separate parts).
%%capture
!pip install librosa
!pip list | grep librosa
librosa 0.10.2.post1
import warnings
warnings.filterwarnings('ignore')
import librosa
audio, sample_rate = librosa.load(audio_filepath)
print("AUDIO DATA:", type(audio), audio.shape) #> ~6M datapoints
print("SAMPLE RATE:", sample_rate) # 22050 datapoints per secondAUDIO DATA: <class 'numpy.ndarray'> (5998592,)
SAMPLE RATE: 22050
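As a quick sanity check, the number of samples divided by the sample rate gives the duration in seconds, which should roughly match the video length reported earlier (272 seconds):
# duration in seconds = number of samples / samples per second
duration_seconds = len(audio) / sample_rate
print("DURATION:", round(duration_seconds, 1), "seconds") #> ~272.0, matching video.length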
# AUDIO PROCESSING HELPER FUNCTIONS
import numpy as np
def split_into_batches(my_list, batch_size=10_000):
    """Splits a list into evenly sized batches."""
    # h/t: https://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks
    for i in range(0, len(my_list), batch_size):
        yield my_list[i : i + batch_size]
def make_tracks(audio: np.ndarray, sample_rate: int, track_length_seconds=30, discard_last=True):
    """Returns equal sized tracks of the given duration.
    By default, discards the last track, because it will have a different (shorter) duration.
    """
    track_length = track_length_seconds * sample_rate
    #print("TRACK LENGTH:", track_length) #> 661_500 for 30s with a sample rate of 22_050 per second
    all_tracks = list(split_into_batches(audio.tolist(), batch_size=track_length))
    #print(f"ALL TRACKS ({len(all_tracks)}):", [len(t) for t in all_tracks])
    if discard_last:
        return all_tracks[0:-1] # not including the last item in the list
    else:
        return all_tracks
from IPython.display import Audio, display
tracks = make_tracks(audio=audio, sample_rate=sample_rate, track_length_seconds=30)
print("TRACKS:", len(tracks))
for i, track in enumerate(tracks):
    print("-----------")
    print(f"TRACK {i+1}...")
    print(len(track))
    audio_data = np.array(track)
    display(Audio(audio_data, autoplay=False, rate=sample_rate)) # rate only necessary when passing custom audio data
TRACKS: 9
-----------
TRACK 1...
661500
-----------
TRACK 2...
661500
-----------
TRACK 3...
661500
-----------
TRACK 4...
661500
-----------
TRACK 5...
661500
-----------
TRACK 6...
661500
-----------
TRACK 7...
661500
-----------
TRACK 8...
661500
-----------
TRACK 9...
661500
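If you want to keep these chunks for later processing, you could write each one to its own WAV file. Here is a minimal sketch using the soundfile package (it comes along with librosa's dependencies in Colab; otherwise install it with pip), where the "tracks" folder name is just an example:
# write each 30-second track to its own WAV file (sketch; "tracks" folder name is an example)
import os
import numpy as np
import soundfile as sf

os.makedirs("tracks", exist_ok=True)
for i, track in enumerate(tracks):
    track_filepath = os.path.join("tracks", f"track_{i+1}.wav")
    sf.write(track_filepath, np.array(track), samplerate=sample_rate)
    print("SAVED:", track_filepath)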
If you listen to each of these 30-second tracks, you will notice that each one picks up where the previous one left off. This shows we were able to segment the song programmatically.
There are more data science related steps involved in creating a music recommendation system, but this demo should at least provide a good foundation for how to get started with accessing and processing data from YouTube.
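As a hint of what one of those next steps might look like, librosa can summarize each 30-second track as a small set of numeric features (for example MFCCs), which is the kind of representation a machine learning model can be trained on. This is only an illustrative sketch, not the exact pipeline used in the course:
# illustrative sketch: summarize each track with mean MFCC features
import numpy as np
import librosa

for i, track in enumerate(tracks):
    y = np.array(track, dtype=np.float32)
    mfccs = librosa.feature.mfcc(y=y, sr=sample_rate, n_mfcc=13) # shape: (13, number of frames)
    features = mfccs.mean(axis=1) # average over time => one 13-number summary per track
    print(f"TRACK {i+1} MFCC MEANS:", features.round(2))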