This script fetches transcripts of videos from specified YouTube channels or playlists and saves them as .txt files. Optionally, it can also download the corresponding videos.
It can fetch private videos if access is granted to Google Service Accounts on a private video by private video basis. Unfortunately, there is no direct way of scaling the download process to private playlists as of the writing date.
- A valid YouTube Data API key.
- Optional: Google service account to download private videos
- Python packages:
asyncio
,itertools
,os
,argparse
,youtube_transcript_api
,googleapiclient
,dotenv
, andpytube
.
-
Clone this repository:
git clone https://github.com/vmeylan/youtube_video_and_transcript_downloader.git
cd youtube_video_and_transcript_downloader
-
Run the setup.sh script to install the required Python packages and setup the .env file in the root directory.
./setup.sh
-
Replace 'YOUR_YOUTUBE_API_KEY' with your actual YouTube Data API key and adjust other parameters as needed.
-
Populate channels IDs in
YOUTUBE_CHANNELS
or playlists IDsYOUTUBE_PLAYLISTS
separated by commas.
To use the Google Service Account for downloading private YouTube videos, you'll need to grant the Service Account email access to each private video individually. As of now, there's no direct way to download private playlists or whole private channels using this method.
-
Navigate to Google Cloud Console.
-
Create a new project or select an existing one.
-
Navigate to
IAM & Admin
>Service accounts
, then click onCreate Service Account
. -
Fill in the details and click
Create
. -
In the
Service account permissions
section, you will be prompted to grant this service account access to specific roles. To work with the YouTube Data API, you would generally grant it the role of "Viewer". This allows the account to view, but not modify, most of the resources associated with the account.a. Click on the Select a role dropdown.
b. Navigate to
YouTube Data API v3
>YouTube Data API Viewer
.c. Click
Continue
to proceed. -
Click
Continue
and thenDone
. -
In the list of service accounts, find the one you just created and click on the three dots under
Actions
>Manage keys
. -
Click
Add key
and chooseJSON
. This will download a JSON file which is your service account credentials. -
Store this JSON file safely and note the path. This path will be used as the
SERVICE_ACCOUNT_FILE
in the.env
file.
-
Go to YouTube Studio.
-
Click on "Videos" on the left sidebar.
-
Select the private video you want to share.
-
Under the "Visibility" settings, you'll see an option to share the video with specific email addresses if it's set to "Private".
-
Add the service account's email address (you can find this in the Google Cloud Console under your service account details).
-
Save or confirm the changes.
-
Remember, you'll need to repeat the steps for each private video you wish to download.
To fetch transcripts and optionally download videos, run the following command:
python script_name.py --api_key 'YOUR_YOUTUBE_API_KEY' --channels 'channel_name_1 channel_name_2 ...' --playlists 'playlist_id_1 playlist_id_2 ...'
'YOUR_YOUTUBE_API_KEY'
: Replace this with your YouTube Data API key.'channel_name_1 channel_name_2 ...'
: A space-separated list of YouTube channel names or IDs.'playlist_id_1 playlist_id_2 ...'
: A space-separated list of YouTube playlist IDs.
Alternatively, if you've set up the .env
file as instructed above, you can simply run: python youtube_download.py
This will use the API key and other parameters specified in the .env
file.
- Transcripts are saved in a directory named
data
, with subdirectories for each channel or playlist. Each video's transcript is saved as a .txt file named using the video's title. - The script uses the
youtube_transcript_api
to fetch transcripts. If a transcript is not available for a particular video, the script will skip that video and continue with the next. - By setting the
DOWNLOAD_VIDEO
environment variable toTrue
, the script will also download the videos corresponding to the transcripts. Videos are saved as .mp4 files in the same directories as the transcripts.
If you have suggestions or improvements, feel free to submit a pull request or open an issue.