Internetarchive-YouTube

🚀 GitHub Action and CLI to archive YouTube channels by uploading each channel's videos to archive.org.

🧑‍💻 To use this tool as a command line interface (CLI), jump to CLI: Getting Started.
⚡️ To use this tool as a GitHub Action, jump to GitHub Action: Getting Started.

CLI: Getting Started 🧑‍💻

Requirements:

🐍 Python>=3.7

⬇️ Installation:

pip install internetarchive-youtube

# Install the internetarchive package and log in to your Archive.org account
pip install internetarchive
ia configure

🗃️ Backend database:

Create a backend database (or a JSON bin) to track the overall download/upload progress (see 🏗️ Creating A Backend Database).
If you choose MongoDB, export the connection string as an environment variable:

export MONGODB_CONNECTION_STRING=mongodb://username:password@host:port

If you choose JSONBin, export the master key as an environment variable:

export JSONBIN_KEY=xxxxxxxxxxxxxxxxx
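The choice between the two backends comes down to which environment variable is set. The sketch below shows one way a script could make that choice. `select_backend` is a hypothetical helper, not part of ia-yt, and the rule that MongoDB wins when both variables are set is an assumption.

```python
import os
from typing import Mapping, Tuple
from urllib.parse import urlparse


def select_backend(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return ("mongodb", connection_string) or ("jsonbin", master_key).

    Hypothetical helper: assumes MongoDB takes precedence when both are set.
    """
    conn = env.get("MONGODB_CONNECTION_STRING")
    if conn:
        # Catch common copy/paste mistakes before any network call is made.
        scheme = urlparse(conn).scheme
        if scheme not in ("mongodb", "mongodb+srv"):
            raise ValueError(f"unexpected scheme {scheme!r} in MONGODB_CONNECTION_STRING")
        return "mongodb", conn
    key = env.get("JSONBIN_KEY")
    if key:
        return "jsonbin", key
    raise RuntimeError("export MONGODB_CONNECTION_STRING or JSONBIN_KEY first")
```

Passing a plain dict instead of `os.environ` makes the helper easy to check without touching your real environment.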

⌨️ Usage:

usage: ia-yt [-h] [-p PRIORITIZE] [-s SKIP_LIST] [-f] [-t TIMEOUT] [-n] [-a] [-c CHANNELS_FILE] [-S] [-C]

optional arguments:
  -h, --help            show this help message and exit
  -p PRIORITIZE, --prioritize PRIORITIZE
                        Comma-separated list of channel names to prioritize
                        when processing videos
  -s SKIP_LIST, --skip-list SKIP_LIST
                        Comma-separated list of channel names to skip
  -f, --force-refresh   Refresh the database after every video (Can slow down
                        the workflow significantly, but is useful when running
                        multiple concurrent jobs)
  -t TIMEOUT, --timeout TIMEOUT
                        Kill the job after n hours (default: 5.5)
  -n, --no-logs         Don't print any log messages
  -a, --add-channel     Add a channel interactively to the list of channels to
                        archive
  -c CHANNELS_FILE, --channels-file CHANNELS_FILE
                        Path to the channels list file to use if the
                        environment variable `CHANNELS` is not set (default:
                        ~/.yt_channels.txt)
  -S, --show-channels   Show the list of channels in the channels file
  -C, --create-collection
                        Creates/appends to the backend database from the
                        channels list
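The -p/--prioritize, -s/--skip-list and -t/--timeout options can be read as follows. This is a minimal sketch, not ia-yt's actual code. `split_csv`, `order_channels` and `timed_out` are hypothetical names, and the rule that a skipped channel is dropped even if it is also prioritized is an assumption. The default timeout of 5.5 hours works out to 5.5 × 3600 = 19,800 seconds.

```python
import time
from typing import Dict, List


def split_csv(value: str) -> List[str]:
    """Turn a flag value such as "FOO, BAR" into ["FOO", "BAR"]."""
    return [item.strip() for item in value.split(",") if item.strip()]


def order_channels(channels: Dict[str, str], prioritize: str = "", skip: str = "") -> List[str]:
    """Prioritized channels first (in flag order), then the rest; skipped ones are dropped.

    Assumption: --skip-list wins over --prioritize for the same channel.
    """
    skipped = set(split_csv(skip))
    # dict.fromkeys removes duplicate names while keeping their order.
    first = list(dict.fromkeys(
        name for name in split_csv(prioritize) if name in channels and name not in skipped
    ))
    rest = [name for name in channels if name not in skipped and name not in first]
    return first + rest


def timed_out(start: float, hours: float = 5.5) -> bool:
    """True once `hours` have elapsed since `start` (a time.monotonic() reading)."""
    return time.monotonic() - start >= hours * 3600
```

For example, with channels FOO, BAR and BAZ, `-p BAZ` would give the order BAZ, FOO, BAR, and `-s FOO` would give BAR, BAZ.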

GitHub Action: Getting Started ⚡️

Fork this repository.
Create a backend database (or a JSON bin); see 🏗️ Creating A Backend Database.
Add your Archive.org credentials to the repository's Actions secrets:

ARCHIVE_USER_EMAIL
ARCHIVE_PASSWORD

Add the list of channels you want to archive to the repository's Actions secrets as a secret named CHANNELS.

The CHANNELS secret should be formatted like this example:

CHANNEL_NAME: CHANNEL_URL
FOO: CHANNEL_URL
FOOBAR: CHANNEL_URL
SOME_CHANNEL: CHANNEL_URL

Don't put quotes around the name or the URL, and put exactly one space between the colon and the URL.
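To catch formatting mistakes before saving the secret, you can check the list locally. The parser below is a sketch under the assumption that each line is split at the first ": " (colon plus space). That split is why the space matters: the URL itself contains a colon. `parse_channels` is a hypothetical helper, not the tool's own parser.

```python
from typing import Dict


def parse_channels(text: str) -> Dict[str, str]:
    """Parse "CHANNEL_NAME: CHANNEL_URL" lines into {name: url}."""
    channels: Dict[str, str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        # Split on the first ": " only: the URL itself contains ":" (https://).
        name, sep, url = line.partition(": ")
        name, url = name.strip(), url.strip()
        if not sep or not name or not url:
            raise ValueError(f"line {lineno}: expected 'CHANNEL_NAME: CHANNEL_URL', got {raw!r}")
        if name[0] in "\"'" or url[0] in "\"'":
            raise ValueError(f"line {lineno}: remove the quotes around the name or URL")
        channels[name] = url
    return channels
```

A line such as `FOO:https://...` (no space after the colon) or `FOO: "https://..."` (quoted URL) raises a ValueError that names the offending line.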

Add the database secret(s) to the repository's Actions secrets:

If you picked option 1 (MongoDB), add this additional secret:

MONGODB_CONNECTION_STRING

If you picked option 2 (JSON bin), add this additional secret:

JSONBIN_KEY

Run the workflow manually from the Actions tab (via workflow_dispatch), or wait for its next scheduled run.

That's it!

🏗️ Creating A Backend Database

Option 1: MongoDB (recommended).
- Self-hosted (see Alyetama/quick-MongoDB or the Docker Hub image).
- A free database on MongoDB Atlas.
Option 2: JSON bin (if you want a quick start).
- Sign up for JSONBin.
- Click VIEW MASTER KEY, then copy the key.
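The master key authenticates requests to JSONBin's REST API. The sketch below only builds a request to read a bin. It assumes JSONBin's v3 API (base URL `https://api.jsonbin.io/v3`, read endpoint `/b/<BIN_ID>/latest`, key sent in the `X-Master-Key` header). It does not show how ia-yt itself talks to JSONBin. `jsonbin_read_request` and `BIN_ID` are hypothetical.

```python
from urllib.request import Request

# Assumption: JSONBin's v3 REST API; ia-yt may use a different endpoint or version.
JSONBIN_API = "https://api.jsonbin.io/v3"


def jsonbin_read_request(bin_id: str, master_key: str) -> Request:
    """Build (but do not send) a GET request for the latest version of a bin."""
    return Request(
        f"{JSONBIN_API}/b/{bin_id}/latest",
        headers={"X-Master-Key": master_key},
        method="GET",
    )
```

Sending it with `urllib.request.urlopen(jsonbin_read_request(bin_id, os.environ["JSONBIN_KEY"]))` would return the bin's JSON if the key and bin ID are valid.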

📝 Notes

Information about the format of MONGODB_CONNECTION_STRING can be found in MongoDB's connection string documentation.
GitHub Actions jobs can run for at most 6 hours, so a job archiving a large channel may be killed before it finishes; archiving resumes in the next scheduled job.
Instead of raw text, CHANNELS can be a file path or a file URL pointing to a list of channels, formatted either as CHANNEL_NAME: CHANNEL_URL lines or as JSON, e.g. {"CHANNEL_NAME": "CHANNEL_URL"}.
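A minimal sketch of how such a value could be resolved follows. `read_source` and `load_channels` are hypothetical helpers, not ia-yt's implementation. Three further assumptions apply: a "file URL" is an http(s) link to a raw file, JSON is recognized by a leading "{", and malformed text lines are skipped.

```python
import json
import os
from typing import Dict
from urllib.request import urlopen


def read_source(value: str) -> str:
    """Return the channel list text, whether `value` is a URL, a file path, or raw text."""
    if value.startswith(("http://", "https://")):
        with urlopen(value, timeout=30) as resp:
            return resp.read().decode("utf-8")
    path = os.path.expanduser(value)  # e.g. ~/.yt_channels.txt
    if "\n" not in value and os.path.isfile(path):
        with open(path, encoding="utf-8") as fh:
            return fh.read()
    return value  # anything else is treated as raw text


def load_channels(value: str) -> Dict[str, str]:
    """Accept JSON ({"NAME": "URL"}) or "NAME: URL" lines; return {name: url}."""
    text = read_source(value).strip()
    if text.startswith("{"):
        data = json.loads(text)
        return {str(name): str(url) for name, url in data.items()}
    channels: Dict[str, str] = {}
    for line in text.splitlines():
        name, sep, url = line.strip().partition(": ")
        if sep:  # lines without ": " are silently skipped in this sketch
            channels[name.strip()] = url.strip()
    return channels
```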
