🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
Updated Sep 19, 2024 - Python
Core Python Web Archiving Toolkit for replay and recording of web archives
Collect and revisit web pages.
The repository and website hosting the peer review process for new Programming Historian lessons
Run a high-fidelity browser-based web archiving crawler in a single Docker container
CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)
A High-Fidelity Web Archiving Extension for Chrome and Chromium-based browsers!
Streaming WARC/ARC library for fast web archive IO
Serverless replay of web archives directly in the browser
Automatically archive links to videos, images, and social media content from Google Sheets (and more).
Archiveror will help you preserve the webpages you love. 💾
A Tool To Push Web Resources Into Web Archives
InterPlanetary Wayback: A distributed and persistent archive replay system using IPFS
Webrecorder Player for Desktop (OSX/Windows/Linux). (Built with Electron + Webrecorder)
ODU Web Science and Digital Libraries Research Group (WS-DL) home page.
Wayback Machine API interface & a command-line tool
🐋 Web Archiving Integration Layer: One-Click User Instigated Preservation
A toolkit for CDX indices such as Common Crawl and the Internet Archive's Wayback Machine
Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!