URLHero is a link resolver for current and defunct URL shorteners. It uses link mappings from URLTeam archives, dumps provided by shortener operators, and links captured by the Internet Archive.
- Automatically download and process daily URLTeam releases.
- Hopefully gain access to 301Works dumps.
- Switch to a torrent client that can scale to handle 1500 webseed items. anacrolix/torrent has less mature webseed support and is relatively slow. In simple tests, Transmission was unable to handle all of the torrents.
- Support Internet Archive API authentication. For example, URLTeamTorrentRelease2013July can only be downloaded when signed in.
- Create a link-resolving website and API.
- Create a Web Extension that redirects dead short links using URLHero.
- Proxy requests for unknown short links to the shortener and contribute the results back to the URLTeam dataset.
- Possibly fork unshort.link.
- Process URLTeam first-generation TinyBack releases.
- Write a custom CSV parser for qr-cx datasets to handle unescaped quotes.
- Fully comply with the BEACON format specification.
- Find a relational or key-value database with efficient compression.
There are many ways to contribute:
- File an issue to request a feature or report a bug, or open a PR with a fix.
- Send link mappings for a URL shortener that you operate or have archived.
- Join URLTeam and help us archive at-risk shorteners by running the terroroftinytown project in Docker or via the Archive Team Warrior.
If you want to get in touch, join the #urlteam channel on hackint or email me.
This project is made available under the Mozilla Public License, v. 2.0.