The Internet Archive's ⏳Wayback Machine interface for Crystal.
- Add the dependency to your shard.yml:

      dependencies:
        wayback:
          github: lucasintel/wayback.cr

- Run shards install
Wayback.latest_snapshot("https://kyivindependent.com/")
Wayback.snapshots("https://kyivindependent.com/", from: 1.month.ago, to: Time.local)
Wayback.snapshots("https://kyivindependent.com/*", latest: 10)
Wayback.first_snapshot("https://kyivindependent.com/")
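The from: and to: arguments above take Time values, while the CDX server itself works with timestamps of up to 14 digits in yyyyMMddhhmmss form. As a rough illustration of that format, here is a minimal sketch using only the standard library. The helper names cdx_timestamp and parse_cdx_timestamp are hypothetical and not part of wayback.cr, and the use of UTC is an assumption.

```crystal
# Hypothetical helpers (not part of wayback.cr).

# Format a Time as a 14-digit CDX timestamp (UTC assumed).
def cdx_timestamp(time : Time) : String
  time.to_utc.to_s("%Y%m%d%H%M%S")
end

# Parse a 14-digit CDX timestamp back into a Time in UTC.
def parse_cdx_timestamp(value : String) : Time
  Time.parse_utc(value, "%Y%m%d%H%M%S")
end

stamp = cdx_timestamp(Time.utc(2022, 3, 1, 12, 30, 0))
puts stamp # 20220301123000
puts parse_cdx_timestamp(stamp) == Time.utc(2022, 3, 1, 12, 30, 0) # true
```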
The library provides a straightforward interface for building complex queries as well.
query = Wayback::Query.url("https://kyivindependent.com/*")
  .from(1.year.ago)
  .to(Time.local)
  .mime_type(/image\/.*/)
  .status_not(301)
  .status_not(404)
  .latest(10)
Wayback.perform(query)
Aggregate snapshots based on a field, or a substring (position) of a field.
# 📰 news headlines by day
query = Wayback::Query.url("https://kyivindependent.com/").group_by_day
Wayback.perform(query)
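Grouping on a substring of a field means comparing only its first N characters. For a 14-digit yyyyMMddhhmmss timestamp, the prefix length picks the granularity. The sketch below illustrates that idea with plain strings. Whether group_by_day is implemented as a collapse on the first 8 timestamp digits is an assumption, and the names TIMESTAMP_PREFIX and group_key are illustrative only.

```crystal
# Prefix lengths of a yyyyMMddhhmmss timestamp and the granularity they select.
# Illustrative names; not taken from wayback.cr.
TIMESTAMP_PREFIX = {
  year:  4,
  month: 6,
  day:   8,
  hour:  10,
}

# Key used to decide whether two captures fall into the same group.
def group_key(timestamp : String, prefix_length : Int32) : String
  timestamp[0, prefix_length]
end

morning = "20220301123000"
evening = "20220301184500"

# Same day, so both captures share a key.
puts group_key(morning, TIMESTAMP_PREFIX[:day]) == group_key(evening, TIMESTAMP_PREFIX[:day]) # true
# Different hours, so the keys differ.
puts group_key(morning, TIMESTAMP_PREFIX[:hour]) == group_key(evening, TIMESTAMP_PREFIX[:hour]) # false
```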
It's possible to track how many snapshots were skipped due to filtering and grouping by including #with_aggregate_count in the chain. This feature is optional since it might slow down the query.
# Unique captures per URL.
query = Wayback::Query.url("https://kyivindependent.com/*").group_by_url.with_aggregate_count
snapshots = Wayback.perform(query)
snapshots.first.url
# => https://kyivindependent.com/opinion/natalia-datskevych-my-rescue-mission-to-flee-russias-war-with-three-kids/
snapshots.first.aggregate_count
# => 5
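To make the idea of an aggregate count concrete, the sketch below collapses consecutive captures that share a key and records how many captures each group absorbed. It is a standalone illustration, not the library's implementation: the Capture record and the collapse method are hypothetical, and the exact meaning of aggregate_count in wayback.cr (for example, whether it includes the capture that is kept) may differ.

```crystal
# Hypothetical stand-in for a snapshot; wayback.cr has its own type.
record Capture, url : String, timestamp : String

# Collapse consecutive captures with the same key, keeping the first capture
# of each run together with the size of that run.
def collapse(captures : Array(Capture), &block : Capture -> String) : Array({Capture, Int32})
  groups = [] of {Capture, Int32}
  last_key = nil
  captures.each do |capture|
    current = block.call(capture)
    if current == last_key && !groups.empty?
      kept, count = groups.pop
      groups << {kept, count + 1}
    else
      groups << {capture, 1}
    end
    last_key = current
  end
  groups
end

captures = [
  Capture.new("https://example.com/", "20220301120000"),
  Capture.new("https://example.com/", "20220301180000"),
  Capture.new("https://example.com/", "20220302090000"),
]

# Group by day: the first 8 digits of the timestamp.
groups = collapse(captures) { |capture| capture.timestamp[0, 8] }
groups.each do |capture, count|
  puts "#{capture.timestamp} x#{count}"
end
# 20220301120000 x2
# 20220302090000 x1
```

Only adjacent captures are merged, so the input is assumed to be sorted by the field being collapsed.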
params = {
  "url"      => "https://kyivindependent.com/",
  "filter"   => "!statuscode:404",
  "collapse" => "timestamp:10",
}
Wayback.execute(params)
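For orientation, raw parameters like these correspond to the query string of a CDX request. The sketch below only builds such a URL with the standard library and sends nothing. The endpoint is the public CDX server; the cdx_url helper is hypothetical, and how Wayback.execute assembles its request internally is not covered here.

```crystal
require "uri"

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

# Hypothetical helper: turn raw CDX parameters into a request URL.
# URI::Params.build takes care of percent-encoding keys and values.
def cdx_url(params : Hash(String, String)) : String
  query = URI::Params.build do |form|
    params.each { |key, value| form.add(key, value) }
  end
  "#{CDX_ENDPOINT}?#{query}"
end

raw = {
  "url"      => "https://kyivindependent.com/",
  "collapse" => "timestamp:10",
}
puts cdx_url(raw)
```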
- Wayback Machine CDX-Server API
- A Sustainable, Large-Scale, Minimal Approach to Accessing Web Archives (Greg Wiedeman, archivist at the University at Albany)
- Wayback Machine APIs
- The CDX File Format
- Fork it (https://github.com/lucasintel/wayback.cr/fork)
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create a new Pull Request
- Lucas - creator and maintainer