Archive

A simple Python script that generates a sitemap of a given website and archives every page not already stored in the Wayback Machine. It is now available as an API as well!

Check out the documentation here

Setup

$ git clone https://github.com/apurvmishra99/archiver.git
$ cd archiver
$ python3 -m venv venv
$ source venv/bin/activate
$ pip install -r requirements.txt

Usage

Usage: archive.py [OPTIONS] URL

Options:
  -m, --max_urls INTEGER  The max number of urls to collect. The default value
                          is 50.Use 0 to set it as infinite.
  -d, --days INTEGER      The time difference(in days) of the current copy of
                          the page if it exists and we want to archive it
                          again. The default value is 7 days. Use 0 to archive
                          all pages again.
  --help                  Show this message and exit.

Example

$ python archive.py --days=7 --max_urls=50 https://apurvmishra.xyz
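The --days option above decides whether a page that already has a snapshot gets archived again. The following is a minimal sketch of how such a check could work; it is not taken from archive.py. It assumes the Wayback Machine availability endpoint (https://archive.org/wayback/available) and its JSON shape, and the function names latest_snapshot_time, needs_archiving and fetch_availability are hypothetical. Treating a snapshot exactly `days` old as stale is also an assumption.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Public availability endpoint of the Wayback Machine (assumed here, not read from archive.py).
AVAILABILITY_API = "https://archive.org/wayback/available"


def latest_snapshot_time(response: dict) -> Optional[datetime]:
    """Return the capture time of the closest snapshot, or None if there is none."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    # Wayback timestamps look like "20130919044612" (YYYYMMDDhhmmss, UTC).
    stamp = datetime.strptime(closest["timestamp"], "%Y%m%d%H%M%S")
    return stamp.replace(tzinfo=timezone.utc)


def needs_archiving(response: dict, days: int, now: Optional[datetime] = None) -> bool:
    """Decide whether a page should be (re)archived.

    days == 0 mirrors "archive all pages again"; otherwise the page is
    archived when no snapshot exists or the newest one is at least `days` old.
    """
    snapshot = latest_snapshot_time(response)
    if snapshot is None or days == 0:
        return True
    now = now or datetime.now(timezone.utc)
    return now - snapshot >= timedelta(days=days)


def fetch_availability(url: str) -> dict:
    """Query the availability endpoint for `url` (performs a network request)."""
    query = urlencode({"url": url})
    with urlopen(f"{AVAILABILITY_API}?{query}", timeout=30) as resp:
        return json.load(resp)
```

With these helpers, needs_archiving(fetch_availability("https://apurvmishra.xyz"), days=7) would report whether the page is due for a fresh snapshot. Keeping the decision separate from the network call makes it easy to check with a canned response.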

Alternative use

If you just want to scrape all the internal links on a website and write them to a text file, you can use scrape_all_internal_links.py.

Usage

Usage: scrape_all_internal_links.py [OPTIONS] URL

Options:
  --max_urls INTEGER  The max number of urls to collect. Use 0 to set it as
                      infinite.
  --help              Show this message and exit.

Example

$ python scrape_all_internal_links.py --max_urls=50 https://apurvmishra.xyz
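To illustrate what collecting internal links involves, here is a minimal sketch using only the standard library. It is not the code in scrape_all_internal_links.py: the name internal_links is hypothetical, "internal" is assumed to mean "same host as the start URL", and the sketch processes a single HTML page rather than crawling. As in the options above, max_urls=0 means no limit.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class _AnchorCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(html: str, base_url: str, max_urls: int = 0) -> list:
    """Return unique same-host links found in `html`, in document order."""
    parser = _AnchorCollector()
    parser.feed(html)
    host = urlparse(base_url).netloc
    found = []
    for href in parser.hrefs:
        # Resolve relative links and drop "#fragment" so duplicates collapse.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https") or parsed.netloc != host:
            continue  # skip mailto:, javascript:, and external sites
        if absolute not in found:
            found.append(absolute)
            if max_urls and len(found) >= max_urls:
                break
    return found
```

The returned list can then be written to a text file, one URL per line, for example with "\n".join(links).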

TODO

Package the script
Convert to async
Add command line option to just generate the sitemap

Tested On

Pop!_OS 20.04 LTS
Python v3.7.6
