- Retry network operations once if the server returns HTTP code 400.
- Internal network code refactoring.
- Add missing `requirements.txt` dependency for the `h2` package.
- Make parsing of malformed id ranges slightly more robust.
- Fix incorrect pluralization of an info message.
- Remove an accidentally left-in invocation of `pdb` upon errors, which happened even if debugging was not enabled.
- Edit the README.md file slightly.
- In addition to the record pages, `eprints2archives` now also harvests general URLs from the server, including the top-level URL, `/view`, and 2 levels of pages underneath it. However, if a subset of records is requested, it only gets those particular `/view/X/N.html` pages rather than all pages under `/view/X/`.
- Internal changes allow it to use the HTTP/2 protocol. This was necessary to communicate with Archive.Today, which appears to have stopped accepting save requests unless HTTP/2 is used.
- Now tries to add `https://` or `http://` if the user forgets to provide it, and also removes `/eprint` and adds `/rest` if needed. This makes it possible for the user to provide just a host name and let `eprints2archives` figure out the rest.
- Minor improvements to some of the run-time status messages.
- More progress bars!
- Improvements to debug logging.
- Improvements to README.md.
- Internal code refactoring.
- Include the top-level server URL among the URLs sent to archives, as well as `/view` and two levels of pages under `/view`.
- Make sure the set of URLs sent to archives is unique.
- Improve debug logging from low-level network module.
- Clarify some things in the README file.
First working version. Supports sending EPrints pages to the Internet Archive and Archive.Today. Runs with parallel threads and handles rate limits automatically. Currently implements a command-line interface only.
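The host-name handling described in the entry about adding `https://` or `http://` can be illustrated with a small sketch. It rests on several assumptions. The function name `normalize_server_url` and its `scheme` parameter are hypothetical. The sketch only rewrites the string; it does not probe the server to choose between `https://` and `http://`, which the real tool may do.

```python
def normalize_server_url(text: str, scheme: str = "https") -> str:
    """Hypothetical helper: turn a bare host name or partial URL into an
    EPrints REST base URL ending in /rest."""
    text = text.strip()
    # Add a scheme if the user left it off (no network probing here).
    if not text.startswith(("http://", "https://")):
        text = f"{scheme}://{text}"
    # Drop trailing slashes so the suffix checks below behave uniformly.
    text = text.rstrip("/")
    # Remove a trailing /eprint component, if present.
    if text.endswith("/eprint"):
        text = text[: -len("/eprint")]
    # Append /rest unless it is already there.
    if not text.endswith("/rest"):
        text += "/rest"
    return text


print(normalize_server_url("example.org"))
```

With this logic, inputs such as `example.org`, `https://example.org/eprint`, and `https://example.org/rest/eprint/` all reduce to the same REST base URL.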
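The entry about retrying network operations once on HTTP code 400 describes a simple policy. The sketch below is hypothetical. It assumes the request is wrapped in a zero-argument callable that returns the HTTP status code. It omits any delay between attempts and any other error handling the real tool may perform.

```python
from typing import Callable


def with_one_retry_on_400(request: Callable[[], int]) -> int:
    """Hypothetical helper: call request(); if it returns status 400,
    call it exactly one more time and return that second status."""
    status = request()
    if status == 400:
        # Retry a single time; the second result is final either way.
        status = request()
    return status


# Usage with a fake request that fails once and then succeeds.
responses = iter([400, 200])
print(with_one_retry_on_400(lambda: next(responses)))  # 200
```

Capping the retry at one attempt keeps a persistent client error from turning into a loop. A transient 400 still gets a second chance.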
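The entries about including the top-level URL and `/view` among the URLs sent to archives, and about making that set unique, can be sketched as follows. The helper `urls_to_archive` and its parameters are hypothetical. It assumes the view pages and record pages have already been harvested elsewhere, and it only assembles and de-duplicates them.

```python
from typing import Iterable, List


def urls_to_archive(server: str, view_pages: Iterable[str],
                    record_pages: Iterable[str]) -> List[str]:
    """Hypothetical helper: combine server-level URLs with already-harvested
    page URLs, removing duplicates while keeping first-seen order."""
    server = server.rstrip("/")
    # Top-level URL and /view come first.
    candidates = [server + "/", server + "/view"]
    candidates.extend(view_pages)
    candidates.extend(record_pages)
    # dict.fromkeys keeps only the first occurrence of each URL, in order.
    return list(dict.fromkeys(candidates))


print(urls_to_archive("https://example.org/",
                      ["https://example.org/view"],
                      ["https://example.org/1/", "https://example.org/1/"]))
```

Using `dict.fromkeys` rather than `set` preserves insertion order, so the resulting list stays deterministic from run to run.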