[http- unzip_http] handle errors in retrieving URLs #2655

midichef · 2024-12-26T20:28:49Z

This PR catches a few more error cases when retrieving URLs. Every new case handled is a situation I triggered in testing.

For example, the stack trace for a nonexistent zipfile that returns a 404 under a particular server configuration:

        File "/home/midichef/.venv/lib/python3.12/site-packages/visidata/loaders/unzip_http.py", line 158, in infoiter
            self.zip_size = int(resp.headers['Content-Length'])
        File "/home/midichef/.venv/lib/python3.12/site-packages/urllib3/_collections.py", line 260, in __getitem__
            val = self._container[key.lower()]
        KeyError: 'content-length'

becomes the error message "cannot open URL: status code 404".
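A minimal sketch of the kind of check that turns the KeyError above into a readable message. This is not the actual patch: the helper name `content_length`, the use of `ConnectionError`, and the wording of the second message are assumptions for illustration. Only the "cannot open URL: status code 404" text comes from the description. The stub responses stand in for the object returned by a urllib3 HEAD request, which exposes `.status` and `.headers`.

```python
from types import SimpleNamespace


def content_length(resp):
    """Return the Content-Length of a HEAD response as an int.

    Raises ConnectionError (a hypothetical choice of exception) with a
    readable message instead of letting a KeyError escape.
    """
    # Check the status first: an error response may omit Content-Length.
    if resp.status >= 400:
        raise ConnectionError(f'cannot open URL: status code {resp.status}')
    # .get() avoids the KeyError seen in the stack trace.
    length = resp.headers.get('Content-Length')
    if length is None:
        raise ConnectionError('cannot open URL: no Content-Length header')
    return int(length)


# Stub responses; a real urllib3 response has case-insensitive headers,
# while these plain dicts are case-sensitive.
ok = SimpleNamespace(status=200, headers={'Content-Length': '1024'})
not_found = SimpleNamespace(status=404, headers={})

size = content_length(ok)
try:
    content_length(not_found)
except ConnectionError as e:
    message = str(e)
```

Checking the status before touching the headers means the user sees the cause (the 404) and not a symptom (the missing header).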

The web servers that host data tend to be poorly maintained, so visidata users will run into such errors unusually often.

Tested on Python 3.8 and Python 3.12 to make sure the Exception classes exist in the oldest and newest libraries.

saulpw · 2024-12-26T23:47:21Z

visidata/loaders/unzip_http.py

@@ -149,7 +149,13 @@ def namelist(self):
        return list(r.filename for r in self.infoiter())

    def infoiter(self):
-        resp = self.http.request('HEAD', self.url)
+        urllib3 = vd.importExternal('urllib3')


This file (unzip_http.py) should be taken directly from https://github.com/saulpw/unzip-http/blob/master/unzip_http.py, with the vd.importExternal line being manually substituted. It's not a clean process but I didn't/don't expect unzip_http to be updated that often. They've already drifted a bit, but if we could bring them into sync again and make these changes in a way that keeps them as close as possible, that's my preference. Probably the easiest way is to let these errors percolate up and then handle them at a visidata layer instead of in unzip_http.

midichef · 2024-12-27

One complication is that unzip-http uses urllib3, so the exceptions would come from urllib3 and can't be caught in loaders/archive.py, which does not import it.

I submitted a draft patch that transforms the urllib3 errors into regular urllib errors. What do you think of that? It makes sense in the context of visidata, but it's quite odd in the context of unzip-http by itself.
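One way such a transformation could look, as a sketch only and not the draft patch itself: a context manager that re-raises chosen exception types as `urllib.error.URLError`, so a caller that only knows the standard library can catch them. The name `as_urlerror` is hypothetical, and the stand-in exception class is there only so the example runs without urllib3 installed.

```python
import contextlib
import urllib.error


@contextlib.contextmanager
def as_urlerror(*exc_types):
    """Re-raise any of exc_types as urllib.error.URLError.

    The original exception is kept as __cause__ for debugging.
    """
    try:
        yield
    except exc_types as e:
        raise urllib.error.URLError(str(e)) from e


# Stand-in for a third-party error type. With urllib3 installed, one
# would pass urllib3.exceptions.HTTPError (its base exception) instead:
#     with as_urlerror(urllib3.exceptions.HTTPError):
#         resp = http.request('HEAD', url)
class FakeTransportError(Exception):
    pass


def fetch():
    # Simulates a request that fails inside the HTTP library.
    with as_urlerror(FakeTransportError):
        raise FakeTransportError('connection refused')


try:
    fetch()
except urllib.error.URLError as e:
    reason = e.reason
    cause = e.__cause__
```

The caller sees only `URLError`, which is the property that matters for a module that does not import urllib3. The cost is the one raised in the comment above: a standalone unzip-http would raise urllib errors despite using urllib3.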

saulpw · 2025-01-13

Can we just catch Exception? Do we only want to catch urllib3 exceptions here?

saulpw reviewed Dec 26, 2024


anjakefala added the waiting on contributor label Dec 27, 2024

midichef added 2 commits December 27, 2024 12:14

[http-] handle more errors opening URLs

8143b11

[unzip-http] handle errors opening URLs

21d078b

midichef force-pushed the http_error_handling branch from ce2e971 to 21d078b Compare December 27, 2024 20:16

anjakefala added the waiting on maintainer and waiting on contributor labels and removed the waiting on contributor and waiting on maintainer labels Jan 12, 2025


midichef commented Dec 26, 2024

saulpw Dec 26, 2024

midichef Dec 27, 2024

saulpw Jan 13, 2025
