Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix BadZipFile error #5

Merged
merged 2 commits into from
Dec 4, 2023

Conversation

belluzj
Copy link
Contributor

@belluzj belluzj commented Dec 1, 2023

Hello Simon, this is a reproducer for the issue here: googlefonts/diffenator2#97

It's a race condition when two process/instances of fontbakery/diffenator2 try to call youseedee at the same, there's a race for downloading and opening the ZIP.

The problem is on this line: https://github.com/simoncozens/youseedee/blob/master/lib/youseedee/__init__.py#L31, one thread will see the half-downloaded ZIP from the other and try to open it.

Possible mitigation would be to write a lock file next to the ZIP, that stays here until the download is finished. The process would be:

  1. Check if ZIP present
  2. Check if lock file present
    a) if yes, wait for it to go away
    b) if not, create it and download
  3. Open the ZIP

I can add that code on Monday if you'd like.

$ pytest
=================================================================================== test session starts ===================================================================================
platform win32 -- Python 3.8.10, pytest-7.4.3, pluggy-1.3.0
rootdir: C:\UsersLocal\jany.belluz\Documents\code\youseedee
collected 1 item

tests\test_concurrent_download.py F                                                                                                                                                  [100%]

======================================================================================== FAILURES =========================================================================================
________________________________________________________________________________ test_concurrent_download _________________________________________________________________________________

tmp_path = WindowsPath('C:/Users/jany.belluz/AppData/Local/Temp/pytest-of-jany.belluz/pytest-749/test_concurrent_download0')

    def test_concurrent_download(tmp_path):
        """Download the ZIP twice at the same time and make sure it doesn't crash.
        """
        # Fake user home dir (where youseedee writes its files)
        os.environ['HOME'] = str(tmp_path)
        os.environ['USERPROFILE'] = str(tmp_path)

        with ThreadPoolExecutor(max_workers=2) as executor:
            future_1 = executor.submit(ucd_data, ord('a'))
            sleep(0.1)
            future_2 = executor.submit(ucd_data, ord('a'))
>           assert future_1.result() == future_2.result()

tests\test_concurrent_download.py:20:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
C:\Program Files\Python38\lib\concurrent\futures\_base.py:437: in result
    return self.__get_result()
C:\Program Files\Python38\lib\concurrent\futures\_base.py:389: in __get_result
    raise self._exception
C:\Program Files\Python38\lib\concurrent\futures\thread.py:57: in run
    result = self.fn(*self.args, **self.kwargs)
lib\youseedee\__init__.py:186: in ucd_data
    out.update(props["datareader"](file, codepoint))
lib\youseedee\__init__.py:82: in dictget
    fileentry["data"] = fileentry["reader"](filename)
lib\youseedee\__init__.py:67: in parse_file_semicolonsep
    ensure_files()
lib\youseedee\__init__.py:49: in ensure_files
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
C:\Program Files\Python38\lib\zipfile.py:1269: in __init__
    self._RealGetContents()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <zipfile.ZipFile [closed]>

    def _RealGetContents(self):
        """Read in the table of contents for the ZIP file."""
        fp = self.fp
        try:
            endrec = _EndRecData(fp)
        except OSError:
            raise BadZipFile("File is not a zip file")
        if not endrec:
>           raise BadZipFile("File is not a zip file")
E           zipfile.BadZipFile: File is not a zip file

C:\Program Files\Python38\lib\zipfile.py:1336: BadZipFile
---------------------------------------------------------------------------------- Captured stdout call -----------------------------------------------------------------------------------
Downloading Unicode Character Database...
[==================================================]
================================================================================= short test summary info =================================================================================
FAILED tests/test_concurrent_download.py::test_concurrent_download - zipfile.BadZipFile: File is not a zip file
==================================================================================== 1 failed in 3.51s ====================================================================================

@belluzj belluzj changed the title Add test that shows the problem Fix BadZipFile error Dec 4, 2023
@belluzj
Copy link
Contributor Author

belluzj commented Dec 4, 2023

I added a fix with a lock file for the ZIP, using this package seemingly from the Tox team: https://pypi.org/project/filelock/

@simoncozens
Copy link
Owner

Thanks so much for this!

@simoncozens simoncozens merged commit 5deffef into simoncozens:master Dec 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants