Fixed exception on Windows when contention happens in file cache #2787

Draft
wants to merge 1 commit into base: main

gshimansky
Contributor

Catch the exception raised in the file cache on Windows when two or more processes attempt to write the same file while another process already has it open. The write fails because the file is locked (that's how files work on Windows).

Fixes #2777.
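
For illustration only, here is a minimal sketch of the kind of guard described above. It is not the actual diff. It assumes the cache writes to a temporary file and then moves it into place with os.replace, and that a PermissionError at that point means another process is holding the target file open. The helper name put_with_contention_guard is hypothetical.

```python
import os
import tempfile


def put_with_contention_guard(data: bytes, path: str) -> str:
    """Hypothetical helper: write `data` to `path` via a temporary file.

    Assumption: concurrent writers produce equivalent content for the same
    cache path, so losing the race to another process is harmless.
    """
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)

    # Write to a unique temporary file in the same directory first.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix="tmp.")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        try:
            # Move the temporary file over the final path.
            os.replace(tmp_path, path)
        except PermissionError:
            # On Windows this can be raised when another process has the
            # target open; keep the existing file and drop our copy.
            pass
    finally:
        # The temporary file still exists only if the replace did not happen.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
    return path
```

Swallowing the error is only reasonable under the stated assumption that every writer stores equivalent content under the same path; otherwise the caller would need to retry or report the failure.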

New contributor declaration

  • I am not making a trivial change, such as fixing a typo in a comment.

  • I have written a PR description following these rules.

  • I have run pre-commit run --from-ref origin/main --to-ref HEAD.

  • Select one of the following.

    • I have added tests.
      • /test for lit tests
      • /unittest for C++ tests
      • /python/test for end-to-end tests
    • This PR does not need a test because all existing unit tests already exercise this functionality when run with xdist.
  • Select one of the following.

    • I have not added any lit tests.
    • The lit tests I have added follow these best practices,
      including the "tests should be minimal" section. (Usually running Python code
      and using the instructions it generates is not minimal.)

@gshimansky gshimansky marked this pull request as draft November 21, 2024 17:00
Contributor

@pbchekin pbchekin left a comment

Looks good; this could potentially be upstreamed.

@gshimansky
Contributor Author

While this patch fixes the main source of PermissionError exceptions in the FileCacheManager.put function, I now see rare occurrences of PermissionError when Triton code tries to read files from the cache. They are possibly unrelated to this particular fix and happened previously too, just much less frequently.
I've converted the PR to a draft while I continue to investigate and look for a complete fix for the xdist tests on Windows.

Signed-off-by: Gregory Shimansky <gregory.shimansky@intel.com>
Development

Successfully merging this pull request may close these issues.

[WIN] Parallel tests execution fails because of locked files