Expose `os.DirEntry` objects from pathlib #125413
Add a `Path.dir_entry` attribute. In any path object generated by `Path.iterdir()`, it stores an `os.DirEntry` object corresponding to the path; in other cases it is `None`. This can be used to retrieve the file type and attributes of directory children without necessarily incurring further system calls. Under the hood, we use `dir_entry` in our implementations of `PathBase.glob()`, `PathBase.walk()` and `PathBase.copy()`, the last of which also provides the implementation of `Path.copy()`, resulting in a modest speedup when copying local directory trees.
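To illustrate the kind of information a stored `os.DirEntry` makes available, here is a minimal sketch that uses `os.scandir()` directly rather than the proposed attribute. The helper name `children_by_type` is made up for this example and is not part of pathlib.

```python
import os


def children_by_type(directory):
    """Group the names of a directory's children by file type.

    Hypothetical helper: it relies on os.DirEntry type checks, which can
    usually be answered from data returned by the directory listing itself.
    """
    groups = {"dirs": [], "files": [], "symlinks": [], "other": []}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_symlink():
                groups["symlinks"].append(entry.name)
            elif entry.is_dir(follow_symlinks=False):
                groups["dirs"].append(entry.name)
            elif entry.is_file(follow_symlinks=False):
                groups["files"].append(entry.name)
            else:
                groups["other"].append(entry.name)
    # Sort for a stable result; scandir() yields entries in arbitrary order.
    for names in groups.values():
        names.sort()
    return groups
```

Whether a given check avoids a system call depends on the platform and filesystem, so treat the saving as likely rather than guaranteed.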
I put this feedback on the PR, but it's probably better placed here: while I like the general idea, I don't think this specific API is the right way to do it.
I think we can eliminate both of those bits of awkwardness:
If it's impractical to add |
… once Improve `pathlib._abc.PathBase.copy()` (which provides `Path.copy()`) by fetching operands' supported metadata keys up-front, rather than once for each path in the tree. This prepares the way for using `os.DirEntry` objects in `copy()`.
Add `pathlib.Path.scandir()` as a trivial wrapper of `os.scandir()`. In the private `pathlib._abc.PathBase` class, we can rework the `iterdir()`, `glob()`, `walk()` and `copy()` methods to call `scandir()` and make use of cached directory entry information, and thereby improve performance. Because the `Path.copy()` method is provided by `PathBase`, this also speeds up traversal when copying local files and directories.
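For readers who want to picture the "trivial wrapper", a sketch along these lines should behave similarly. `ScandirPath` and `subdirectory_names` are hypothetical names used only for illustration; this is not the code from the PR.

```python
import os
import pathlib


class ScandirPath(type(pathlib.Path())):
    """Hypothetical Path subclass, for illustration only."""

    def scandir(self):
        # Delegate to os.scandir(); the result is an iterator of os.DirEntry
        # objects that can also be used as a context manager.
        return os.scandir(self)


def subdirectory_names(path):
    """Return the sorted names of the directories directly under `path`."""
    with ScandirPath(path).scandir() as entries:
        return sorted(entry.name for entry in entries if entry.is_dir())
```

Subclassing `type(pathlib.Path())` rather than `pathlib.Path` is only there so the sketch also runs on Python versions that don't support subclassing `Path` directly.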
Use the new `PathBase.scandir()` method in `PathBase.glob()`, which greatly reduces the number of `PathBase.stat()` calls needed when globbing. There are no user-facing changes, because the pathlib ABCs are still private and `Path.glob()` doesn't use the implementation in its superclass.
To tie up the above loose ends, we went with a |
Use the new `PathBase.scandir()` method in `PathBase.walk()`, which greatly reduces the number of `PathBase.stat()` calls needed when walking. There are no user-facing changes, because the pathlib ABCs are still private and `Path.walk()` doesn't use the implementation in its superclass.
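The saving comes from sorting children into directories and files using the entries' cached type information instead of calling `stat()` on each child. A simplified top-down walk built on `os.scandir()` might look like the following; `walk_with_scandir` is a hypothetical name, and this is not the `PathBase.walk()` implementation.

```python
import os


def walk_with_scandir(top):
    """Yield (dirpath, dirnames, filenames) top-down, like a simplified os.walk().

    Sketch only: symlinks are never followed (a symlink to a directory is
    reported among the filenames), and errors are not handled.
    """
    stack = [os.fspath(top)]
    while stack:
        dirpath = stack.pop()
        dirnames, filenames = [], []
        with os.scandir(dirpath) as entries:
            for entry in entries:
                # The type check usually reuses data from the directory listing.
                if entry.is_dir(follow_symlinks=False):
                    dirnames.append(entry.name)
                else:
                    filenames.append(entry.name)
        yield dirpath, dirnames, filenames
        # Push in reverse so that subdirectories are visited in listing order.
        stack.extend(os.path.join(dirpath, name) for name in reversed(dirnames))
```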
Use the new `PathBase.scandir()` method in `PathBase.copy()`, which greatly reduces the number of `PathBase.stat()` calls needed when copying. This also speeds up `Path.copy()`, which inherits the superclass implementation. Under the hood, we use directory entries to distinguish between files, directories and symlinks, and to retrieve a `stat_result` when reading metadata. This logic is extracted into a new `pathlib._abc.CopierBase` class, which helps reduce the number of underscore-prefixed support methods in the path interface.
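As a rough picture of how directory entries can drive a copy, the sketch below dispatches on each entry's type without a separate `stat()` call per child. It is written against `os` and `shutil` only; `copy_tree` is a hypothetical helper and does not reflect the `CopierBase` design.

```python
import os
import shutil


def copy_tree(source, target):
    """Recursively copy the directory `source` to a new directory `target`.

    Sketch only: symlinks are recreated rather than followed, anything that
    is neither a symlink nor a directory is treated as a regular file, and
    metadata is preserved only for files (via shutil.copy2()).
    """
    os.mkdir(target)
    # Read the whole listing first so that no directory handle stays open
    # while we recurse into subdirectories.
    with os.scandir(source) as it:
        entries = list(it)
    for entry in entries:
        destination = os.path.join(target, entry.name)
        if entry.is_symlink():
            os.symlink(os.readlink(entry.path), destination)
        elif entry.is_dir(follow_symlinks=False):
            copy_tree(entry.path, destination)
        else:
            shutil.copy2(entry.path, destination)
```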
Remove documentation for `pathlib.Path.scandir()`, and rename the method to `_scandir()`. In the private pathlib ABCs, make `iterdir()` abstract and call it from `_scandir()`. It's not worthwhile to add this method at the moment - see discussion: https://discuss.python.org/t/ergonomics-of-new-pathlib-path-scandir/71721 Co-authored-by: Steve Dower <steve.dower@microsoft.com>
When a path object is generated by `PathBase.iterdir()`, then its `_info` attribute now stores a `os.DirEntry`-like object that can be used to query the file type. This removes any need for a `_scandir()` method. Currently the `_info` attribute is private and only guaranteed to be populated in paths from `iterdir()`. Later on, I'm hoping to rename it to `info` and ensure that it's populated for all kinds of paths (this probably involves adding a `pathlib.FileInfo` class.) In the pathlib ABCs, `info` will replace `stat()` as the lowest-level abstract file status querying mechanism.
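To make "`os.DirEntry`-like object" concrete for paths that did not come from `iterdir()`, here is one possible shape for such an object. `LazyInfo` and its methods are assumptions made for this sketch; it is neither the private `_info` implementation nor the eventual `info` API.

```python
import os
import stat


class LazyInfo:
    """Hypothetical os.DirEntry-like object for an arbitrary path.

    It calls os.lstat() at most once and answers type queries from the
    cached mode. Symlinks are not followed.
    """

    def __init__(self, path):
        self._path = os.fspath(path)
        self._mode = None  # file mode once known; 0 means "does not exist"

    def _lstat_mode(self):
        if self._mode is None:
            try:
                self._mode = os.lstat(self._path).st_mode
            except OSError:
                self._mode = 0
        return self._mode

    def exists(self):
        return self._lstat_mode() != 0

    def is_dir(self):
        return stat.S_ISDIR(self._lstat_mode())

    def is_file(self):
        return stat.S_ISREG(self._lstat_mode())

    def is_symlink(self):
        return stat.S_ISLNK(self._lstat_mode())
```

Because the result is cached, an instance can go stale if the file changes after the first query; a real design would need to decide how to handle that.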
Add `pathlib.Path.scandir()` as a trivial wrapper of `os.scandir()`. This will be used to implement several `PathBase` methods more efficiently, including methods that provide `Path.copy()`.
Remove the `PathBase.stat()` method. Its use of the `os.stat_result` API, with its 10 mandatory fields and low-level types, makes it a poor fit for virtual filesystems. We'll look to add a `PathBase.info` attribute later - see pythonGH-125413.
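The "10 mandatory fields" point is easy to see when trying to build a `stat_result` for a file that doesn't live on a local filesystem. In this sketch, `fake_stat_result` is a hypothetical helper and the filler values are arbitrary.

```python
import os
import stat


def fake_stat_result(mode, size=0):
    """Build an os.stat_result for a virtual file (hypothetical helper).

    All ten positional fields must be supplied, even though most of them
    (inode, device, link count, owner IDs) have no meaning here.
    """
    # Field order: mode, ino, dev, nlink, uid, gid, size, atime, mtime, ctime
    return os.stat_result((mode, 0, 0, 1, 0, 0, size, 0, 0, 0))


# A virtual regular file of 12 bytes; callers still have to decode st_mode.
st = fake_stat_result(stat.S_IFREG | 0o644, size=12)
is_regular = stat.S_ISREG(st.st_mode)
```

Passing fewer than ten values to `os.stat_result()` raises `TypeError`, so a virtual filesystem has to invent values for fields it cannot meaningfully fill.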
Feature or enhancement
I propose we add a new `Path.status` attribute that stores an `os.DirEntry` object in paths yielded from `Path.iterdir()`, or a pathlib-specific type with a similar interface in other paths.
This would:
- … `os.DirEntry` after calling `Path.iterdir()`, which is useful for efficiently determining files' types and often doesn't involve a system call.
- … `S_ISREG(st.st_mode)` and other holy incantations.
- … `PathBase.stat()` and the `stat_result` interface, which is too low-level and local filesystem-specific.
See discussion: https://discuss.python.org/t/is-there-a-pathlib-equivalent-of-os-scandir/46626
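For comparison, here is what the `S_ISREG(st.st_mode)` style looks like next to the `os.DirEntry` style, using only existing `os` and `stat` APIs. Both function names are made up for this sketch.

```python
import os
import stat


def regular_files_via_stat(directory):
    """List regular files using one os.stat() call per child.

    Note: os.stat() raises for broken symlinks; this sketch doesn't handle that.
    """
    names = []
    for name in os.listdir(directory):
        st = os.stat(os.path.join(directory, name))
        if stat.S_ISREG(st.st_mode):
            names.append(name)
    return sorted(names)


def regular_files_via_entries(directory):
    """List regular files using os.DirEntry.is_file()."""
    with os.scandir(directory) as entries:
        return sorted(entry.name for entry in entries if entry.is_file())
```

Both variants follow symlinks, so they should agree on any directory without broken symlinks; the second reads more directly and can often skip the per-child `stat()` call.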
Linked PRs
- … `pathlib.Path.dir_entry` attribute #125419
- … `pathlib.Path.copy()`: get common metadata keys only once #125990
- … `pathlib.Path.scandir()` method #126060
- … `scandir()` to speed up `glob()` #126261
- … `scandir()` to speed up `walk()` #126262
- … `scandir()` to speed up `copy()` #126263
- … `pathlib.Path.scandir()` method #127377
- … `pathlib.Path.info` attribute #127730