GH-125413: Add pathlib.Path.info attribute #127730

Open
wants to merge 38 commits into
base: main
6046a27
GH-125413: pathlib ABCs: replace `_scandir()` with `_info`
barneygale Dec 7, 2024
a57c4a8
Merge branch 'main' into gh-125413-info
barneygale Dec 9, 2024
76ef028
Rename `_info` to `_status`
barneygale Dec 9, 2024
5d92785
Add `Status` protocol.
barneygale Dec 9, 2024
dc403c6
Make `Path.status` public.
barneygale Dec 9, 2024
5a128b9
Fix docs typos
barneygale Dec 9, 2024
cac77a6
Docs fixes
barneygale Dec 9, 2024
6e09ada
Fix whatsnew
barneygale Dec 9, 2024
1d8713e
Fix _PathStatus repr, exception handling
barneygale Dec 9, 2024
f8ffbbd
Docs improvements
barneygale Dec 9, 2024
0a86e68
Move PathGlobber into glob.py, now that it uses the public path inter…
barneygale Dec 9, 2024
b0b621d
Simplify _PathStatus implementation a little
barneygale Dec 10, 2024
ef650fd
Add some tests
barneygale Dec 10, 2024
7b990c6
Add news
barneygale Dec 10, 2024
cf1073c
Docs tweaks
barneygale Dec 10, 2024
28bcf00
Merge branch 'main' into gh-125413-info
barneygale Dec 11, 2024
764b8ae
Wrap `os.DirEntry` in `_DirEntryStatus`
barneygale Dec 11, 2024
fa8931b
Add `Status.exists()`
barneygale Dec 11, 2024
2bb6221
Fix test name
barneygale Dec 11, 2024
923542b
Use status.exists() in docs example
barneygale Dec 11, 2024
68377c4
Docs editing
barneygale Dec 12, 2024
4f3f434
Few more test cases
barneygale Dec 12, 2024
2f4da5d
Merge branch 'main' into gh-125413-info
barneygale Dec 12, 2024
a7cffe7
Merge branch 'main' into gh-125413-info
barneygale Dec 12, 2024
dc6edc8
Merge branch 'main' into gh-125413-info
barneygale Dec 12, 2024
530771d
Suppress OSErrors
barneygale Dec 17, 2024
89ff6d4
Add Windows implementation using os.path.isdir() etc
barneygale Dec 17, 2024
592603b
Docstrings
barneygale Dec 17, 2024
bd6332a
Optimise Windows implementation a bit
barneygale Dec 17, 2024
f0ee0e9
More tidying of _PathStatus and friends
barneygale Dec 17, 2024
5ae8b06
`status` --> `info`
barneygale Dec 21, 2024
6e25d2d
`Parser` --> `_PathParser`
barneygale Dec 21, 2024
c93237d
Merge branch 'main' into gh-125413-info
barneygale Dec 22, 2024
2624363
Merge branch 'main' into gh-125413-info
barneygale Dec 29, 2024
662fd2d
Merge branch 'main' into gh-125413-info
barneygale Jan 5, 2025
dbc312c
Merge branch 'main' into gh-125413-info
barneygale Jan 8, 2025
ee7da1d
Update Lib/pathlib/_local.py
barneygale Jan 10, 2025
47ed3de
Merge branch 'main' into gh-125413-info
barneygale Jan 11, 2025
85 changes: 85 additions & 0 deletions Doc/library/pathlib.rst
@@ -1177,6 +1177,38 @@ Querying file type and status
    .. versionadded:: 3.5


+.. attribute:: Path.info
+
+   A :class:`~pathlib.types.PathInfo` object that supports querying file type
+   information. The object exposes methods that cache their results, which can
+   help reduce the number of system calls needed when switching on file type.
+   For example::
+
+      >>> p = Path('src')
+      >>> if p.info.is_symlink():
+      ...     print('symlink')
+      ... elif p.info.is_dir():
+      ...     print('directory')
+      ... elif p.info.exists():
+      ...     print('something else')
+      ... else:
+      ...     print('not found')
+      ...
+      directory
+
+   If the path was generated from :meth:`Path.iterdir` then this attribute is
+   initialized with some information about the file type gleaned from scanning
+   the parent directory. Merely accessing :attr:`Path.info` does not perform
+   any filesystem queries.
+
+   To fetch up-to-date information, it's best to call :meth:`Path.is_dir`,
+   :meth:`~Path.is_file` and :meth:`~Path.is_symlink` rather than methods of
+   this attribute. There is no way to reset the cache; instead you can create
+   a new path object with an empty info cache via ``p = Path(p)``.
+
+   .. versionadded:: 3.14
+
+
 Reading and writing files
 ^^^^^^^^^^^^^^^^^^^^^^^^^
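
To make the caching behaviour described for `Path.info` concrete, here is a minimal sketch that does not use the PR's implementation. `CachingInfo`, its `stat_calls` counter and the `classify` helper are hypothetical names introduced for illustration, and the sketch assumes that results are cached once per `follow_symlinks` value on top of `os.stat()`.

```python
import os
import stat
import tempfile


class CachingInfo:
    """Hypothetical stand-in for a PathInfo-style object (not the PR's code)."""

    def __init__(self, path):
        self._path = path
        self._cache = {}     # follow_symlinks -> os.stat_result, or None if missing
        self.stat_calls = 0  # counts real filesystem queries, for demonstration

    def _stat(self, follow_symlinks):
        # Query the filesystem at most once per follow_symlinks value.
        if follow_symlinks not in self._cache:
            self.stat_calls += 1
            try:
                result = os.stat(self._path, follow_symlinks=follow_symlinks)
            except (OSError, ValueError):
                result = None
            self._cache[follow_symlinks] = result
        return self._cache[follow_symlinks]

    def exists(self, *, follow_symlinks=True):
        return self._stat(follow_symlinks) is not None

    def is_dir(self, *, follow_symlinks=True):
        st = self._stat(follow_symlinks)
        return st is not None and stat.S_ISDIR(st.st_mode)

    def is_symlink(self):
        st = self._stat(False)
        return st is not None and stat.S_ISLNK(st.st_mode)


def classify(info):
    # Same switch as the documentation example, written as a function.
    if info.is_symlink():
        return 'symlink'
    elif info.is_dir():
        return 'directory'
    elif info.exists():
        return 'something else'
    return 'not found'


with tempfile.TemporaryDirectory() as tmp:
    info = CachingInfo(tmp)
    kind = classify(info)
    calls_after_first = info.stat_calls
    classify(info)  # answered entirely from the cache
    calls_after_second = info.stat_calls
```

In this sketch the second `classify()` call triggers no further `os.stat()` calls. That is also why a cached object can go stale, and why the documentation text recommends `Path.is_dir()` and friends when up-to-date answers are needed.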

@@ -1903,3 +1935,56 @@ Below is a table mapping various :mod:`os` functions to their corresponding
 .. [4] :func:`os.walk` always follows symlinks when categorizing paths into
    *dirnames* and *filenames*, whereas :meth:`Path.walk` categorizes all
    symlinks into *filenames* when *follow_symlinks* is false (the default.)
+
+
+Protocols
+---------
+
+.. module:: pathlib.types
+   :synopsis: pathlib types for static type checking
+
+
+The :mod:`pathlib.types` module provides types for static type checking.
+
+.. versionadded:: 3.14
+
+
+.. class:: PathInfo()
+
+   A :class:`typing.Protocol` describing the
+   :attr:`Path.info <pathlib.Path.info>` attribute. Implementations may
+   return cached results from their methods.
+
+   .. method:: exists(*, follow_symlinks=True)
+
+      Return ``True`` if the path is an existing file or directory, or any
+      other kind of file; return ``False`` if the path doesn't exist.
+
+      If *follow_symlinks* is ``False``, return ``True`` for symlinks without
+      checking if their targets exist.
+
+   .. method:: is_dir(*, follow_symlinks=True)
+
+      Return ``True`` if the path is a directory, or a symbolic link pointing
+      to a directory; return ``False`` if the path is (or points to) any other
+      kind of file, or if it doesn't exist.
+
+      If *follow_symlinks* is ``False``, return ``True`` only if the path
+      is a directory (without following symlinks); return ``False`` if the
+      path is any other kind of file, or if it doesn't exist.
+
+   .. method:: is_file(*, follow_symlinks=True)
+
+      Return ``True`` if the path is a file, or a symbolic link pointing to
+      a file; return ``False`` if the path is (or points to) a directory or
+      other non-file, or if it doesn't exist.
+
+      If *follow_symlinks* is ``False``, return ``True`` only if the path
+      is a file (without following symlinks); return ``False`` if the path
+      is a directory or other non-file, or if it doesn't exist.
+
+   .. method:: is_symlink()
+
+      Return ``True`` if the path is a symbolic link (even if broken); return
+      ``False`` if the path is a directory or any kind of file, or if it
+      doesn't exist.
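
The four methods documented for `PathInfo` can be mirrored in a small structural type, which may help when reasoning about what a custom implementation has to provide. The following is a sketch under stated assumptions: `PathInfoLike` and `StaticInfo` are hypothetical names, not part of `pathlib.types`, and the in-memory "kind" model is invented purely for illustration.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class PathInfoLike(Protocol):
    """Hypothetical mirror of the documented PathInfo methods."""

    def exists(self, *, follow_symlinks: bool = True) -> bool: ...
    def is_dir(self, *, follow_symlinks: bool = True) -> bool: ...
    def is_file(self, *, follow_symlinks: bool = True) -> bool: ...
    def is_symlink(self) -> bool: ...


class StaticInfo:
    """In-memory info for an entry in a virtual filesystem (illustrative only).

    `kind` is the type of the entry itself: 'dir', 'file', 'symlink', or None
    for a missing path. `target_kind` is what a symlink resolves to, or None
    if the link is broken.
    """

    def __init__(self, kind, target_kind=None):
        self._kind = kind
        self._target_kind = target_kind

    def _resolve(self, follow_symlinks):
        # Following a symlink reports the target's kind instead of the link's.
        if self._kind == 'symlink' and follow_symlinks:
            return self._target_kind
        return self._kind

    def exists(self, *, follow_symlinks=True):
        return self._resolve(follow_symlinks) is not None

    def is_dir(self, *, follow_symlinks=True):
        return self._resolve(follow_symlinks) == 'dir'

    def is_file(self, *, follow_symlinks=True):
        return self._resolve(follow_symlinks) == 'file'

    def is_symlink(self):
        return self._kind == 'symlink'


broken_link = StaticInfo('symlink', target_kind=None)
link_to_dir = StaticInfo('symlink', target_kind='dir')
```

Under this model a broken symlink reports `is_symlink()` as true and `exists()` as false, while `exists(follow_symlinks=False)` is true, matching the wording of the method descriptions above.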
9 changes: 9 additions & 0 deletions Doc/whatsnew/3.14.rst
@@ -568,6 +568,15 @@ pathlib

   (Contributed by Barney Gale in :gh:`73991`.)

+* Add :attr:`pathlib.Path.info` attribute, which stores an object
+  implementing the :class:`pathlib.types.PathInfo` protocol (also new). The
+  object supports querying the file type and internally caching
+  :func:`~os.stat` results. Path objects generated by
+  :meth:`~pathlib.Path.iterdir` are initialized with file type information
+  gleaned from scanning the parent directory.
+
+  (Contributed by Barney Gale in :gh:`125413`.)
+

 pdb
 ---
47 changes: 30 additions & 17 deletions Lib/glob.py
@@ -348,7 +348,7 @@ def lexists(path):

     @staticmethod
     def scandir(path):
-        """Implements os.scandir().
+        """Like os.scandir(), but generates (entry, name, path) tuples.
         """
         raise NotImplementedError

@@ -425,23 +425,18 @@ def wildcard_selector(self, part, parts):

         def select_wildcard(path, exists=False):
             try:
-                # We must close the scandir() object before proceeding to
-                # avoid exhausting file descriptors when globbing deep trees.
-                with self.scandir(path) as scandir_it:
-                    entries = list(scandir_it)
+                entries = self.scandir(path)
             except OSError:
                 pass
             else:
-                prefix = self.add_slash(path)
-                for entry in entries:
-                    if match is None or match(entry.name):
+                for entry, entry_name, entry_path in entries:
+                    if match is None or match(entry_name):
                         if dir_only:
                             try:
                                 if not entry.is_dir():
                                     continue
                             except OSError:
                                 continue
-                        entry_path = self.concat_path(prefix, entry.name)
                         if dir_only:
                             yield from select_next(entry_path, exists=True)
                         else:
@@ -483,15 +478,11 @@ def select_recursive(path, exists=False):
         def select_recursive_step(stack, match_pos):
             path = stack.pop()
             try:
-                # We must close the scandir() object before proceeding to
-                # avoid exhausting file descriptors when globbing deep trees.
-                with self.scandir(path) as scandir_it:
-                    entries = list(scandir_it)
+                entries = self.scandir(path)
             except OSError:
                 pass
             else:
-                prefix = self.add_slash(path)
-                for entry in entries:
+                for entry, _entry_name, entry_path in entries:
                     is_dir = False
                     try:
                         if entry.is_dir(follow_symlinks=follow_symlinks):
@@ -500,7 +491,6 @@ def select_recursive_step(stack, match_pos):
                         pass

                     if is_dir or not dir_only:
-                        entry_path = self.concat_path(prefix, entry.name)
                         if match is None or match(str(entry_path), match_pos):
                             if dir_only:
                                 yield from select_next(entry_path, exists=True)
@@ -528,9 +518,16 @@ class _StringGlobber(_GlobberBase):
     """Provides shell-style pattern matching and globbing for string paths.
     """
     lexists = staticmethod(os.path.lexists)
-    scandir = staticmethod(os.scandir)
     concat_path = operator.add

+    @staticmethod
+    def scandir(path):
+        # We must close the scandir() object before proceeding to
+        # avoid exhausting file descriptors when globbing deep trees.
+        with os.scandir(path) as scandir_it:
+            entries = list(scandir_it)
+        return ((entry, entry.name, entry.path) for entry in entries)
+
     if os.name == 'nt':
         @staticmethod
         def add_slash(pathname):
@@ -544,3 +541,19 @@ def add_slash(pathname):
             if not pathname or pathname[-1] == '/':
                 return pathname
             return f'{pathname}/'
+
+
+class _PathGlobber(_GlobberBase):
+    """Provides shell-style pattern matching and globbing for pathlib paths.
+    """
+
+    lexists = operator.methodcaller('exists', follow_symlinks=False)
+    add_slash = operator.methodcaller('joinpath', '')
+
+    @staticmethod
+    def scandir(path):
+        return ((child.info, child.name, child) for child in path.iterdir())
+
+    @staticmethod
+    def concat_path(path, text):
+        return path.with_segments(str(path) + text)
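
The `(entry, name, path)` tuple contract that both globbers now implement can be exercised on its own. The sketch below copies the string-path variant from the diff into a free function and adds a consumer; `scandir_tuples` and `child_dirs` are hypothetical names used only for this illustration, not functions from `glob.py`.

```python
import os
import tempfile


def scandir_tuples(path):
    """Return an iterable of (entry, name, path) tuples for a string path."""
    # Read all entries first so the directory handle is closed before any
    # caller recurses into subdirectories.
    with os.scandir(path) as scandir_it:
        entries = list(scandir_it)
    return ((entry, entry.name, entry.path) for entry in entries)


def child_dirs(path):
    """Collect the paths of immediate subdirectories, skipping unreadable entries."""
    result = []
    for entry, _name, entry_path in scandir_tuples(path):
        try:
            if entry.is_dir():
                result.append(entry_path)
        except OSError:
            continue
    return sorted(result)


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, 'pkg'))
    open(os.path.join(tmp, 'notes.txt'), 'w').close()
    found = child_dirs(tmp)
    expected = [os.path.join(tmp, 'pkg')]
```

Because each tuple already carries the child's path, the consumer never has to rebuild it from a prefix and a name, which seems to be what lets the diff drop the `add_slash()` and `concat_path()` calls from the selector loops.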
75 changes: 30 additions & 45 deletions Lib/pathlib/_abc.py
@@ -12,10 +12,9 @@
 """

 import functools
-import operator
 import posixpath
 from errno import EINVAL
-from glob import _GlobberBase, _no_recurse_symlinks
+from glob import _PathGlobber, _no_recurse_symlinks
 from pathlib._os import copyfileobj


@@ -41,21 +40,6 @@ def _explode_path(path):
     return path, names


-class PathGlobber(_GlobberBase):
-    """
-    Class providing shell-style globbing for path objects.
-    """
-
-    lexists = operator.methodcaller('exists', follow_symlinks=False)
-    add_slash = operator.methodcaller('joinpath', '')
-    scandir = operator.methodcaller('_scandir')
-
-    @staticmethod
-    def concat_path(path, text):
-        """Appends text to the given path."""
-        return path.with_segments(str(path) + text)
-
-
 class CopyReader:
     """
     Class that implements copying between path objects. An instance of this
@@ -355,7 +339,7 @@ def match(self, path_pattern, *, case_sensitive=None):
             return False
         if len(path_parts) > len(pattern_parts) and path_pattern.anchor:
             return False
-        globber = PathGlobber(sep, case_sensitive)
+        globber = _PathGlobber(sep, case_sensitive)
         for path_part, pattern_part in zip(path_parts, pattern_parts):
             match = globber.compile(pattern_part)
             if match(path_part) is None:
@@ -371,7 +355,7 @@ def full_match(self, pattern, *, case_sensitive=None):
             pattern = self.with_segments(pattern)
         if case_sensitive is None:
             case_sensitive = _is_case_sensitive(self.parser)
-        globber = PathGlobber(pattern.parser.sep, case_sensitive, recursive=True)
+        globber = _PathGlobber(pattern.parser.sep, case_sensitive, recursive=True)
         match = globber.compile(str(pattern))
         return match(str(self)) is not None

@@ -392,33 +376,45 @@ class ReadablePath(JoinablePath):
     """
     __slots__ = ()

+    @property
+    def info(self):
+        """
+        A PathInfo object that exposes the file type and other file attributes
+        of this path.
+        """
+        raise NotImplementedError
+
     def exists(self, *, follow_symlinks=True):
         """
         Whether this path exists.

         This method normally follows symlinks; to check whether a symlink exists,
         add the argument follow_symlinks=False.
         """
-        raise NotImplementedError
+        info = self.joinpath().info
+        return info.exists(follow_symlinks=follow_symlinks)

     def is_dir(self, *, follow_symlinks=True):
         """
         Whether this path is a directory.
         """
-        raise NotImplementedError
+        info = self.joinpath().info
+        return info.is_dir(follow_symlinks=follow_symlinks)

     def is_file(self, *, follow_symlinks=True):
         """
         Whether this path is a regular file (also True for symlinks pointing
         to regular files).
         """
-        raise NotImplementedError
+        info = self.joinpath().info
+        return info.is_file(follow_symlinks=follow_symlinks)

     def is_symlink(self):
         """
         Whether this path is a symbolic link.
         """
-        raise NotImplementedError
+        info = self.joinpath().info
+        return info.is_symlink()

     def open(self, mode='r', buffering=-1, encoding=None,
              errors=None, newline=None):
@@ -442,15 +438,6 @@ def read_text(self, encoding=None, errors=None, newline=None):
         with self.open(mode='r', encoding=encoding, errors=errors, newline=newline) as f:
             return f.read()

-    def _scandir(self):
-        """Yield os.DirEntry-like objects of the directory contents.
-
-        The children are yielded in arbitrary order, and the
-        special entries '.' and '..' are not included.
-        """
-        import contextlib
-        return contextlib.nullcontext(self.iterdir())
-
     def iterdir(self):
         """Yield path objects of the directory contents.

@@ -476,7 +463,7 @@ def glob(self, pattern, *, case_sensitive=None, recurse_symlinks=True):
         else:
             case_pedantic = True
         recursive = True if recurse_symlinks else _no_recurse_symlinks
-        globber = PathGlobber(self.parser.sep, case_sensitive, case_pedantic, recursive)
+        globber = _PathGlobber(self.parser.sep, case_sensitive, case_pedantic, recursive)
         select = globber.selector(parts)
         return select(self)

@@ -503,18 +490,16 @@ def walk(self, top_down=True, on_error=None, follow_symlinks=False):
             if not top_down:
                 paths.append((path, dirnames, filenames))
             try:
-                with path._scandir() as entries:
-                    for entry in entries:
-                        name = entry.name
-                        try:
-                            if entry.is_dir(follow_symlinks=follow_symlinks):
-                                if not top_down:
-                                    paths.append(path.joinpath(name))
-                                dirnames.append(name)
-                            else:
-                                filenames.append(name)
-                        except OSError:
-                            filenames.append(name)
+                for child in path.iterdir():
+                    try:
+                        if child.info.is_dir(follow_symlinks=follow_symlinks):
+                            if not top_down:
+                                paths.append(child)
+                            dirnames.append(child.name)
+                        else:
+                            filenames.append(child.name)
+                    except OSError:
+                        filenames.append(child.name)
             except OSError as error:
                 if on_error is not None:
                     on_error(error)
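
The `_abc.py` changes make `info` the one abstract hook, with `exists()`, `is_dir()` and the other query methods given default implementations on top of it. The following sketch shows that delegation pattern in isolation. `ReadableBase`, `MemoryInfo` and `MemoryPath` are hypothetical names, and the sketch leaves out the `self.joinpath()` step from the diff, which appears to obtain a new path object with an empty cache before querying.

```python
class ReadableBase:
    """Hypothetical base class: subclasses only need to supply `info`."""

    @property
    def info(self):
        raise NotImplementedError

    def exists(self, *, follow_symlinks=True):
        return self.info.exists(follow_symlinks=follow_symlinks)

    def is_dir(self, *, follow_symlinks=True):
        return self.info.is_dir(follow_symlinks=follow_symlinks)


class MemoryInfo:
    """Info object for an in-memory tree without symlinks (illustrative only)."""

    def __init__(self, kind):
        self._kind = kind  # 'dir', 'file', or None when the name is missing

    def exists(self, *, follow_symlinks=True):
        return self._kind is not None

    def is_dir(self, *, follow_symlinks=True):
        return self._kind == 'dir'


class MemoryPath(ReadableBase):
    """Path into a dict that maps names to 'dir' or 'file'."""

    def __init__(self, name, tree):
        self.name = name
        self._tree = tree

    @property
    def info(self):
        # Built on each access, so answers always reflect the current tree.
        return MemoryInfo(self._tree.get(self.name))


tree = {'docs': 'dir', 'README': 'file'}
docs = MemoryPath('docs', tree)
readme = MemoryPath('README', tree)
missing = MemoryPath('missing', tree)
```

With this layout a subclass gets working `exists()` and `is_dir()` methods without overriding them, which mirrors how `walk()` in the diff relies only on `iterdir()` and `child.info`.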