-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update WorkspacePath
to support Python 3.12
#122
Conversation
…ore functionality. Work-in-progress: not yet complete.
This PR breaks backwards compatibility for databrickslabs/ucx downstream. See build logs for more details. Running from downstreams #99 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please also look at the downstream failures
|
||
def __bytes__(self): | ||
# Super implementations are fine. | ||
return super(self).__bytes__() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
won't it load the super implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a different bug here: it should be super()
instead of super(self)
.
The superclass will always be loaded, it's really about whether it will invoked and whether this is safe. There are a few categories:
- The implementation is in terms of other public APIs, for all the python versions that we support. These I've marked with the comment you see here. These both act as progress markers to indicate that the implementations have been checked, and when I run the tests with coverage I know I've got a test case that verifies it's fine for all the python versions that we support.
- The implementation is in terms of internal
_protected
properties, which all changed between python 3.11 and 3.12. For these I need an independent implementation, which I'm mainly basing on stripped versions of the default implementation from the (new) abstract base class being introduced in 3.13. - A hybrid of the above. This I treat like the previous item.
My current strategy is to enumerate the public API (including Dunder methods like this), mark each method, and implement the tests to verify it works across the versions.
… worse than the problem.
The code as-is is temporary.
Yup, I'm aware of the cause of those; they'll be fixed. |
WorkspaceClient
to support python 3.12WorkspacePath
to support Python 3.12
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's prioritise the behavior covered in https://github.com/databrickslabs/blueprint/blob/main/tests/integration/test_paths.py and used by UCX. no need to implement 100% pathlib API parity in this release - it can be added in the later increments by community.
tests/unit/test_paths.py
Outdated
assert workspace_path.suffix == "" | ||
|
||
|
||
def test_suffix_when_file_is_not_notebook(): | ||
ws = create_autospec(WorkspaceClient) | ||
workspace_path = WorkspacePath(ws, "/test/path") | ||
workspace_path.is_notebook = lambda: False | ||
assert workspace_path.suffix == "" | ||
with patch("databricks.labs.blueprint.paths.WorkspacePath.is_notebook") as mock_is_notebook: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Obscure implicit test dependency with mock.patch(XXX). Rewrite to inject dependencies through constructor.. Using patch to mock dependencies in unit tests can introduce implicit dependencies within a class, making it unclear to other developers. Constructor arguments, on the other hand, explicitly declare dependencies, enhancing code readability and maintainability. However, reliance on patch for testing may lead to issues during refactoring, as updates to underlying implementations would necessitate changes across multiple unrelated unit tests. Moreover, the use of hard-coded strings in patch can obscure which unit tests require modification, as they lack strongly typed references. This coupling of the class under test to concrete classes signifies a code smell, and such code is not easily portable to statically typed languages where monkey patching isn't feasible without significant effort. In essence, extensive patching of external clients suggests a need for refactoring, with experienced engineers recognizing the potential for dependency inversion in such scenarios.
To address this issue, refactor the code to inject dependencies through the constructor. This approach explicitly declares dependencies, enhancing code readability and maintainability. Moreover, it allows for dependency inversion, enabling the use of interfaces to decouple the class under test from concrete classes. This decoupling facilitates unit testing, as it allows for the substitution of mock objects for concrete implementations, ensuring that the class under test behaves as expected. By following this approach, you can create more robust and maintainable unit tests, improving the overall quality of your codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've refactored this test so that the behaviour being tested is induced via the (WorkspaceClient
) mock instead of via monkey-patching.
Assuming this is fine, I've also updated the (related) tests above it to work the same way rather than mutating the state of the object under test.
with pytest.raises(FileNotFoundError): | ||
workspace_path.unlink() | ||
|
||
|
||
def test_relative_to(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did you remove existing test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's another version of the same test that covers this (and more situations).
I've moved it here so that this is clearer from the PR diff what the change here really is.
This will mean it's clearer in the PR what has changed.
Instead modify behaviour via the constructor-injected mock.
Previously to produce the desired results these tests mutated the internal state of the object under test.
This PR is stacked on top of #122 and updates `WorkspaceClient` to support: - `.glob()` - `.rglob()` - `.iterdir()`
❌ 17/18 passed, 1 failed, 2 skipped, 22s total ❌ test_mkdirs: TypeError: can only concatenate list (not "method") to list (4ms)
Running from acceptance #180 |
This PR relates to #120 and updates the `WorkspacePath` implementation so that it also works under Python 3.12, in addition to Python 3.10 and Python 3.11. Changes include: - Replacing most of the internal implementation to ensure that the superclass implementations aren't used unless we know they're safe. (They rely on implementations that changed dramatically between 3.11 and 3.12 and are incompatible with each other.) - Additional tests to ensure that the public interfaces are tested. Update project metadata to reflect Python 3.12 support. Fix typo. Formatting. Early return. Early return. Replace relative_to/is_relative_to implementation to use our internals. Whitespace. Revert to original .suffix implementation. Move test to site of prior test with the same name. This will mean it's clearer in the PR what has changed. Avoid monkey-patching the class under test. Instead modify behaviour via the constructor-injected mock. Adjust text fixture to induce behaviour via constructor-injected mock. Previously to produce the desired results these tests mutated the internal state of the object under test. Document why __fspath__() isn't supported. Fix as_uri() implementation by ensuring __bytes__() works. Unit test for .as_uri() implementation. Add a partial implementation of .absolute(). This is expected by the integration tests.
Originally part of #122, this PR adds type annotations to the path (unit) tests that didn't already have this, to ensure that Mypy validates the test bodies.
* Added `DBFSPath` as `os.PathLike` implementation ([#131](#131)). The open-source library has been updated with a new class `DBFSPath`, an implementation of `os.PathLike` for Databricks File System (DBFS) paths. This new class extends the existing `WorkspacePath` support and provides pathlib-like functionality for DBFS paths, including methods for creating directories, renaming and deleting files and directories, and reading and writing files. The addition of `DBFSPath` includes type-hinting for improved code linting and is integrated in the test suite with new and updated tests for path-like objects. The behavior of the `exists` and `unlink` methods have been updated for `WorkspacePath` to improve performance and raise appropriate errors. * Fixed `.as_uri()` and `.absolute()` implementations for `WorkspacePath` ([#127](#127)). In this release, the `WorkspacePath` class in the `paths.py` module has been updated with several improvements to the `.as_uri()` and `.absolute()` methods. These methods now utilize PathLib internals, providing better cross-version compatibility. The `.as_uri()` method now uses an f-string for concatenation and returns the UTF-8 encoded string representation of the `WorkspacePath` object via a new `__bytes__()` dunder method. Additionally, the `.absolute()` method has been implemented for the trivial (no-op) case and now supports returning the absolute path of files or directories in Databricks Workspace. Furthermore, the `glob()` and `rglob()` methods have been enhanced to support case-sensitive pattern matching based on a new `case_sensitive` parameter. To ensure the integrity of these changes, two new test cases, `test_as_uri()` and `test_absolute()`, have been added, thoroughly testing the functionality of these methods. * Fixed `WorkspacePath` support for python 3.11 ([#121](#121)). The `WorkspacePath` class in our open-source library has been updated to improve compatibility with Python 3.11. The `.expanduser()` and `.glob()` methods have been modified to address internal changes in Python 3.11. The `is_dir()` and `is_file()` methods now include a `follow_symlinks` parameter, although it is not currently used. A new method, `_scandir()`, has been added for compatibility with Python 3.11. The `expanduser()` method has also been updated to expand `~` (but not `~user`) constructs. Additionally, a new method `is_notebook()` has been introduced to check if the path points to a notebook in Databricks Workspace. These changes aim to ensure that the library functions smoothly with the latest version of Python and provides additional functionality for users working with Databricks Workspace. * Properly verify versions of python ([#118](#118)). In this release, we have made significant updates to the pyproject.toml file to enhance project dependency and development environment management. We have added several new packages to the `dependencies` section to expand the library's functionality and compatibility. Additionally, we have removed the `python` field, as it is no longer necessary. We have also updated the `path` field to specify the location of the virtual environment, which can improve integration with popular development tools such as Visual Studio Code and PyCharm. These changes are intended to streamline the development process and make it easier to manage dependencies and set up the development environment. * Type annotations on path-related unit tests ([#128](#128)). In this open-source library update, type annotations have been added to path-related unit tests to enhance code clarity and maintainability. The tests encompass various scenarios, including verifying if a path exists, creating, removing, and checking directories, and testing file attributes such as distinguishing directories, notebooks, and regular files. The additions also cover functionality for opening and manipulating files in different modes like read binary, write binary, read text, and write text. Furthermore, tests for checking file permissions, handling errors, and globbing (pattern-based file path matching) have been incorporated. The tests interact with a WorkspaceClient mock object, simulating file system interactions. This enhancement bolsters the library's reliability and assists developers in creating robust, well-documented code when working with file system paths. * Updated `WorkspacePath` to support Python 3.12 ([#122](#122)). In this release, the `WorkspacePath` implementation has been updated to ensure compatibility with Python 3.12, in addition to Python 3.10 and 3.11. The class was modified to replace most of the internal implementation and add extensive tests for public interfaces, ensuring that the superclass implementations are not used unless they are known to be safe. This change is in response to the significant changes in the superclass implementations between Python 3.11 and 3.12, which were found to be incompatible with each other. The `WorkspacePath` class now includes several new methods and tests to ensure that it functions seamlessly with different versions of Python. These changes include testing for initialization, equality, hash, comparison, path components, and various path manipulations. This update enhances the library's adaptability and ensures it functions correctly with different versions of Python. Classifiers have also been updated to include support for Python 3.12. * `WorkspacePath` fixes for the `.resolve()` implementation ([#129](#129)). The `.resolve()` method for `WorkspacePath` has been updated to improve its handling of relative paths and the `strict` argument. Previously, relative paths were not properly validated and would be returned as-is. Now, relative paths will cause the method to fail. The `strict` argument is now checked, and if set to `True` and the path does not exist, a `FileNotFoundError` will be raised. The method `.absolute()` is used to obtain the absolute path of the file or directory in Databricks Workspace and is used in the implementation of `.resolve()`. A new test, `test_resolve()`, has been added to verify these changes, covering scenarios where the path is absolute, the path exists, the path does not exist, and the path is relative. In the case of relative paths, a `NotImplementedError` is raised, as `.resolve()` is not supported for them. * `WorkspacePath`: Fix the .rename() and .replace() implementations to return the target path ([#130](#130)). The `.rename()` and `.replace()` methods of the `WorkspacePath` class have been updated to return the target path as part of the public API, with `.rename()` no longer accepting the `overwrite` keyword argument and always failing if the target path already exists. A new private method, `._rename()`, has been added to include the `overwrite` argument and is used by both `.rename()` and `.replace()`. This update is a preparatory step for factoring out common code to support DBFS paths. The tests have been updated accordingly, combining and adding functions to test the new and updated methods. The `.unlink()` method's behavior remains unchanged. Please note that the exact error raised when `.rename()` fails due to an existing target path is yet to be defined. Dependency updates: * Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0 ([#133](#133)).
* Added `DBFSPath` as `os.PathLike` implementation ([#131](#131)). The open-source library has been updated with a new class `DBFSPath`, an implementation of `os.PathLike` for Databricks File System (DBFS) paths. This new class extends the existing `WorkspacePath` support and provides pathlib-like functionality for DBFS paths, including methods for creating directories, renaming and deleting files and directories, and reading and writing files. The addition of `DBFSPath` includes type-hinting for improved code linting and is integrated in the test suite with new and updated tests for path-like objects. The behavior of the `exists` and `unlink` methods have been updated for `WorkspacePath` to improve performance and raise appropriate errors. * Fixed `.as_uri()` and `.absolute()` implementations for `WorkspacePath` ([#127](#127)). In this release, the `WorkspacePath` class in the `paths.py` module has been updated with several improvements to the `.as_uri()` and `.absolute()` methods. These methods now utilize PathLib internals, providing better cross-version compatibility. The `.as_uri()` method now uses an f-string for concatenation and returns the UTF-8 encoded string representation of the `WorkspacePath` object via a new `__bytes__()` dunder method. Additionally, the `.absolute()` method has been implemented for the trivial (no-op) case and now supports returning the absolute path of files or directories in Databricks Workspace. Furthermore, the `glob()` and `rglob()` methods have been enhanced to support case-sensitive pattern matching based on a new `case_sensitive` parameter. To ensure the integrity of these changes, two new test cases, `test_as_uri()` and `test_absolute()`, have been added, thoroughly testing the functionality of these methods. * Fixed `WorkspacePath` support for python 3.11 ([#121](#121)). The `WorkspacePath` class in our open-source library has been updated to improve compatibility with Python 3.11. The `.expanduser()` and `.glob()` methods have been modified to address internal changes in Python 3.11. The `is_dir()` and `is_file()` methods now include a `follow_symlinks` parameter, although it is not currently used. A new method, `_scandir()`, has been added for compatibility with Python 3.11. The `expanduser()` method has also been updated to expand `~` (but not `~user`) constructs. Additionally, a new method `is_notebook()` has been introduced to check if the path points to a notebook in Databricks Workspace. These changes aim to ensure that the library functions smoothly with the latest version of Python and provides additional functionality for users working with Databricks Workspace. * Properly verify versions of python ([#118](#118)). In this release, we have made significant updates to the pyproject.toml file to enhance project dependency and development environment management. We have added several new packages to the `dependencies` section to expand the library's functionality and compatibility. Additionally, we have removed the `python` field, as it is no longer necessary. We have also updated the `path` field to specify the location of the virtual environment, which can improve integration with popular development tools such as Visual Studio Code and PyCharm. These changes are intended to streamline the development process and make it easier to manage dependencies and set up the development environment. * Type annotations on path-related unit tests ([#128](#128)). In this open-source library update, type annotations have been added to path-related unit tests to enhance code clarity and maintainability. The tests encompass various scenarios, including verifying if a path exists, creating, removing, and checking directories, and testing file attributes such as distinguishing directories, notebooks, and regular files. The additions also cover functionality for opening and manipulating files in different modes like read binary, write binary, read text, and write text. Furthermore, tests for checking file permissions, handling errors, and globbing (pattern-based file path matching) have been incorporated. The tests interact with a WorkspaceClient mock object, simulating file system interactions. This enhancement bolsters the library's reliability and assists developers in creating robust, well-documented code when working with file system paths. * Updated `WorkspacePath` to support Python 3.12 ([#122](#122)). In this release, the `WorkspacePath` implementation has been updated to ensure compatibility with Python 3.12, in addition to Python 3.10 and 3.11. The class was modified to replace most of the internal implementation and add extensive tests for public interfaces, ensuring that the superclass implementations are not used unless they are known to be safe. This change is in response to the significant changes in the superclass implementations between Python 3.11 and 3.12, which were found to be incompatible with each other. The `WorkspacePath` class now includes several new methods and tests to ensure that it functions seamlessly with different versions of Python. These changes include testing for initialization, equality, hash, comparison, path components, and various path manipulations. This update enhances the library's adaptability and ensures it functions correctly with different versions of Python. Classifiers have also been updated to include support for Python 3.12. * `WorkspacePath` fixes for the `.resolve()` implementation ([#129](#129)). The `.resolve()` method for `WorkspacePath` has been updated to improve its handling of relative paths and the `strict` argument. Previously, relative paths were not properly validated and would be returned as-is. Now, relative paths will cause the method to fail. The `strict` argument is now checked, and if set to `True` and the path does not exist, a `FileNotFoundError` will be raised. The method `.absolute()` is used to obtain the absolute path of the file or directory in Databricks Workspace and is used in the implementation of `.resolve()`. A new test, `test_resolve()`, has been added to verify these changes, covering scenarios where the path is absolute, the path exists, the path does not exist, and the path is relative. In the case of relative paths, a `NotImplementedError` is raised, as `.resolve()` is not supported for them. * `WorkspacePath`: Fix the .rename() and .replace() implementations to return the target path ([#130](#130)). The `.rename()` and `.replace()` methods of the `WorkspacePath` class have been updated to return the target path as part of the public API, with `.rename()` no longer accepting the `overwrite` keyword argument and always failing if the target path already exists. A new private method, `._rename()`, has been added to include the `overwrite` argument and is used by both `.rename()` and `.replace()`. This update is a preparatory step for factoring out common code to support DBFS paths. The tests have been updated accordingly, combining and adding functions to test the new and updated methods. The `.unlink()` method's behavior remains unchanged. Please note that the exact error raised when `.rename()` fails due to an existing target path is yet to be defined. Dependency updates: * Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0 ([#133](#133)).
This PR relates to #120 and updates the
WorkspacePath
implementation so that it also works under Python 3.12, in addition to Python 3.10 and Python 3.11.Changes include: