-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add UC volume support - closes #140 #173
base: main
Are you sure you want to change the base?
Conversation
❌ 41/49 passed, 1 flaky, 8 failed, 2 skipped, 1m48s total ❌ test_open_text_io[VolumePath]: databricks.sdk.errors.platform.NotFound: Catalog '~' does not exist. (1.485s)
❌ test_stat[VolumePath]: databricks.sdk.errors.platform.NotFound: Catalog '~' does not exist. (986ms)
❌ test_mkdirs[VolumePath]: AssertionError: assert not True (772ms)
❌ test_rename_file[VolumePath]: ValueError: Missing catalog, schema or volume name: Volumes/~/qWAQQGWkN3rPj61t (669ms)
❌ test_unlink[VolumePath]: ValueError: Missing catalog, schema or volume name: Volumes/~/h73TWm4OPvPxhFkg (701ms)
❌ test_open_binary_io[VolumePath]: ValueError: Missing catalog, schema or volume name: Volumes/~/1appBK9hrhZYaMjh (855ms)
❌ test_replace_file[VolumePath]: ValueError: Missing catalog, schema or volume name: Volumes/~/X2GR1ybrA2ZTha7J (598ms)
❌ test_resolve_is_consistent[VolumePath]: ValueError: Missing catalog, schema or volume name: /Volumes/a/d (194ms)
Flaky tests:
Running from acceptance #242 |
home directory to the previously created directory (`~/some-folder/foo/bar/baz`), and verifies it matches the expected | ||
relative path (`some-folder/foo/bar/baz`). It then confirms that the expanded path is absolute, checks that | ||
calling `absolute()` on this path returns the path itself, and converts the path to a FUSE-compatible path | ||
This code expands the `~` symbol to the full path of the user's home directory, computes the relative path from this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please revert irrelevant changes to this file
@@ -42,7 +42,7 @@ def _inner(*_, **__): | |||
return _inner | |||
|
|||
|
|||
class _UploadIO(abc.ABC): | |||
class _WsUploadIO(abc.ABC): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class _WsUploadIO(abc.ABC): | |
class _WorkspaceUploadIO(abc.ABC): |
small nit
self._cached_is_directory = None | ||
self._parse_volume_name() | ||
|
||
def get_catalog_name(self) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def get_catalog_name(self) -> str: | |
@property | |
def catalog_name(self) -> str: |
i think os.PathLike
experience implies these as properties. what do you think?
@@ -1041,3 +1281,17 @@ def _select_children(self, path: T) -> Iterable[T]: | |||
if candidate not in yielded: | |||
yielded.add(candidate) | |||
yield candidate | |||
|
|||
|
|||
def create_path(ws: WorkspaceClient, path: str) -> _DatabricksPath: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def create_path(ws: WorkspaceClient, path: str) -> _DatabricksPath: | |
def as_databricks_path(ws: WorkspaceClient, path: str) -> _DatabricksPath: |
if path_without_scheme.startswith("/Volumes/"): | ||
return VolumePath(ws, path_without_scheme) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if path_without_scheme.startswith("/Volumes/"): | |
return VolumePath(ws, path_without_scheme) | |
parts = path_without_scheme.split('/') | |
if parts[0] == 'Volumes': | |
return VolumePath(ws, '/' + '/'.join(parts)) |
and other cases for Workspace
and dbfs
as well. e.g. here you need to convert FUSE path into the relevant implementation. cases for dbfs and wsfs are incorrect here.
because we don't require /Workspace
prefix in the actual instance.
@pytest.mark.parametrize("cls", DATABRICKS_PATHLIKE)
def test_exists(ws, cls):
wsp = cls(ws, "/Users/foo/bar/baz")
assert not wsp.exists()
}
("/Workspace/my/path/file.ext", WorkspacePath, "/Workspace/my/path/file.ext"), | ||
("file:/Workspace/my/path/file.ext", WorkspacePath, "/Workspace/my/path/file.ext"), | ||
("/Volumes/my/path/to/file.ext", VolumePath, "/Volumes/my/path/to/file.ext"), | ||
("file:/Volumes/my/path/to/file.ext", VolumePath, "/Volumes/my/path/to/file.ext"), | ||
("dbfs:/Volumes/my/path/to/file.ext", VolumePath, "/Volumes/my/path/to/file.ext"), | ||
("/dbfs/my/path/file.ext", DBFSPath, "/dbfs/my/path/file.ext"), | ||
("file:/dbfs/my/path/file.ext", DBFSPath, "/dbfs/my/path/file.ext"), | ||
("dbfs:/my/path/file.ext", DBFSPath, "/my/path/file.ext"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
("/Workspace/my/path/file.ext", WorkspacePath, "/Workspace/my/path/file.ext"), | |
("file:/Workspace/my/path/file.ext", WorkspacePath, "/Workspace/my/path/file.ext"), | |
("/Volumes/my/path/to/file.ext", VolumePath, "/Volumes/my/path/to/file.ext"), | |
("file:/Volumes/my/path/to/file.ext", VolumePath, "/Volumes/my/path/to/file.ext"), | |
("dbfs:/Volumes/my/path/to/file.ext", VolumePath, "/Volumes/my/path/to/file.ext"), | |
("/dbfs/my/path/file.ext", DBFSPath, "/dbfs/my/path/file.ext"), | |
("file:/dbfs/my/path/file.ext", DBFSPath, "/dbfs/my/path/file.ext"), | |
("dbfs:/my/path/file.ext", DBFSPath, "/my/path/file.ext"), | |
("/Workspace/my/path/file.ext", WorkspacePath, "/my/path/file.ext"), | |
("file:/Workspace/my/path/file.ext", WorkspacePath, "/my/path/file.ext"), | |
("/Volumes/my/path/to/file.ext", VolumePath, "/Volumes/my/path/to/file.ext"), | |
("file:/Volumes/my/path/to/file.ext", VolumePath, "/Volumes/my/path/to/file.ext"), | |
("dbfs:/Volumes/my/path/to/file.ext", VolumePath, "/Volumes/my/path/to/file.ext"), | |
("/dbfs/my/path/file.ext", DBFSPath, "/my/path/file.ext"), | |
("file:/dbfs/my/path/file.ext", DBFSPath, "/my/path/file.ext"), | |
("dbfs:/my/path/file.ext", DBFSPath, "/my/path/file.ext"), |
these are the correct expected test cases for as_posix
. what you've specified is as_fuse
assert path.get_catalog_name() == "a" | ||
assert path.get_schema_name(False) == "b" | ||
assert path.get_schema_name(True) == "a.b" | ||
assert path.get_volume_name(True, True) == "a.b.c" | ||
assert path.get_volume_name(True, False) == "a.b.c" | ||
assert path.get_volume_name(False, True) == "b.c" | ||
assert path.get_volume_name(False, False) == "c" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert path.get_catalog_name() == "a" | |
assert path.get_schema_name(False) == "b" | |
assert path.get_schema_name(True) == "a.b" | |
assert path.get_volume_name(True, True) == "a.b.c" | |
assert path.get_volume_name(True, False) == "a.b.c" | |
assert path.get_volume_name(False, True) == "b.c" | |
assert path.get_volume_name(False, False) == "c" | |
assert path.catalog_name == "a" | |
assert path.schema_name == "b" | |
assert path.full_schema_name == "a.b" | |
assert path.full_name == "a.b.c" |
get_volume_name(False, False)
is not clear at all.
def test_volume_conversions() -> None: | ||
ws = create_autospec(WorkspaceClient) | ||
ws.config.host = "https://example.org" | ||
path = VolumePath(ws, "/Volumes/a/b/c/d/e.f") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@asnare do you this this API is aligned with os.PathLike
?
|
||
def as_fuse(self) -> Path: | ||
"""Return FUSE-mounted path in Databricks Runtime.""" | ||
if "DATABRICKS_RUNTIME_VERSION" not in os.environ: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, apparently, as_fuse equals to as_posix for UC volumes
return f"{self._catalog_name}.{self._schema_name}" | ||
return self._schema_name | ||
|
||
def get_volume_name(self, with_catalog_name: bool = False, with_schema_name: bool = False) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can technically also add
@property
def volume_info(self):
return self._ws.volumes.get(self.full_name)
@property
def schema_info(self): return self._ws.schemas.get(self.schema_name)
Add UC volume support - closes #140