Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stage copy with directory structure #942

Merged
merged 4 commits into from
Apr 4, 2024
Merged

Conversation

sfc-gh-astus
Copy link
Contributor

Pre-review checklist

  • I've confirmed that instructions included in README.md are still correct after my changes in the codebase.
  • I've added or updated automated unit tests to verify correctness of my new code.
  • I've added or updated integration tests to verify correctness of my new code.
  • I've confirmed that my changes are working by executing CLI's commands manually.
  • I've confirmed that my changes are up-to-date with the target branch.
  • I've described my changes in the release notes.
  • I've described my changes in the section below.

Changes description

stage copy from stage will reproduce the directory structure locally.

@sfc-gh-astus sfc-gh-astus requested a review from a team as a code owner March 27, 2024 12:00
@sfc-gh-astus sfc-gh-astus force-pushed the stage-copy-with-directories branch 3 times, most recently from b0050f5 to 88f894f Compare March 27, 2024 13:38
RELEASE-NOTES.md Outdated
@@ -21,6 +21,7 @@
* Changing imports in function/procedure section in `snowflake.yml` will cause the definition update on replace
* Adding `--pattern` flag to `stage list` command for filtering out results with regex.
* Fixed snowpark build paths for builds with --project option (fixed empty zip issue).
* `stage copy` from stage will reproduce the directory structure locally.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, this is potentially breaking change? Should we introduce this behind a flag -r/--recursive or something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is :( . Added --recursive flag

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also add a warning for default use? In this way we will be able to prompt users to use --recursive always and later make it default behavior. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, but I wonder if it won't be a breaking change

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning is never a breaking change. We should utilize the cli_console.warning so it is automatically suppressed in formatted outputs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With --format JSON no warning will appear, so maybe it will not be a problem? 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added warning

class AggregatedResult(CollectionResult):
"""
Prints results from many cursors as single result. Cursors must have the same number and order of columns.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happen if this is not true? Should we print a warning / fail in runtime?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is even more complicated, so I decided to use MultipleResult.

@sfc-gh-astus sfc-gh-astus force-pushed the stage-copy-with-directories branch 3 times, most recently from 5962b9a to 8d4b42b Compare March 28, 2024 12:03
@sfc-gh-astus sfc-gh-astus force-pushed the stage-copy-with-directories branch from 8d4b42b to 14abd3f Compare March 28, 2024 16:37
dest_directory = dest_path / "/".join(
Path(file).parts[stage_parts_length:-1]
)
self._assure_is_existing_directory(Path(dest_directory))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it mean the directory has to exists?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't get make sure it creates the directories? If I want to use this piece of code in other place, does it mean I have also to create all directories?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_assure_is_existing_directory is already existing function which checks if directory exists and creates it if not. Normal get uses the same function.

@sfc-gh-astus sfc-gh-astus force-pushed the stage-copy-with-directories branch 2 times, most recently from a7cdde4 to 3cf62cd Compare March 29, 2024 11:05
@@ -68,12 +73,20 @@ def copy(
raise click.ClickException(
"Both source and target path are local. This operation is not supported."
)
if is_put and recursive:
raise click.ClickException("Recursive for PUT is not supported.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
raise click.ClickException("Recursive for PUT is not supported.")
raise click.ClickException("Recursive for upload is not supported.")

Or should we ignore the flag?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would leave exception, because we will support it soon. I hope 😄 . If we ignore it, command behaviour will change when we finally support it.

@@ -68,12 +73,20 @@ def copy(
raise click.ClickException(
"Both source and target path are local. This operation is not supported."
)
if is_put and recursive:
raise click.ClickException("Recursive for PUT is not supported.")

if is_get:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing order of this if can improve readability, WDYT?

Suggested change
if is_get:
if not is_get:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see that improvement 🤔 , for me it will not change anything. I added second if for put, WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if not is_get:
        # PUT case
        source = Path(source_path).resolve()
        local_path = str(source) + "/*" if source.is_dir() else str(source)

        cursor = StageManager().put(
            local_path=local_path,
            stage_path=destination_path,
            overwrite=overwrite,
            parallel=parallel,
        )
        return QueryResult(cursor)

# GET case
if recursive:
    cursors = StageManager().get_recursive(
        stage_path=source_path, dest_path=target, parallel=parallel
    )
    result = MultipleResults([QueryResult(c) for c in cursors])
else:
    cli_console.warning(
        "Use `--recursive` flag, which copy files recursively with directory structure. This will be the default behavior in the future."
    )
    cursor = StageManager().get(
        stage_path=source_path, dest_path=target, parallel=parallel
    )
    result = QueryResult(cursor)

But if we foresee more changes in put then introducing helper methods may make more sense:

if is_get:
   return _get(...)
return _put(...)

WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like helper methods

overwrite=overwrite,
parallel=parallel,
)
return QueryResult(cursor)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad design, if for some reason we do not enter any of the if the function will return None causing unexpected behavior

Comment on lines 84 to 92
return MultipleResults([QueryResult(c) for c in cursors])
else:
cli_console.warning(
"Use `--recursive` flag, which copy files recursively with directory structure. This will be the default behavior in the future."
)
cursor = StageManager().get(
stage_path=source_path, dest_path=target, parallel=parallel
)
return QueryResult(cursor)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the output structure consistent in both cases if used with --format-json?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't :( , now it is

@sfc-gh-astus sfc-gh-astus force-pushed the stage-copy-with-directories branch 2 times, most recently from 650d5c7 to 0e182d2 Compare April 3, 2024 11:46
@sfc-gh-astus sfc-gh-astus force-pushed the stage-copy-with-directories branch from 0e182d2 to 5e28362 Compare April 4, 2024 07:47

def _get(recursive: bool, source_path: str, destination_path: str, parallel: int):
target = Path(destination_path).resolve()
if recursive:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would reverse the order if not recursive seems to be a shorter block. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me block are the same :) , changed

@sfc-gh-astus sfc-gh-astus merged commit 61faedd into main Apr 4, 2024
11 checks passed
@sfc-gh-astus sfc-gh-astus deleted the stage-copy-with-directories branch April 4, 2024 10:49
overwrite: bool,
):
if recursive:
raise click.ClickException("Recursive flag for upload is not supported.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😢

@sfc-gh-zblackwood
Copy link
Collaborator

Hooray! 🎉

sfc-gh-sichen pushed a commit that referenced this pull request Oct 17, 2024
Stage copy with directory structure
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SNOW-1006186: snow object stage copy should return files in folders, not flatten out directory structure
5 participants