Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed version parsing in AnacondaChannel #944

Merged
merged 29 commits into from
Apr 8, 2024

Conversation

sfc-gh-pczajka
Copy link
Contributor

@sfc-gh-pczajka sfc-gh-pczajka commented Mar 27, 2024

Pre-review checklist

  • I've confirmed that instructions included in README.md are still correct after my changes in the codebase.
  • I've added or updated automated unit tests to verify correctness of my new code.
  • I've added or updated integration tests to verify correctness of my new code.
  • I've confirmed that my changes are working by executing CLI's commands manually.
  • I've confirmed that my changes are up-to-date with the target branch.
  • I've described my changes in the release notes.
  • I've described my changes in the section below.

Changes description

  • Change AnacondaChannel to support multiple versions of a package
  • Use packaging.requirements to check whether package version fulfills specs
  • Moved AnacondaChannel tests to test_anaconda.py instead of test_utils
  • Fix package comparison in snowpark.common. _get_snowflake_packages_delta

@sfc-gh-pczajka sfc-gh-pczajka requested a review from a team as a code owner March 27, 2024 16:34
@sfc-gh-pczajka sfc-gh-pczajka changed the title fix package availability parsing in Anaconda Fixed version parsing in AnacondaChannel Mar 27, 2024
Comment on lines 32 to 35
self._packages = {
_standarize_name(package_name): versions
for package_name, versions in packages.items()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are performing for loop here and from_snowflake I think it would be much better to encapsulate this logic in a single place (probably in from_snowflake). Using for loop instead of comprehensions may be better in such case. It will also make tests much cleaner (although you need to adjust them).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

return False

def package_latest_version(self, package: Requirement) -> str | None:
package_name = _standarize_name(package.name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are using this every time we work with package. Should we consider creating our own custom Requirement that would implement this as property?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I've added pre-commit hook checking whether requirements library is not used instead

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you pushed the changes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done :)

versions = {parse(v) for v in self._packages[package_name]}
except InvalidVersion:
versions = self._packages[package_name]
return str(max(versions))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure the max work properly here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For parsed versions - yes
For non-pep508 - might work not so good. The Anaconda example was jpeg package, which was available in versions 9c, 9d and 9e.

As it is only used in in snowpark package lookup, I think good alternative might be simply list all possible versions for every package, WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +40 to +42
for line in requirements_file.read_text(
file_size_limit_mb=DEFAULT_SIZE_LIMIT_MB
).splitlines():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't there readlines method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not in pathlib - I'd have to convert that back to "with file.open(...)"

file_size_limit_mb=DEFAULT_SIZE_LIMIT_MB
).splitlines():
# remove comments
line = line.split("#")[0].strip()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't re.sub be a bit cleaner?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

return [
Requirement.parse(req).to_name_and_version()
for line in f
if (req := line.split("#")[0].strip())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicated code fragment, consider introducing a method for cleaning comments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After refactor it is used only in snowflake.cli.plugins.snowpark.package_utils.parse_requirements



@patch("snowflake.cli.plugins.snowpark.package.anaconda.requests")
def test_anaconda_packages_streamlit(mock_requests):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of this test?

Copy link
Contributor Author

@sfc-gh-pczajka sfc-gh-pczajka Mar 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

¯\(ツ)
I got it moved from test_common, it looks like a duplicate of test_anaconda_packages. Removed
My guess is that it was checking out some bugfix

@sfc-gh-pczajka sfc-gh-pczajka force-pushed the SNOW-1250044-fix-anaconda-availability-check branch from 7985773 to 429b2cb Compare April 2, 2024 14:11
@sfc-gh-pczajka sfc-gh-pczajka force-pushed the SNOW-1250044-fix-anaconda-availability-check branch from 5a63679 to 98233fb Compare April 4, 2024 08:20
@sfc-gh-pczajka sfc-gh-pczajka force-pushed the SNOW-1250044-fix-anaconda-availability-check branch from 98233fb to 989e81d Compare April 4, 2024 12:02

download_result = package_utils.download_unavailable_packages(
requirements=requirements,
target_dir=packages_dir,
ignore_anaconda=ignore_anaconda,
anaconda=anaconda,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this method requires anaconda or packages available in anaconda? If the latter is true the we can do something like

anaconda_packages = [] if ignore_anaconda else AnacondaChannel.from_snowflake().packages

download_result = package_utils.download_unavailable_packages(anaconda_packages=anaconda_packages, ...)

In this way we have clear interface. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored - I've added AnacondaChannel.empty(), which ignores all packages. This way download_unavailable_packages can forget about ignore_anacoda=true case

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice and clean 🚀

download_result.packages_available_in_anaconda,
_write_snowflake_requirements_file(
file_path=snowpark_paths.snowflake_requirements_file,
anaconda=anaconda,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, do we really need anaconda?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored - I moved this to AnacondaChannel

imports: List[str],
stage_artifact_file: str,
) -> bool:
import logging

log = logging.getLogger(__name__)
resource_json = _convert_resource_details_to_dict(current_state)
anaconda_packages = resource_json["packages"]
deployed_packages = resource_json["packages"]
log.info(
"Found %s defined Anaconda packages in deployed %s...",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Found %s defined Anaconda packages in deployed %s...",
"Found %d defined Anaconda packages in deployed %s...",

Nit which we can fix using this occasion

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


if updated_package_list:
diff = len(updated_package_list) - len(anaconda_packages)
if _snowflake_requirements_differ(deployed_packages, packages):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the packages? Are those required_packages?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is contents of requirements.snowflake.txt, which is packages argument for create or replace. It is named packages through the whole deploy logic

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we name this more appropriately? I think ambiguous names were one of the main problems of this part of code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to snowflake_dependencies

snowflake: List[Requirement]
other: List[Requirement]
in_snowflake: List[Requirement]
unavailable: List[Requirement]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the most important changes, love it! 🚀

@sfc-gh-pczajka sfc-gh-pczajka merged commit 5066f3c into main Apr 8, 2024
11 checks passed
@sfc-gh-pczajka sfc-gh-pczajka deleted the SNOW-1250044-fix-anaconda-availability-check branch April 8, 2024 13:36
sfc-gh-sichen pushed a commit that referenced this pull request Oct 17, 2024
* fix package availability parsing in Anaconda

* fix non-pep508 version formats

* update release notes

* fix unit tests

* fix dependency format in sql commands

* fix calculating diff between anaconda and requirements

* fix unit tests

* fix unit test

* review fixes part 1

* run pre-commit

* Add pre-commit hook to use our implementation of Requirements

* standarize name while parsing a requirement

* fix unit tests IN PROGRESS

* save anaconda package names in requirements.snowflake.txt

* fix tests

* fix unit tests

* Lookup: return all possible versions if available versions cannot be parsed

* smallfix

* refactor

* refactor anaconda

* refactor

* sort available versions in decreasing order

* refactor ignore_anaconda logic
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants