Proof of concept: Rework Windows CI #2841

Closed
astrojuanlu wants to merge 2 commits from dev/rework-windows-ci

astrojuanlu
Member

Description

Windows tests are noticeably slow, but most of the time (up to 20 minutes!) is spent recreating the conda cache and installing the dependencies.

In addition, many system dependencies are installed manually with choco or other tools, although conda could install them all in one go. This includes make, pyspark, and others.

The problem with this approach is that pip install .[test] ignores the packages already in the conda environment and reinstalls them. This causes problems not only with pyspark but also with numpy: many packages are compiled against it, so it should never be reinstalled. This lack of interoperability between pip and conda is unfortunately a known issue.
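
One way to observe the overlap is to record which requirements are already present in the active environment before running pip. The sketch below uses only the standard library; the function name `installed_versions` and the example package list are illustrative assumptions, not code from this PR.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not visible in the current environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # Hypothetical subset of the test requirements.
    for name, version in installed_versions(["numpy", "pyspark"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Comparing the output before and after `pip install .[test]` can hint at whether pip swapped a conda-provided package for a different version.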

The only way forward I see is to generate an environment.yml that conda understands. It would more or less duplicate setup.py, listing the dependencies in a form conda can install, and leave only pip install . to install the development version of kedro. This would introduce some duplication, but it might help us keep the conda recipe up to date (see conda-forge/kedro-feedstock#42).
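
As a rough illustration of that idea, an environment.yml could be rendered from a single list of requirements so the duplication lives in one place. This is a minimal sketch under assumptions: the helper `render_environment_yml`, the environment name `kedro-dev`, and the dependency list are hypothetical, and pip-style version specifiers would need translating to conda syntax before use.

```python
def render_environment_yml(name, channels, conda_deps):
    """Render a minimal conda environment file as a string."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {dep}" for dep in conda_deps]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical dependency list; the real one would mirror setup.py.
    text = render_environment_yml(
        name="kedro-dev",
        channels=["conda-forge"],
        conda_deps=["python", "pip", "make", "numpy", "pyspark"],
    )
    print(text, end="")
```

With the environment created from such a file, the only remaining pip step would be the pip install . mentioned above.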

For now, as an experiment, I duplicated the table of extra requirements in setup.py, but to be honest I don't think it looks great.

What do folks think?

Development notes

This uses some newer capabilities of conda, such as the libmamba solver, which is much faster: https://www.anaconda.com/blog/a-faster-conda-for-a-growing-community

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
@astrojuanlu
Member Author

Closing this in favor of #2843, which doesn't even use conda!

@astrojuanlu astrojuanlu deleted the dev/rework-windows-ci branch July 26, 2023 17:15