
make plugin=kedro-datasets install-test-requirements fails #597

Closed
grofte opened this issue Mar 6, 2024 · 16 comments
Labels: bug (Something isn't working), datasets

@grofte commented Mar 6, 2024

Description

Running make plugin=kedro-datasets install-test-requirements fails with dependency conflicts.

Edit by @astrojuanlu: Summary and possible next steps at #597 (comment)

Context

I wanted to contribute a PR for some Polars support but I can't install the dependencies.

Steps to Reproduce

  1. Fork + clone repo
  2. Create an Anaconda environment with Python 3.9: conda create -n PR-kedro python=3.9 (the contribution README says 3.6+, but PyPI says 3.9+)
  3. conda activate PR-kedro
  4. Run make plugin=kedro-datasets install-test-requirements

Expected Result

Pip should install the required libraries.

Actual Result

Pip did not.

INFO: pip is looking at multiple versions of dask[complete] to determine which version is compatible with other requirements. This could take a while.
INFO: pip is still looking at multiple versions of dask[complete] to determine which version is compatible with other requirements. This could take a while.
INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. See https://pip.pypa.io/warnings/backtracking for guidance. If you want to abort this run, press Ctrl + C.
ERROR: Cannot install dask[complete]==2024.2.1 and kedro-datasets[test]==2.1.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    kedro-datasets[test] 2.1.0 depends on dask>=2021.10; extra == "test"
    dask[complete] 2024.2.1 depends on dask 2024.2.1 (from https://files.pythonhosted.org/packages/ff/d3/f1dcba697c7d7e8470ffa34b31ca1e663d4a2654ef806877f1017ecc5102/dask-2024.2.1-py3-none-any.whl (from https://pypi.org/simple/dask/) (requires-python:>=3.9))

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
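For illustration, the two requirements in the error are not incompatible on paper: dask 2024.2.1 satisfies kedro-datasets' dask>=2021.10 constraint. A quick check with the third-party packaging library (assumed available; it is the same PEP 440 machinery pip uses internally) makes this concrete:

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

# Constraint declared by kedro-datasets[test]
spec = SpecifierSet(">=2021.10")
# Version that dask[complete] 2024.2.1 pins for dask itself
pinned = Version("2024.2.1")

print(spec.contains(pinned))  # → True: the constraints are compatible on paper
```

So the ResolutionImpossible here points at resolver behaviour rather than a genuine version conflict.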

Let me know if you want the full message from Pip but I think this covers all the relevant information.

Your Environment

Include as many relevant details about the environment in which you experienced the bug:

  • Kedro version used (pip show kedro or kedro -V): current
  • Kedro plugin and kedro plugin version used (pip show kedro-airflow): current
  • Python version used (python -V): 3.9.18
  • Operating system and version: Ubuntu 20.04
@astrojuanlu (Member)

Thanks for the report @grofte , we'll look into this.

@astrojuanlu (Member)

@grofte What pip version is this?

@astrojuanlu added the bug and datasets labels on Mar 6, 2024
@astrojuanlu (Member)

> contribution readme says 3.6+ but PyPI says 3.9+

Time to update the contribution readme too 👍🏽

@grofte (Author) commented Mar 6, 2024

pip --version
pip 23.3.1 from /home/mog/anaconda3/envs/PR-kedro/lib/python3.9/site-packages/pip (python 3.9)

You're right though, the pip version does matter. I would suggest changing pip in the Makefile to python -m pip, unless there's some problem with that I'm unaware of. It doesn't work with python -m pip either, though (and it's the same pip version).
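A minimal sketch of why python -m pip is the safer spelling: it ties pip to the interpreter of the active environment, whereas a bare pip is resolved through PATH and can belong to a different environment entirely. The helper name below is hypothetical, purely for illustration:

```python
import sys

def pip_command(*args: str) -> list[str]:
    """Build a pip invocation bound to the current interpreter.

    ``sys.executable -m pip`` guarantees pip operates on the same
    environment that ``python`` runs in; a bare ``pip`` on PATH may
    point at another interpreter's site-packages.
    """
    return [sys.executable, "-m", "pip", *args]

print(pip_command("install", "-e", "."))
```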

@noklam (Contributor) commented Mar 6, 2024

Good shout about the readme; on the other hand, we need more information.

https://github.com/kedro-org/kedro/actions/runs/8158985607/job/22302183993
I checked the CI that we run, which uses the make install command, and it runs successfully on py39.

Most likely a pip version problem.

@grofte (Author) commented Mar 6, 2024

I don't even understand how Dask depending on Dask and kedro-datasets depending on Dask gives a conflict.

Anyway, I installed dask[complete] on its own first, commented it out of the pyproject.toml, and ran that. It took ages and I ran out of hard drive space, so I cleaned up my hard disk. Then I removed all the non-test optional dependencies in pyproject.toml and ran it with uv pip install -r pyproject.toml --all-extras (and commented out pandas-gbq, since Google somehow broke uv). That was really fast, but uv is apparently all-or-nothing when it comes to optional dependencies.

Running the rest of the CI worked, and make test-no-spark gave me 944 passed, 9 skipped, 21 xfailed, 2 xpassed, 53 errors in 117.09s (0:01:57), which I think is fair. The XFAILs are from TestVideoDataset, and most of the errors seem to be from AWS: botocore.exceptions.ClientError: An error occurred (IllegalLocationConstraintException) when calling the CreateBucket operation: The unspecified location constraint is incompatible for the region specific endpoint this request was sent to.

EDIT: omg you guys, uv was sooooo much faster

@astrojuanlu (Member)

> EDIT: omg you guys, uv was sooooo much faster

Yep 😄 we're using it in our CI already.

About the Google dependency, we requested that they yank it, since it's invalid, but they didn't seem to fully understand and closed the issue already: googleapis/python-bigquery#1818

> Anyway, I installed dask[complete] on its own first, commented it out of the pyproject.toml, and ran that. It took ages and I ran out of hard drive space, so I cleaned up my hard disk.

Ughhhhh. I'm sorry, hope it was not too painful.

@grofte (Author) commented Mar 7, 2024

I did manage to do a draft PR and I probably fucked everything up =D
#598

@astrojuanlu (Member)

I confirm this is still the case as of today.

ERROR: Cannot install dask[complete]==2024.8.0 and kedro-datasets[test]==4.1.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    kedro-datasets[test] 4.1.0 depends on dask>=2021.10; extra == "test"
    dask[complete] 2024.8.0 depends on dask 2024.8.0 (from https://files.pythonhosted.org/packages/db/47/136a5dd68a33089f96f8aa1178ccd545d325ec9ab2bb42a3038711a935c0/dask-2024.8.0-py3-none-any.whl (from https://pypi.org/simple/dask/) (requires-python:>=3.9))

...and yet, uv pip install .[test] was successful.

# echo ".[test]" > requirements.in
# uv pip compile requirements.in -o requirements.txt
...
# This file was autogenerated by uv via the following command:
#    uv pip compile requirements.in -o requirements.txt
absl-py==2.1.0
    # via
    #   keras
    #   tensorboard
    #   tensorflow
accelerate==0.31.0
    # via
    #   kedro-datasets
    #   transformers
adlfs==2023.8.0
    # via kedro-datasets
aiobotocore==2.4.2
    # via s3fs
aiohappyeyeballs==2.4.0
    # via aiohttp
aiohttp==3.10.5
    # via
    #   adlfs
    #   aiobotocore
    #   datasets
    #   fsspec
    #   gcsfs
    #   s3fs
aioitertools==0.11.0
    # via aiobotocore
aiosignal==1.3.1
    # via aiohttp
antlr4-python3-runtime==4.9.3
    # via omegaconf
anyio==4.4.0
    # via
    #   httpx
    #   jupyter-server
appdirs==1.4.4
    # via
    #   fs
    #   kedro-telemetry
    #   pins
argon2-cffi==23.1.0
    # via jupyter-server
argon2-cffi-bindings==21.2.0
    # via argon2-cffi
arrow==1.3.0
    # via
    #   cookiecutter
    #   isoduration
asn1crypto==1.5.1
    # via snowflake-connector-python
astunparse==1.6.3
    # via tensorflow
async-lru==2.0.4
    # via jupyterlab
async-timeout==4.0.3
    # via
    #   aiohttp
    #   redis
atpublic==4.1.0
    # via ibis-framework
attrs==24.2.0
    # via
    #   aiohttp
    #   fiona
    #   jsonschema
    #   kedro
    #   referencing
azure-core==1.30.2
    # via
    #   adlfs
    #   azure-identity
    #   azure-storage-blob
azure-datalake-store==0.0.53
    # via adlfs
azure-identity==1.17.1
    # via adlfs
azure-storage-blob==12.22.0
    # via adlfs
babel==2.16.0
    # via jupyterlab-server
backcall==0.2.0
    # via ipython
bandit==1.7.9
    # via kedro-datasets
beautifulsoup4==4.12.3
    # via nbconvert
behave==1.2.6
    # via kedro-datasets
bidict==0.23.1
    # via ibis-framework
binaryornot==0.4.4
    # via cookiecutter
biopython==1.84
    # via kedro-datasets
black==22.12.0
    # via
    #   blacken-docs
    #   kedro-datasets
blacken-docs==1.9.2
    # via kedro-datasets
bleach==6.1.0
    # via
    #   nbconvert
    #   panel
blosc2==2.5.1
    # via tables
bokeh==3.4.3
    # via
    #   dask
    #   holoviews
    #   panel
boto3==1.24.59
    # via moto
botocore==1.27.59
    # via
    #   aiobotocore
    #   boto3
    #   moto
    #   s3transfer
build==1.2.1
    # via kedro
cachetools==5.5.0
    # via
    #   google-auth
    #   kedro
certifi==2024.7.4
    # via
    #   fiona
    #   httpcore
    #   httpx
    #   pyproj
    #   requests
    #   snowflake-connector-python
cffi==1.17.0
    # via
    #   argon2-cffi-bindings
    #   azure-datalake-store
    #   cryptography
    #   snowflake-connector-python
cfgv==3.4.0
    # via pre-commit
chardet==5.2.0
    # via binaryornot
charset-normalizer==3.3.2
    # via
    #   requests
    #   snowflake-connector-python
click==8.1.7
    # via
    #   black
    #   click-plugins
    #   cligj
    #   cookiecutter
    #   dask
    #   distributed
    #   fiona
    #   import-linter
    #   kedro
click-plugins==1.1.1
    # via fiona
cligj==0.7.2
    # via fiona
cloudpickle==2.0.0
    # via
    #   dask
    #   distributed
    #   kedro-datasets
    #   snowflake-snowpark-python
colorcet==3.1.0
    # via holoviews
comm==0.2.2
    # via
    #   ipykernel
    #   ipywidgets
compress-pickle==2.1.0
    # via kedro-datasets
contourpy==1.2.1
    # via bokeh
cookiecutter==2.6.0
    # via kedro
coverage==7.6.1
    # via
    #   kedro-datasets
    #   pytest-cov
cryptography==43.0.0
    # via
    #   azure-identity
    #   azure-storage-blob
    #   moto
    #   msal
    #   pyjwt
    #   pyopenssl
    #   snowflake-connector-python
    #   types-pyopenssl
    #   types-redis
cycler==0.12.1
    # via matplotlib
dask==2024.8.0
    # via
    #   dask-expr
    #   distributed
    #   kedro-datasets
dask-expr==1.1.10
    # via dask
datasets==2.2.1
    # via kedro-datasets
db-dtypes==1.3.0
    # via pandas-gbq
debugpy==1.8.5
    # via ipykernel
decorator==5.1.1
    # via
    #   gcsfs
    #   ipython
defusedxml==0.7.1
    # via nbconvert
delta-spark==2.4.0
    # via kedro-datasets
deltalake==0.19.1
    # via
    #   kedro-datasets
    #   polars
dill==0.3.8
    # via
    #   datasets
    #   kedro-datasets
    #   multiprocess
distlib==0.3.8
    # via virtualenv
distributed==2024.8.0
    # via dask
docopt==0.6.2
    # via hdfs
duckdb==0.10.3
    # via ibis-framework
dynaconf==3.2.6
    # via kedro
et-xmlfile==1.1.0
    # via openpyxl
exceptiongroup==1.2.2
    # via
    #   anyio
    #   pytest
execnet==2.1.1
    # via pytest-xdist
fastjsonschema==2.20.0
    # via nbformat
filelock==3.15.4
    # via
    #   huggingface-hub
    #   kedro-datasets
    #   snowflake-connector-python
    #   torch
    #   transformers
    #   triton
    #   virtualenv
fiona==1.9.6
    # via geopandas
flatbuffers==24.3.25
    # via tensorflow
fqdn==1.5.1
    # via jsonschema
frozenlist==1.4.1
    # via
    #   aiohttp
    #   aiosignal
fs==2.4.16
    # via triad
fsspec==2023.1.0
    # via
    #   adlfs
    #   dask
    #   datasets
    #   gcsfs
    #   huggingface-hub
    #   ibis-framework
    #   kedro
    #   pins
    #   s3fs
    #   torch
    #   triad
gast==0.6.0
    # via tensorflow
gcsfs==2023.1.0
    # via
    #   kedro-datasets
    #   pins
geopandas==0.14.4
    # via kedro-datasets
gitdb==4.0.11
    # via gitdb2
gitdb2==4.0.2
    # via gitpython
gitpython==3.0.6
    # via
    #   kedro
    #   trufflehog
google-api-core==2.19.1
    # via
    #   google-cloud-bigquery
    #   google-cloud-core
    #   google-cloud-storage
    #   pandas-gbq
google-auth==2.34.0
    # via
    #   gcsfs
    #   google-api-core
    #   google-auth-oauthlib
    #   google-cloud-bigquery
    #   google-cloud-core
    #   google-cloud-storage
    #   pandas-gbq
    #   pydata-google-auth
google-auth-oauthlib==1.2.1
    # via
    #   gcsfs
    #   pandas-gbq
    #   pydata-google-auth
google-cloud-bigquery==3.25.0
    # via pandas-gbq
google-cloud-core==2.4.1
    # via
    #   google-cloud-bigquery
    #   google-cloud-storage
google-cloud-storage==2.18.2
    # via gcsfs
google-crc32c==1.5.0
    # via
    #   google-cloud-storage
    #   google-resumable-media
google-pasta==0.2.0
    # via tensorflow
google-resumable-media==2.7.2
    # via
    #   google-cloud-bigquery
    #   google-cloud-storage
googleapis-common-protos==1.63.2
    # via
    #   google-api-core
    #   grpcio-status
greenlet==3.0.3
    # via sqlalchemy
grimp==1.3
    # via import-linter
grpcio==1.65.5
    # via
    #   google-api-core
    #   grpcio-status
    #   tensorboard
    #   tensorflow
grpcio-status==1.62.3
    # via google-api-core
h11==0.14.0
    # via httpcore
h5py==3.11.0
    # via
    #   keras
    #   tensorflow
hdfs==2.7.3
    # via kedro-datasets
holoviews==1.19.1
    # via kedro-datasets
httpcore==1.0.5
    # via httpx
httpx==0.27.0
    # via jupyterlab
huggingface-hub==0.17.3
    # via
    #   accelerate
    #   datasets
    #   kedro-datasets
    #   tokenizers
    #   transformers
humanize==4.10.0
    # via pins
ibis-framework==9.0.0
    # via kedro-datasets
identify==2.6.0
    # via pre-commit
idna==3.7
    # via
    #   anyio
    #   httpx
    #   jsonschema
    #   requests
    #   snowflake-connector-python
    #   yarl
import-linter==1.2.6
    # via kedro-datasets
importlib-metadata==8.4.0
    # via
    #   build
    #   dask
    #   delta-spark
    #   fiona
    #   jupyter-client
    #   jupyter-lsp
    #   jupyterlab
    #   jupyterlab-server
    #   kedro
    #   markdown
    #   nbconvert
    #   pins
importlib-resources==6.4.3
    # via
    #   kedro
    #   pins
iniconfig==2.0.0
    # via pytest
ipykernel==6.29.5
    # via
    #   jupyter
    #   jupyter-console
    #   jupyterlab
    #   qtconsole
ipython==7.34.0
    # via
    #   ipykernel
    #   ipywidgets
    #   jupyter-console
    #   kedro-datasets
ipywidgets==8.1.3
    # via jupyter
isodate==0.6.1
    # via azure-storage-blob
isoduration==20.11.0
    # via jsonschema
jedi==0.19.1
    # via ipython
jinja2==3.0.3
    # via
    #   bokeh
    #   cookiecutter
    #   dask
    #   distributed
    #   jupyter-server
    #   jupyterlab
    #   jupyterlab-server
    #   kedro-datasets
    #   moto
    #   nbconvert
    #   pins
    #   torch
jmespath==1.0.1
    # via
    #   boto3
    #   botocore
joblib==1.4.2
    # via
    #   kedro-datasets
    #   pins
    #   scikit-learn
json5==0.9.25
    # via jupyterlab-server
jsonpointer==3.0.0
    # via jsonschema
jsonschema==4.23.0
    # via
    #   jupyter-events
    #   jupyterlab-server
    #   nbformat
jsonschema-specifications==2023.12.1
    # via jsonschema
jupyter==1.0.0
    # via kedro-datasets
jupyter-client==8.6.2
    # via
    #   ipykernel
    #   jupyter-console
    #   jupyter-server
    #   nbclient
    #   qtconsole
jupyter-console==6.6.3
    # via jupyter
jupyter-core==5.7.2
    # via
    #   ipykernel
    #   jupyter-client
    #   jupyter-console
    #   jupyter-server
    #   jupyterlab
    #   nbclient
    #   nbconvert
    #   nbformat
    #   qtconsole
jupyter-events==0.10.0
    # via jupyter-server
jupyter-lsp==2.2.5
    # via jupyterlab
jupyter-server==2.14.2
    # via
    #   jupyter-lsp
    #   jupyterlab
    #   jupyterlab-server
    #   notebook
    #   notebook-shim
jupyter-server-terminals==0.5.3
    # via jupyter-server
jupyterlab==4.2.4
    # via
    #   kedro-datasets
    #   notebook
jupyterlab-pygments==0.3.0
    # via nbconvert
jupyterlab-server==2.27.3
    # via
    #   jupyterlab
    #   notebook
jupyterlab-widgets==3.0.11
    # via ipywidgets
kedro==0.19.7
    # via
    #   kedro-datasets
    #   kedro-telemetry
.
    # via -r requirements.in
kedro-telemetry==0.6.0
    # via kedro
keras==3.5.0
    # via tensorflow
kiwisolver==1.4.5
    # via matplotlib
lazy-loader==0.4
    # via kedro-datasets
libclang==18.1.1
    # via tensorflow
linkify-it-py==2.0.3
    # via panel
locket==1.0.0
    # via
    #   distributed
    #   partd
lxml==4.9.4
    # via kedro-datasets
lz4==4.3.3
    # via
    #   compress-pickle
    #   dask
markdown==3.7
    # via
    #   panel
    #   tensorboard
markdown-it-py==3.0.0
    # via
    #   mdit-py-plugins
    #   panel
    #   rich
markupsafe==2.1.5
    # via
    #   jinja2
    #   nbconvert
    #   werkzeug
matplotlib==3.3.4
    # via kedro-datasets
matplotlib-inline==0.1.7
    # via
    #   ipykernel
    #   ipython
mdit-py-plugins==0.4.1
    # via panel
mdurl==0.1.2
    # via markdown-it-py
memory-profiler==0.61.0
    # via kedro-datasets
mistune==3.0.2
    # via nbconvert
ml-dtypes==0.4.0
    # via
    #   keras
    #   tensorflow
more-itertools==10.4.0
    # via kedro
moto==5.0.0
    # via kedro-datasets
mpmath==1.3.0
    # via sympy
msal==1.30.0
    # via
    #   azure-datalake-store
    #   azure-identity
    #   msal-extensions
msal-extensions==1.2.0
    # via azure-identity
msgpack==1.0.8
    # via
    #   blosc2
    #   distributed
multidict==6.0.5
    # via
    #   aiohttp
    #   yarl
multiprocess==0.70.16
    # via datasets
mypy==1.11.1
    # via kedro-datasets
mypy-extensions==1.0.0
    # via
    #   black
    #   mypy
namex==0.0.8
    # via keras
nbclient==0.10.0
    # via nbconvert
nbconvert==7.16.4
    # via
    #   jupyter
    #   jupyter-server
nbformat==5.10.4
    # via
    #   jupyter-server
    #   nbclient
    #   nbconvert
ndindex==1.8
    # via blosc2
nest-asyncio==1.6.0
    # via ipykernel
networkx==2.8.8
    # via
    #   grimp
    #   kedro-datasets
    #   torch
nodeenv==1.9.1
    # via pre-commit
notebook==7.2.1
    # via jupyter
notebook-shim==0.2.4
    # via
    #   jupyterlab
    #   notebook
numexpr==2.10.1
    # via tables
numpy==1.26.4
    # via
    #   accelerate
    #   biopython
    #   blosc2
    #   bokeh
    #   contourpy
    #   dask
    #   datasets
    #   db-dtypes
    #   geopandas
    #   h5py
    #   holoviews
    #   ibis-framework
    #   keras
    #   matplotlib
    #   ml-dtypes
    #   numexpr
    #   opencv-python
    #   opt-einsum
    #   pandas
    #   pandas-gbq
    #   pyarrow
    #   scikit-learn
    #   scipy
    #   shapely
    #   tables
    #   tensorboard
    #   tensorflow
    #   transformers
    #   triad
    #   xarray
nvidia-cublas-cu12==12.1.3.1
    # via
    #   nvidia-cudnn-cu12
    #   nvidia-cusolver-cu12
    #   torch
nvidia-cuda-cupti-cu12==12.1.105
    # via torch
nvidia-cuda-nvrtc-cu12==12.1.105
    # via torch
nvidia-cuda-runtime-cu12==12.1.105
    # via torch
nvidia-cudnn-cu12==9.1.0.70
    # via torch
nvidia-cufft-cu12==11.0.2.54
    # via torch
nvidia-curand-cu12==10.3.2.106
    # via torch
nvidia-cusolver-cu12==11.4.5.107
    # via torch
nvidia-cusparse-cu12==12.1.0.106
    # via
    #   nvidia-cusolver-cu12
    #   torch
nvidia-nccl-cu12==2.20.5
    # via torch
nvidia-nvjitlink-cu12==12.6.20
    # via
    #   nvidia-cusolver-cu12
    #   nvidia-cusparse-cu12
nvidia-nvtx-cu12==12.1.105
    # via torch
oauthlib==3.2.2
    # via requests-oauthlib
omegaconf==2.3.0
    # via kedro
opencv-python==4.5.5.64
    # via kedro-datasets
openpyxl==3.1.5
    # via kedro-datasets
opt-einsum==3.3.0
    # via tensorflow
optree==0.12.1
    # via keras
overrides==7.7.0
    # via jupyter-server
packaging==24.1
    # via
    #   accelerate
    #   bokeh
    #   build
    #   dask
    #   datasets
    #   db-dtypes
    #   distributed
    #   geopandas
    #   google-cloud-bigquery
    #   holoviews
    #   huggingface-hub
    #   ipykernel
    #   jupyter-server
    #   jupyterlab
    #   jupyterlab-server
    #   kedro-datasets
    #   keras
    #   lazy-loader
    #   nbconvert
    #   pandas-gbq
    #   plotly
    #   pytest
    #   pytoolconfig
    #   qtconsole
    #   qtpy
    #   snowflake-connector-python
    #   tables
    #   tensorboard
    #   tensorflow
    #   transformers
    #   xarray
pandas==2.2.2
    # via
    #   bokeh
    #   dask
    #   dask-expr
    #   datasets
    #   db-dtypes
    #   geopandas
    #   holoviews
    #   ibis-framework
    #   kedro-datasets
    #   pandas-gbq
    #   panel
    #   pins
    #   triad
    #   xarray
pandas-gbq==0.23.1
    # via kedro-datasets
pandocfilters==1.5.1
    # via nbconvert
panel==1.4.5
    # via holoviews
param==2.1.1
    # via
    #   holoviews
    #   panel
    #   pyviz-comms
parse==1.20.2
    # via
    #   behave
    #   kedro
    #   parse-type
parse-type==0.6.2
    # via behave
parso==0.8.4
    # via jedi
parsy==2.1
    # via ibis-framework
partd==1.4.2
    # via dask
pathspec==0.12.1
    # via black
pbr==6.0.0
    # via stevedore
pexpect==4.9.0
    # via ipython
pickleshare==0.7.5
    # via ipython
pillow==9.5.0
    # via
    #   bokeh
    #   kedro-datasets
    #   matplotlib
pins==0.8.6
    # via ibis-framework
platformdirs==4.2.2
    # via
    #   black
    #   jupyter-core
    #   pytoolconfig
    #   snowflake-connector-python
    #   virtualenv
plotly==5.23.0
    # via kedro-datasets
pluggy==1.5.0
    # via
    #   kedro
    #   pytest
polars==0.18.15
    # via kedro-datasets
portalocker==2.10.1
    # via msal-extensions
pre-commit==3.8.0
    # via kedro-datasets
pre-commit-hooks==4.6.0
    # via kedro
prometheus-client==0.20.0
    # via jupyter-server
prompt-toolkit==3.0.47
    # via
    #   ipython
    #   jupyter-console
proto-plus==1.24.0
    # via google-api-core
protobuf==4.25.4
    # via
    #   google-api-core
    #   googleapis-common-protos
    #   grpcio-status
    #   proto-plus
    #   tensorboard
    #   tensorflow
psutil==6.0.0
    # via
    #   accelerate
    #   distributed
    #   ipykernel
    #   memory-profiler
    #   pytest-xdist
ptyprocess==0.7.0
    # via
    #   pexpect
    #   terminado
py==1.11.0
    # via pytest-forked
py-cpuinfo==9.0.0
    # via
    #   blosc2
    #   tables
py4j==0.10.9.7
    # via pyspark
pyarrow==16.1.0
    # via
    #   dask
    #   dask-expr
    #   datasets
    #   db-dtypes
    #   deltalake
    #   ibis-framework
    #   kedro-datasets
    #   pandas-gbq
    #   triad
pyarrow-hotfix==0.6
    # via
    #   dask
    #   ibis-framework
pyasn1==0.6.0
    # via
    #   pyasn1-modules
    #   rsa
pyasn1-modules==0.4.0
    # via google-auth
pycparser==2.22
    # via cffi
pydata-google-auth==1.8.2
    # via pandas-gbq
pygments==2.18.0
    # via
    #   ipython
    #   jupyter-console
    #   nbconvert
    #   qtconsole
    #   rich
pyjwt==2.9.0
    # via
    #   msal
    #   snowflake-connector-python
pyodbc==5.1.0
    # via kedro-datasets
pyopenssl==24.2.1
    # via snowflake-connector-python
pyparsing==3.1.2
    # via matplotlib
pyproj==3.6.1
    # via
    #   geopandas
    #   kedro-datasets
pyproject-hooks==1.1.0
    # via build
pyspark==3.4.3
    # via
    #   delta-spark
    #   kedro-datasets
pytest==7.4.4
    # via
    #   kedro-datasets
    #   pytest-cov
    #   pytest-forked
    #   pytest-mock
    #   pytest-xdist
pytest-cov==3.0.0
    # via kedro-datasets
pytest-forked==1.6.0
    # via pytest-xdist
pytest-mock==1.13.0
    # via kedro-datasets
pytest-xdist==2.2.1
    # via kedro-datasets
python-dateutil==2.9.0.post0
    # via
    #   arrow
    #   botocore
    #   google-cloud-bigquery
    #   ibis-framework
    #   jupyter-client
    #   matplotlib
    #   moto
    #   pandas
python-json-logger==2.0.7
    # via jupyter-events
python-slugify==8.0.4
    # via cookiecutter
pytoolconfig==1.3.1
    # via rope
pytz==2024.1
    # via
    #   ibis-framework
    #   pandas
    #   snowflake-connector-python
pyviz-comms==3.0.3
    # via
    #   holoviews
    #   panel
pyyaml==6.0.2
    # via
    #   accelerate
    #   bandit
    #   bokeh
    #   cookiecutter
    #   dask
    #   distributed
    #   huggingface-hub
    #   jupyter-events
    #   kedro
    #   omegaconf
    #   pins
    #   pre-commit
    #   snowflake-snowpark-python
    #   transformers
pyzmq==26.1.1
    # via
    #   ipykernel
    #   jupyter-client
    #   jupyter-console
    #   jupyter-server
    #   qtconsole
qtconsole==5.5.2
    # via jupyter
qtpy==2.4.1
    # via qtconsole
redis==4.6.0
    # via kedro-datasets
referencing==0.35.1
    # via
    #   jsonschema
    #   jsonschema-specifications
    #   jupyter-events
regex==2024.7.24
    # via transformers
requests==2.32.3
    # via
    #   azure-core
    #   azure-datalake-store
    #   cookiecutter
    #   datasets
    #   fsspec
    #   gcsfs
    #   google-api-core
    #   google-cloud-bigquery
    #   google-cloud-storage
    #   hdfs
    #   huggingface-hub
    #   jupyterlab-server
    #   kedro-datasets
    #   kedro-telemetry
    #   moto
    #   msal
    #   panel
    #   pins
    #   requests-mock
    #   requests-oauthlib
    #   responses
    #   snowflake-connector-python
    #   tensorflow
    #   transformers
requests-mock==1.12.1
    # via kedro-datasets
requests-oauthlib==2.0.0
    # via google-auth-oauthlib
responses==0.18.0
    # via
    #   datasets
    #   moto
rfc3339-validator==0.1.4
    # via
    #   jsonschema
    #   jupyter-events
rfc3986-validator==0.1.1
    # via
    #   jsonschema
    #   jupyter-events
rich==13.7.1
    # via
    #   bandit
    #   cookiecutter
    #   ibis-framework
    #   kedro
    #   keras
rope==1.13.0
    # via kedro
rpds-py==0.20.0
    # via
    #   jsonschema
    #   referencing
rsa==4.9
    # via google-auth
ruamel-yaml==0.18.6
    # via pre-commit-hooks
ruamel-yaml-clib==0.2.8
    # via ruamel-yaml
ruff==0.0.292
    # via kedro-datasets
s3fs==2023.1.0
    # via kedro-datasets
s3transfer==0.6.2
    # via boto3
safetensors==0.4.4
    # via
    #   accelerate
    #   transformers
scikit-learn==1.5.1
    # via kedro-datasets
scipy==1.13.1
    # via
    #   kedro-datasets
    #   scikit-learn
send2trash==1.8.3
    # via jupyter-server
setuptools==73.0.1
    # via
    #   fs
    #   ipython
    #   jupyterlab
    #   pandas-gbq
    #   pydata-google-auth
    #   snowflake-snowpark-python
    #   tensorboard
    #   tensorflow
shapely==2.0.6
    # via geopandas
six==1.16.0
    # via
    #   astunparse
    #   azure-core
    #   behave
    #   bleach
    #   fiona
    #   fs
    #   google-pasta
    #   hdfs
    #   isodate
    #   parse-type
    #   python-dateutil
    #   rfc3339-validator
    #   tensorboard
    #   tensorflow
    #   triad
smmap==5.0.1
    # via gitdb
sniffio==1.3.1
    # via
    #   anyio
    #   httpx
snowflake-connector-python==3.12.1
    # via snowflake-snowpark-python
snowflake-snowpark-python==1.21.0
    # via kedro-datasets
sortedcontainers==2.4.0
    # via
    #   distributed
    #   snowflake-connector-python
soupsieve==2.6
    # via beautifulsoup4
sqlalchemy==2.0.32
    # via kedro-datasets
sqlglot==23.12.2
    # via ibis-framework
stevedore==5.2.0
    # via bandit
sympy==1.13.2
    # via torch
tables==3.9.2
    # via kedro-datasets
tblib==3.0.0
    # via distributed
tenacity==9.0.0
    # via plotly
tensorboard==2.17.1
    # via tensorflow
tensorboard-data-server==0.7.2
    # via tensorboard
tensorflow==2.17.0
    # via kedro-datasets
tensorflow-io-gcs-filesystem==0.37.1
    # via tensorflow
termcolor==2.4.0
    # via tensorflow
terminado==0.18.1
    # via
    #   jupyter-server
    #   jupyter-server-terminals
text-unidecode==1.3
    # via python-slugify
threadpoolctl==3.5.0
    # via scikit-learn
tinycss2==1.3.0
    # via nbconvert
tokenizers==0.15.2
    # via transformers
toml==0.10.2
    # via
    #   import-linter
    #   kedro
tomli==2.0.1
    # via
    #   black
    #   build
    #   coverage
    #   jupyterlab
    #   mypy
    #   pre-commit-hooks
    #   pytest
    #   pytoolconfig
tomlkit==0.13.2
    # via snowflake-connector-python
toolz==0.12.1
    # via
    #   dask
    #   distributed
    #   ibis-framework
    #   partd
torch==2.4.0
    # via
    #   accelerate
    #   transformers
tornado==6.4.1
    # via
    #   bokeh
    #   distributed
    #   ipykernel
    #   jupyter-client
    #   jupyter-server
    #   jupyterlab
    #   notebook
    #   terminado
tqdm==4.66.5
    # via
    #   datasets
    #   huggingface-hub
    #   panel
    #   transformers
traitlets==5.14.3
    # via
    #   comm
    #   ipykernel
    #   ipython
    #   ipywidgets
    #   jupyter-client
    #   jupyter-console
    #   jupyter-core
    #   jupyter-events
    #   jupyter-server
    #   jupyterlab
    #   matplotlib-inline
    #   nbclient
    #   nbconvert
    #   nbformat
    #   qtconsole
transformers==4.35.2
    # via kedro-datasets
triad==0.9.8
    # via kedro-datasets
triton==3.0.0
    # via torch
trufflehog==2.2.1
    # via kedro-datasets
trufflehogregexes==0.0.7
    # via trufflehog
types-cachetools==5.5.0.20240820
    # via kedro-datasets
types-cffi==1.16.0.20240331
    # via types-pyopenssl
types-decorator==5.1.8.20240310
    # via kedro-datasets
types-pyopenssl==24.1.0.20240722
    # via types-redis
types-python-dateutil==2.9.0.20240821
    # via arrow
types-pyyaml==6.0.12.20240808
    # via kedro-datasets
types-redis==4.6.0.20240819
    # via kedro-datasets
types-requests==2.31.0.6
    # via kedro-datasets
types-setuptools==72.2.0.20240821
    # via types-cffi
types-six==1.16.21.20240513
    # via kedro-datasets
types-tabulate==0.9.0.20240106
    # via kedro-datasets
types-urllib3==1.26.25.14
    # via types-requests
typing-extensions==4.12.2
    # via
    #   aioitertools
    #   anyio
    #   async-lru
    #   azure-core
    #   azure-identity
    #   azure-storage-blob
    #   black
    #   huggingface-hub
    #   ibis-framework
    #   kedro
    #   mypy
    #   optree
    #   panel
    #   snowflake-connector-python
    #   snowflake-snowpark-python
    #   sqlalchemy
    #   tensorflow
    #   torch
tzdata==2024.1
    # via pandas
uc-micro-py==1.0.3
    # via linkify-it-py
uri-template==1.3.0
    # via jsonschema
urllib3==1.26.19
    # via
    #   botocore
    #   distributed
    #   requests
    #   responses
    #   snowflake-connector-python
virtualenv==20.26.3
    # via pre-commit
wcwidth==0.2.13
    # via prompt-toolkit
webcolors==24.8.0
    # via jsonschema
webencodings==0.5.1
    # via
    #   bleach
    #   tinycss2
websocket-client==1.8.0
    # via jupyter-server
werkzeug==3.0.3
    # via
    #   moto
    #   tensorboard
wheel==0.44.0
    # via
    #   astunparse
    #   snowflake-snowpark-python
widgetsnbextension==4.0.11
    # via ipywidgets
wrapt==1.16.0
    # via
    #   aiobotocore
    #   tensorflow
xarray==2024.7.0
    # via kedro-datasets
xlsx2csv==0.8.3
    # via polars
xlsxwriter==1.4.5
    # via kedro-datasets
xmltodict==0.13.0
    # via moto
xxhash==3.5.0
    # via
    #   datasets
    #   pins
xyzservices==2024.6.0
    # via
    #   bokeh
    #   panel
yarl==1.9.4
    # via aiohttp
zict==3.0.0
    # via distributed
zipp==3.20.0
    # via
    #   importlib-metadata
    #   importlib-resources

In fact, the uv resolution can be fed to pip:

# pip install -r requirements.txt
...
Successfully built antlr4-python3-runtime docopt grimp hdfs import-linter pyspark kedro-datasets
Installing collected packages: ...

So a dependency solution exists - it's just that pip cannot resolve it.

Admittedly, the test dependencies for kedro-datasets are contrived, but there's no conflict - the installation failure stems from a limitation in the pip resolver.

And yet, our Makefile uses pip, so contributors will keep hitting this wall.

Are we ready to change our Makefile to use uv?

@astrojuanlu (Member)

Btw another contributor was blocked by this #807 (comment)

@rwpurvis (Contributor) commented Aug 22, 2024

Data point: I was trying to work on a PR for #808 and ran into this issue. I'll try uv.

Using python 3.10, pip 24.2 (latest)

@astrojuanlu (Member) commented Aug 22, 2024

Well, every single kedro-datasets contributor is being blocked by this, so I'm liberally assigning High priority.

@astrojuanlu changed the title from "make plugin=kedro-datasets install-test-requirements has dependency conflicts" to "make plugin=kedro-datasets install-test-requirements fails" on Aug 22, 2024
@astrojuanlu (Member) commented Sep 9, 2024

As @ankatiyar pointed out in a meeting today, we're now using uv on our own CI, so the make install-test-requirements target is basically untested ⚠️

- name: Install dependencies
  run: |
    cd ${{ inputs.plugin }}
    uv pip install --system "kedro @ git+https://github.com/kedro-org/kedro@main"
    uv pip install --system "${{inputs.plugin}}[test] @ ."

We can do several things:

  • We change our Makefile so that it uses uv, and put it back on our CI.
  • If we're not ready to endorse uv just yet, at least we should adjust our documentation guides, and potentially remove the command too.

Thoughts?
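For concreteness, the first option might look something like this in the Makefile. This is a hypothetical sketch, not the actual rule: it assumes uv is installed and on PATH, and reuses the existing plugin= variable convention.

```make
# Hypothetical uv-based variant, invoked as:
#   make plugin=kedro-datasets install-test-requirements
install-test-requirements:
	cd $(plugin) && uv pip install ".[test]"
```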

@noklam (Contributor) commented Oct 4, 2024

I have actually fixed uv with #464 already; the remaining bit is the system dependencies that are needed for OpenCV. Honestly, I don't think users should be bothered by this unless they are working with the VideoDataset. We can update our docs to make this clearer.

@astrojuanlu (Member)

I see we merged this a few days after we last spotted the problem (#597 (comment)). Let's close the issue.

@noklam (Contributor) commented Oct 7, 2024

Sounds good. I have added a note in the installation guide (GitHub Wiki).
