Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

slow pdm lock example #2193

Open
1 task done
dimbleby opened this issue Aug 19, 2023 · 26 comments
Open
1 task done

slow pdm lock example #2193

dimbleby opened this issue Aug 19, 2023 · 26 comments
Labels
🐛 bug Something isn't working 🧩 dependency resolution Resolution failures

Comments

@dimbleby
Copy link

dimbleby commented Aug 19, 2023

  • I have searched the issue tracker and believe that this is not a duplicate.

Make sure you run commands with -v flag before pasting the output.

Steps to reproduce

$ mkdir stackstac
$ cd stackstac
$ git clone git@github.com:gjoseph92/stackstac.git .
$ git rev-parse HEAD
e30ecf233ee42c1c98fcfbc5178ff105ba60760e
$ pdm --version
PDM, version 2.8.2
$ time pdm --lock

Actual behavior

I actually don't know how long this takes - it has been about forty minutes so far. But it's a long time.

Expected behavior

Faster locking.

Environment Information

# Paste the output of `pdm info && pdm info --env` below:
PDM version:
  2.8.2
Python Interpreter:
  /home/dch/stackstac/.venv/bin/python (3.10)
Project Root:
  /home/dch/stackstac
Local Packages:

{
  "implementation_name": "cpython",
  "implementation_version": "3.10.12",
  "os_name": "posix",
  "platform_machine": "x86_64",
  "platform_release": "5.15.90.1-microsoft-standard-WSL2",
  "platform_system": "Linux",
  "platform_version": "#1 SMP Fri Jan 27 02:56:13 UTC 2023",
  "python_full_version": "3.10.12",
  "platform_python_implementation": "CPython",
  "python_version": "3.10",
  "sys_platform": "linux"
}

I actually found this one by looking for examples of poetry being slow, in the hope of finding something interesting that could be improved there. stackstac switched from poetry to pdm "because Poetry was taking multiple hours to lock dependencies".

However today it seems to be pdm that is taking a considerable time to lock dependencies, so I thought you might appreciate the example.

@dimbleby dimbleby added the 🐛 bug Something isn't working label Aug 19, 2023
@pawamoy
Copy link
Contributor

pawamoy commented Aug 19, 2023

There's a backtrack on pystac. Resolution would probably be faster without all these upper bounds, which are probably coming from the translation of Poetry deps to PEP 621 deps.

@pawamoy
Copy link
Contributor

pawamoy commented Aug 19, 2023

pdm.termui:   Adding requirement python-dateutil~=2.7.5(from sat-stac 0.4.1)
pdm.termui: Candidate rejected: sat-stac@0.4.1 because it introduces a new requirement python-dateutil~=2.7.5 that conflicts with other requirements:
    python-dateutil>=2.7 (from matplotlib@3.7.2)  
  python-dateutil>=2.8.2 (from pystac-client@0.7.2)  
  python-dateutil>=2.8.1 (from pandas@1.4.4)  
  python-dateutil>=2.7.0 (from pystac@1.7.2)

Latest sat-stac version, 0.4.1, still has python-dateutil~=2.7.5, while the GitHub repo has it as >=2.7.5. A new release with this change would help. See sat-utils/sat-stac#73

@pawamoy
Copy link
Contributor

pawamoy commented Aug 19, 2023

Try to add this to pyproject.toml in the meantime:

[tool.pdm.resolution.overrides]
python-dateutil = ">=2.7"

(see the docs: https://pdm.fming.dev/latest/usage/config/#override-the-resolved-package-versions)

IMO there's nothing more PDM can do here, so I'll close this issue!

@pawamoy pawamoy closed this as completed Aug 19, 2023
@pawamoy
Copy link
Contributor

pawamoy commented Aug 19, 2023

With a populated cache:

% time pdm lock
🔒 Lock successful
Changes are written to pdm.lock.
pdm lock  59.23s user 1.94s system 57% cpu 1:46.25 total

Under a minute 🎉

@dimbleby
Copy link
Author

dimbleby commented Aug 19, 2023

I don't use either pdm or stackstac so I'm not particularly invested in this, feel free to close it out.

But I don't understand the point you're making. python-dateutil 2.7.5 is surely a perfectly good solution to all constraints, and better behaviour would be to find it (fast)?

@pawamoy
Copy link
Contributor

pawamoy commented Aug 19, 2023

It is not a good solution: both pystac-client and pandas require version >= 2.8.

  python-dateutil>=2.8.2 (from pystac-client@0.7.2)  
  python-dateutil>=2.8.1 (from pandas@1.4.4)  

~=2.7.5 means >=2.7.5,<2.8, so it's not compatible with the other requirements.

@dimbleby
Copy link
Author

So the solver should find an earlier version of pystac-client, etc.

(I have the poetry solution to hand, so if you want to know what answer it finds I can share that)

@pawamoy
Copy link
Contributor

pawamoy commented Aug 19, 2023

The solver eventually does find versions that are compatible, and that's what is taking a long time. Or it doesn't, and it still takes a long time. There's not much PDM (or Poetry, or any other dependency solver) can do when libraries are indeed incompatible.

@pawamoy
Copy link
Contributor

pawamoy commented Aug 19, 2023

I'll reopen for @frostming to take a look at this particular dependency tree and see if there's something wrong in the resolution that could be improved (in PDM or resolvelib). But that's a rather big tree, could be hard to investigate.

@pawamoy pawamoy reopened this Aug 19, 2023
@dimbleby
Copy link
Author

A compatible solution does exist, but pdm is failing to find it in reasonable time. (I still don't know how long it takes, the command is still running).

If you're OK with "eventually" then fair enough. But I'd think that fast is better than slow.

@pawamoy
Copy link
Contributor

pawamoy commented Aug 19, 2023

(I have the poetry solution to hand, so if you want to know what answer it finds I can share that)

Yes please! I'm curious now 🙂

@dimbleby
Copy link
Author

https://gist.github.com/dimbleby/f4599ffb2429400fc1c9963b632ccc82

@pawamoy
Copy link
Contributor

pawamoy commented Aug 19, 2023

Thanks! Well, the set of dependencies is indeed incompatible: the latest pandas which supports python-dateutil >=2.7.3 also supports Python 3.7, which is not supported in the main project here (stackstac, requiring python >= 3.8). The next version has a compatible Python requirement (>= 3.8), but an incompatible python-dateutil requirement (>=2.8.1). So there's no pandas version that is compatible with both Python 3.7 and python-dateutil ~=2.7.5. So PDM backtracks a lot, to find there's no solution.

To fix this, stackstac should either support Python 3.7, or they should use the override as shown above, or sat-stac should publish a new version accepting python-dateutil>=2.7.5.

Or maybe I got confused about the Python version compatibility? 🤔

@dimbleby
Copy link
Author

dimbleby commented Aug 19, 2023

Yes, I think you are confused. . pandas 1.3.5 supports "python >= 3.7.1", and that makes it compatible with a project that supports "python >= 3.8,<4.0".

While it's not impossible that the poetry solution is invalid - that project certainly has bugs! - it would be quite surprising.

@dimbleby
Copy link
Author

(Even if it is true that there is no solution, the bug report stands - then it would be desirable for pdm to be fast about proving that)

@pawamoy
Copy link
Contributor

pawamoy commented Aug 19, 2023

Indeed, you're right, it's the other way around, I was confused, trying to solve dependencies in my head 😅
Anyway, I think I can replicate the resolution issue with just these dependencies:

dependencies = [
    "pystac",
    "pystac-client",
    "sat-stac",
    "pandas",
]
% pdm lock
🔒 Lock failed
Unable to find a resolution for pystac
because of the following conflicts:
  pystac (from project)
  pystac>=1.7.0 (from pystac-client@0.6.1)
To fix this, you could loosen the dependency version constraints in pyproject.toml. See https://pdm.fming.dev/latest/usage/dependency/#solve-the-locking-failure for more details.
See /tmp/pdm-lock-4y0w_ryv.log for detailed debug log.
[ResolutionImpossible]: Unable to find a resolution
Add '-v' to see the detailed traceback

The final error message is not useful here, but it shows there's something wrong going on (either in the resolution itself or in the libraries compatibility).

@dimbleby
Copy link
Author

dimbleby commented Aug 19, 2023

sounds like progress. poetry does find a solution to that set of requirements, so certainly (at least) one of pdm / poetry is wrong.

The non-latest packages in poetry's solution are

pandas          1.3.5
pystac-client   0.6.1
python-dateutil 2.7.5

@pawamoy
Copy link
Contributor

pawamoy commented Aug 19, 2023

Thanks for insisting 🙂 And thanks for the initial report.

If I remove pystac from the dependencies, it solves after a few minutes, ending with this:

  • pandas 1.3.5
  • (pystac 1.8.3)
  • pystac-client 0.6.1
  • python-dateutil 2.7.5
  • (sat-stac 0.4.1)

So same as Poetry. Not sure why adding pystac to the deps gives the above error message.

@pawamoy
Copy link
Contributor

pawamoy commented Aug 19, 2023

Maybe just an impression but it looks like when I specify pystac, with or without >=1 (the former giving a bit more useful logs), pandas doesn't backtrack and PDM stops after it finds that no versions of pystac+pystac-client+pandas 2.0.3 are compatible.

Pinning sat-stac==0.4.1 on the other hand helps tremendously, and PDM solves in a few seconds, backtracking everything properly.

@pawamoy
Copy link
Contributor

pawamoy commented Aug 19, 2023

dependencies = [
    "pandas",
    "pystac",
    "pystac-client",
    "sat-stac>=0.4",
]
% pdm lock -v 2>&1 | grep -Eo 'from pandas[@ ][^)]+' | sort -u
from pandas 1.3.5
from pandas@1.3.5
from pandas 1.4.0
from pandas 1.4.1
from pandas 1.4.2
from pandas 1.4.3
from pandas 1.4.4
from pandas 1.5.0
from pandas 1.5.1
from pandas 1.5.2
from pandas 1.5.3
from pandas 2.0.0
from pandas 2.0.1
from pandas 2.0.2
from pandas 2.0.3

dependencies = [
    "pandas",
    "pystac",
    "pystac-client",
    "sat-stac",
]
% pdm lock -v 2>&1 | grep -Eo 'from pandas[@ ][^)]+' | sort -u
from pandas 2.0.3
from pandas@2.0.3

Looks like a backtracking limitation (to avoid combinatorial explosion?). Out of my depth here 😕

@frostming
Copy link
Collaborator

frostming commented Aug 20, 2023

There are cases where improper dependency set causes the resolver to track too deep and become significantly slower.

In this case, pandas was the first package to try but it is the one who carries a dependency on python-dateutil>=2.8 that is incompatible with others. Therefore, the resolver enters an error branch too early at the beginning and when it realizes there is no working resolution1, it has already gone too deep in the tree.

But if the resolver fails to resolve some dependencies that becomes successful after making the range more specific, this would be a bug.

Footnotes

  1. It only realizes this after exhausting all the versions of the other dependencies , because every package in this example has a dependency on python-dateutil. This is the specialness of this case.

@frostming frostming added the 🧩 dependency resolution Resolution failures label Aug 20, 2023
@dimbleby
Copy link
Author

But if the resolver fails to resolve some dependencies that becomes successful after making the range more specific, this would be a bug.

yes, that's where we got to. The thread got a bit long but the course it has taken is

  • here is a set of requirements for which pdm is unable to make a decision in reasonable time
  • here is a subset of those requirements that pdm says is impossible to solve, although a solution is available
  • adding more specific pins to the second set does allow pdm to find that solution

I guess it's plausible that if you fix the bug associated with the second set of requirements, then the failure-to-reach-any-decision of the original report may also go away.

@frostming
Copy link
Collaborator

BTW, you can always manually optimize the resolution to avoid it into an error branch at the beginning. In this case, you can set pandas<2

@dimbleby
Copy link
Author

Thanks, but I don't need workarounds. I'm reporting this just as a good citizen rather than because I care about a solution.

As I mentioned earlier, I actually was looking for examples where poetry behaved badly and it's just chance that I instead found one where pdm behaves badly. Letting you know about it seemed like the right thing to do.

(I guess it's possible that someone who does care about stackstac will show up and then they'll appreciate workarounds being available)

@Lawouach
Copy link

Hey all,

I'm a standard user of pdm. Personally, I find the slowness irritating at times but I always assume it's because of a complex resolution scheme at some point. What bothers me more is the lack of feedback, you get no idea of where the processing is at nor if there is anything you could do to improve it.

@frostming
Copy link
Collaborator

frostming commented Aug 21, 2023

Well, the resolution failure can be also observed on pip using the following requirements.txt:

pandas
pystac    # removing this will get a working resolution
pystac-client
sat-stac

It seems the pystac requirement somehow crashes the tree and prevent it from backtracking further. Because in the whole process till it reaches to a failure, pandas 1.* were never tried.

So it is a certain issue with resolvelib, and the fix should also happen there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🐛 bug Something isn't working 🧩 dependency resolution Resolution failures
Projects
None yet
Development

No branches or pull requests

4 participants