
[experimental] Run crosshair in CI #4034

Draft · Zac-HD wants to merge 4 commits into master from crosshair-in-ci
Conversation

@Zac-HD (Member) commented Jul 7, 2024

See #3914

To reproduce this locally, you can run `make check-crosshair-cover/nocover/niche` for the same commands as in CI, but I'd recommend `pytest --hypothesis-profile=crosshair hypothesis-python/tests/{cover,nocover,datetime} -m xf_crosshair --runxfail` to select and run only the xfailed tests.
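For reference, a minimal sketch of what such a `crosshair` profile registration could look like (assuming only the settings mentioned in this thread; the actual profile in this branch may set more options):

```python
# conftest.py - a minimal sketch, assuming the profile only needs the backend
# plus the two health-check suppressions discussed below
from hypothesis import HealthCheck, settings

settings.register_profile(
    "crosshair",
    backend="crosshair",  # route generation through hypothesis-crosshair
    suppress_health_check=[HealthCheck.too_slow, HealthCheck.filter_too_much],
)
# then select it on the command line: pytest --hypothesis-profile=crosshair ...
```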

Hypothesis' problems

  • The vast majority of failures are `Flaky: Inconsistent results from replaying a failing test...` - mostly backend-specific failures; we've both
    • improved reporting in this case to show the crosshair-specific traceback
    • got most of the affected tests passing
  • Invalid internal boolean probability, e.g. `"hypothesis/internal/conjecture/data.py", line 2277, in draw_boolean` with `assert p > 2 ** (-64)`; fixed in 1f845e0 (#4049)
  • many of our test helpers involved nested use of `@given`; fixed in 3315be6
  • symbolic outside context
  • avoid uninstalling typing_extensions when crosshair depends on it
  • tests which are not really expected to pass on other backends. I'm slowly applying a backend-specific xfail decorator, `@xfail_on_crosshair(...)`, to them.
    • tests which expect to raise a healthcheck, and fail because our crosshair profile disables healthchecks. Disable only `.too_slow` and `.filter_too_much`, and skip the remaining affected tests under crosshair.
    • undo some over-broad skips, e.g. various xfail decorators, pytestmarks, `-k 'not decimal'`, once we're closer
  • provide a special exception type for when running the test or realizing values would hit a PathTimeout; see Rare PathTimeout errors in provider.realize(...) pschanely/hypothesis-crosshair#21 and Stable support for symbolic execution #3914 (comment)
    • and something to signal that we've exhausted Crosshair's ability to explore the test. If this is sound, we've verified the function and can stop! (and should record that in the stop_reason). If unsound, we can continue testing with Hypothesis' default backend - so it's important to distinguish the two (see the sketch after this list).
      Add BackendCannotProceed to improve integration #4092
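A sketch of how a backend could use `BackendCannotProceed` as added in #4092 (the provider attributes below are hypothetical, and the scope strings reflect my reading of that PR rather than a confirmed API):

```python
from hypothesis.errors import BackendCannotProceed

class SymbolicProvider:  # hypothetical third-party PrimitiveProvider
    def draw_integer(self, *args, **kwargs):
        if self.fully_explored:  # hypothetical: all paths covered, soundly
            # sound exhaustion: the engine can stop early and record the
            # test as verified in stop_reason
            raise BackendCannotProceed("verified")
        if self.gave_up:  # hypothetical: exploration stopped unsoundly
            # unsound exhaustion: hand the rest of the run back to
            # Hypothesis' default backend
            raise BackendCannotProceed("exhausted")
        ...

    def realize(self, value):
        if self.hit_path_timeout:  # hypothetical PathTimeout flag
            # discard this test case cleanly instead of surfacing a
            # confusing Flaky failure
            raise BackendCannotProceed("discard_test_case")
        ...
```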

Probably Crosshair's problems

Error in `operator.eq(Decimal('sNaN'), an_int)`

```
____ test_rewriting_does_not_compare_decimal_snan ____
  File "hypothesis/strategies/_internal/strategies.py", line 1017, in do_filtered_draw
    if self.condition(value):
TypeError: argument must be an integer
while generating 's' from integers(min_value=1, max_value=5).filter(functools.partial(eq, Decimal('sNaN')))
```
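For context, a rough reconstruction of what that test exercises (not the exact source; comparing a signaling NaN with `==` raises `decimal.InvalidOperation`, so filter rewriting must not evaluate the predicate eagerly):

```python
import functools
import operator
from decimal import Decimal, InvalidOperation

import pytest
from hypothesis import strategies as st

def test_rewriting_does_not_compare_decimal_snan():
    # constructing the filtered strategy must not call the predicate:
    # Decimal("sNaN") == <int> raises InvalidOperation rather than
    # returning False
    s = st.integers(min_value=1, max_value=5).filter(
        functools.partial(operator.eq, Decimal("sNaN"))
    )
    # the signal should only fire once a value is actually drawn
    with pytest.raises(InvalidOperation):
        s.example()
```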

Cases where crosshair doesn't find a failing example but Hypothesis does

This seems fine; there are plenty of cases in the other direction. Tracked with `@xfail_on_crosshair(Why.undiscovered)` in case we want to dig in later.
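The decorator lives in the test suite; a minimal sketch of how it could be implemented (the `Why` reasons and `xf_crosshair` mark are names from this PR, while the backend detection via an environment variable is my assumption):

```python
import enum
import os

import pytest

class Why(enum.Enum):
    undiscovered = "crosshair didn't find the failure within the test budget"
    other = "see the notes in this PR"

def xfail_on_crosshair(why, *, strict=True):
    def apply(test_fn):
        # tag the test so `-m xf_crosshair` selects it (see the tips above)
        test_fn = pytest.mark.xf_crosshair(test_fn)
        return pytest.mark.xfail(
            condition=os.environ.get("HYPOTHESIS_PROFILE") == "crosshair",
            reason=why.value,
            strict=strict,
        )(test_fn)
    return apply

@xfail_on_crosshair(Why.undiscovered)
def test_something():
    ...
```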

Nested use of the Hypothesis engine (e.g. given-inside-given)

This is just explicitly unsupported for now. Hypothesis should probably offer some way for backends to declare that they don't support this, and then raise a helpful error message if you try anyway.

@Zac-HD added labels tests/build/CI (about testing or deployment *of* Hypothesis) and interop (how to play nicely with other packages) on Jul 7, 2024

@Zac-HD force-pushed the crosshair-in-ci branch 3 times, most recently from 175b347 to 424943f on July 7, 2024
@pschanely (Contributor) commented:

@Zac-HD your triage above is SO great. I am investigating.

@pschanely (Contributor) commented Jul 8, 2024

Knocked out a few of these in 0.0.60.
I think that means current status on my end is:

  • `TypeError: conversion from SymbolicInt to Decimal is not supported`
  • `unsupported operand type(s) for -: 'float' and 'SymbolicFloat'` in `test_float_clamper`
  • `TypeError: descriptor 'keys' for 'dict' objects doesn't apply to a 'ShellMutableMap' object` (or `'values'` or `'items'`) - illustrated after this list
  • `TypeError: _int() got an unexpected keyword argument 'base'`
  • Symbolic not realized (in e.g. `test_suppressing_filtering_health_check`)
  • Error in `operator.eq(Decimal('sNaN'), an_int)`
  • Zac's cursed example below!
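The `dict` descriptor failure reproduces without any symbolics, since unbound `dict` methods reject anything that isn't a real dict subclass (the class below just stands in for crosshair's `ShellMutableMap`):

```python
class FakeMap:  # stand-in for ShellMutableMap: dict-like but not a dict
    def keys(self):
        return []

try:
    dict.keys(FakeMap())
except TypeError as e:
    # descriptor 'keys' for 'dict' objects doesn't apply to a 'FakeMap' object
    print(e)
```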

More soon.

@Zac-HD (Member, Author) commented Jul 12, 2024

Ah - the Flaky failures are of course because we had some failure under the Crosshair backend which did not reproduce under the Hypothesis backend. This presumably points to a range of integration bugs, but it's also something we'll want to explain clearly to users, because integration bugs are definitely going to happen in the future and users will need to respond (by e.g. using a different backend, or ignoring the problem).

  • improve the reporting around Flaky failures where the differing or missing errors are related to a change of backend while shrinking. See also Change Flaky to be an ExceptionGroup #4040, and the sketch after this list.
  • triage all the current failures so we can fix them
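An illustration of the reporting direction (a plain `ExceptionGroup`, Python 3.11+; the actual Flaky types in #4040 differ): carry both the backend-specific failure and the replay outcome, so the change of backend is visible to the user:

```python
# illustration only: what a backend-aware Flaky report could carry
crosshair_error = AssertionError("test failed under backend='crosshair'")
replay_note = RuntimeError(
    "failure did not reproduce under Hypothesis' default backend"
)
raise ExceptionGroup(
    "Inconsistent results from replaying a failing test case",
    [crosshair_error, replay_note],
)
```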


@tybug (Member) commented Jul 12, 2024

Most/all of the "expected x, got symbolic" errors are symptoms of an underlying error, in my experience (often an operation on a symbolic while not tracing). In this case, running with `export HYPOTHESIS_NO_TRACEBACK_TRIM=1` reveals that `limited_category_index_cache` in `cm.query` is at fault.
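(For anyone following along: the same switch can be flipped from Python, as long as it happens before Hypothesis builds the report; setting it in the shell before starting pytest is the safest route.)

```python
import os

# show untrimmed tracebacks, including Hypothesis-internal frames such as
# the caching helper identified above
os.environ["HYPOTHESIS_NO_TRACEBACK_TRIM"] = "1"
```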

@Zac-HD
Copy link
Member Author

Zac-HD commented Jul 12, 2024

Ah-ha - seems like we might want some #4029-style "don't cache on backends with `avoid_realize=True`" logic.
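A sketch of what that #4029-style logic could look like (all names here are hypothetical, modeled on the cache mentioned above):

```python
class CategoryMapper:  # hypothetical home of limited_category_index_cache
    def __init__(self, provider):
        self.provider = provider
        self._cache = {}

    def query(self, key):
        # symbolic backends ask us not to realize (and so not to hash or
        # compare) their values, so bypass the cache entirely for them
        if getattr(self.provider, "avoid_realize", False):
            return self._compute(key)
        if key not in self._cache:
            self._cache[key] = self._compute(key)
        return self._cache[key]

    def _compute(self, key):
        ...  # the uncached lookup
```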

@Zac-HD force-pushed the crosshair-in-ci branch 2 times, most recently from 1d2345d to 7bf8983 on July 12, 2024
@pschanely (Contributor) commented:

Still here and excited about this! I'm on a detour to do a real symbolic implementation of the decimal module - I should get that out this weekend.

@Zac-HD force-pushed the crosshair-in-ci branch 2 times, most recently from cc07927 to 018ccab on July 13, 2024
@Zac-HD (Member, Author) commented Jul 13, 2024

Triaging a pile of the Flaky errors: most were due to getting a RecursionError under crosshair and then passing under Hypothesis - and it looks like most of those were, in turn, because of all our nested-`@given()` test helpers.

So I've tried de-nesting those, which seems to work nicely and even makes things a bit faster by default; and when CI finishes we'll see how much it helps on crosshair 🤞
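For illustration, the de-nesting pattern looks roughly like this (simplified; not the exact helpers from this branch):

```python
from hypothesis import given, strategies as st

# before: a helper that runs a second engine inside the outer test,
# which is RecursionError-prone under crosshair
def check_roundtrip(strategy):
    @given(strategy)
    def inner(value):
        assert value == value
    inner()

# after: a single top-level @given, drawing inside the test instead
@given(st.data())
def test_roundtrip_denested(data):
    value = data.draw(st.integers())
    assert value == value
```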


@Zac-HD (Member, Author) commented Oct 10, 2024

@pschanely huge progress from recent updates! The `BackendCannotProceed` mechanism entirely fixed several classes of issues, the floats changes have been great (signed zero ftw!), `from_type()` generates instances more often, I'm no longer skipping categories of stuff, and overall we've dropped from about +350 to +250 lines of code in this PR 🎊

At this point my only real reason to avoid merging is that crosshair updates often cause a fair bit of churn, causing some tests to start failing and some to start xpassing - it's net-good, but would be toil in our CI. I feel like we've crossed from an alpha version which is a neat proof of concept, to a beta version which is still early but already both useful and clearly on a path to stability and wider adoption. Incredibly excited about this ✨


If you want to pull out Crosshair issues,

  • this PR is probably useful as a pre-release test, to check whether there are any regressions you didn't expect
  • there's a commit marking some things that look like Crosshair bugs to me, and many more where Crosshair just doesn't find a failure that Hypothesis does (within the test budget, and which might or might not be a problem)
  • there's a commit full of tests skipped because they were very slow, if you want to look at performance issues. I haven't audited it lately, but would guess at least a third are still slow and also Crosshair's problem.
  • the last big commit is pretty messy, probably best to ignore that for now

@pschanely (Contributor) commented:

> @pschanely huge progress from recent updates! The `BackendCannotProceed` mechanism entirely fixed several classes of issues, the floats changes have been great (signed zero ftw!), `from_type()` generates instances more often, I'm no longer skipping categories of stuff, and overall we've dropped from about +350 to +250 lines of code in this PR 🎊

So great.

> At this point my only real reason to avoid merging is that crosshair updates often cause a fair bit of churn, causing some tests to start failing and some to start xpassing - it's net-good, but would be toil in our CI.

Frankly, I'm not sure it makes sense to block hypothesis on a crosshair-related failure, even in a very distant, stable future. Would love your ideas for making the integration more "eventually" correct. Maybe a dedicated testing repo that pulls the hypothesis source and has these pytest markers externally applied? (or submodules? but those scare me)

> If you want to pull out Crosshair issues,

Always. Thanks for the commit breakdown. More updates soon!

@Zac-HD (Member, Author) commented Oct 12, 2024

> Frankly, I'm not sure it makes sense to block hypothesis on a crosshair-related failure, even in a very distant, stable future. Would love your ideas for making the integration more "eventually" correct. Maybe a dedicated testing repo that pulls the hypothesis source and has these pytest markers externally applied? (or submodules? but those scare me)

For clarity, "blocking" would mean 'when we update our pinned dependencies, if Crosshair has changed we'll update the xfail markers accordingly and report any issues upstream, or maybe add a != requirement for that version'. Similarly, if a Hypothesis PR doesn't work with Crosshair I'd prefer to learn that at the time so I can decide to either xfail the tests, or do some extra work to support it - and my guess is that the converse would be useful for you too.

In practice I expect I'll just keep updating this PR for now, and you can grab a local copy of the branch if you want to run the tests before a Crosshair release 😁 (and note the test-selection tips at the top of the PR!)

@pschanely (Contributor) commented:

> For clarity, "blocking" would mean 'when we update our pinned dependencies, if Crosshair has changed we'll update the xfail markers accordingly and report any issues upstream, or maybe add a != requirement for that version'. Similarly, if a Hypothesis PR doesn't work with Crosshair I'd prefer to learn that at the time so I can decide to either xfail the tests, or do some extra work to support it - and my guess is that the converse would be useful for you too.

Fair enough! I was concerned about how much churn in CrossHair pass/fails you'll see for unrelated hypothesis changes, but it's also true that I want to know about what you see. Current plan SGTM.

> In practice I expect I'll just keep updating this PR for now, and you can grab a local copy of the branch if you want to run the tests before a Crosshair release 😁 (and note the test-selection tips at the top of the PR!)

Yup! I've been doing this a little already; works for me.
