Regression in 3.12 beta in json.dump deeply nested dict #107263
Comments
cc @markshannon

Related #105003

Also cc @Yhg1s out of caution, since we're close to RC. At a minimum, I'm guessing we need a "What's New" entry.

The relevant number is the C stack, so depending on your numbers, note you don't need to set

I am in favor of adding a new API exposing this. Nothing else would fundamentally fix this.
If we change the original example to:

```python
d = {}
for x in range(100_000):
    d = {'k': d}
import json, sys
sys.setrecursionlimit(100_000_000)
foo = json.dumps(d)
```

it segfaults on 3.11. Here is what we can do to lessen the impact of this change:
Whatever we do, I'll add a section to the "what's new".
Something else we could do is to change the C recursion limit from 800 to 1500 on optimized builds, where the C frames will be smaller. A few tests will need changing to handle the differing recursion limits, but that's not a bad thing, as it should make the tests more general.
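To see what "differing recursion limits" means in practice, one can probe how deeply `json.dumps` (which recurses in C in CPython) can nest before raising. A bounded sketch (the bound is a safety valve: on interpreters without a C-stack guard, an unbounded probe could overflow the real stack instead of raising):

```python
import json
import sys

sys.setrecursionlimit(100_000)  # rule out the Python-level limit


def max_dumps_depth(bound=5_000):
    """Probe how deeply nested a dict json.dumps can serialize before
    RecursionError.  Stops at `bound` to avoid a hard C stack overflow
    on interpreters that would crash rather than raise (sketch)."""
    d = {}
    for depth in range(1, bound + 1):
        d = {"k": d}
        try:
            json.dumps(d)
        except RecursionError:
            return depth
    return bound


print("max json.dumps depth:", max_dumps_depth())
```

On a 3.12 build with the limits discussed here, the printed depth lands near the C recursion limit rather than near `sys.getrecursionlimit()`.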
Can you elaborate on that?
People can easily increase their stack size on Unix-like systems with
Since the OS provides this API, why would Python forbid users with a real need? Other virtual machines, such as V8 (the JavaScript runtime), can increase the stack size on Unix-like systems (though not on Windows). Increasing the stack size/recursion limit is a real need for some users, and this limit will be a problem without an easy workaround. It breaks users' code without the possibility of easy migration.
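The OS facility referred to here is the `RLIMIT_STACK` resource limit (`ulimit -s` at the shell); a hedged sketch of inspecting and raising the soft limit from Python (Unix-only; raising it may be refused by container or policy limits, as noted below):

```python
import resource  # Unix-only stdlib module

# Inspect the current stack limits, in bytes (RLIM_INFINITY = unlimited).
soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
print("stack limit soft/hard:", soft, hard)

# An unprivileged process may raise its soft limit up to the hard limit.
try:
    resource.setrlimit(resource.RLIMIT_STACK, (hard, hard))
except (ValueError, OSError) as exc:  # containers/policies may refuse
    print("could not raise stack limit:", exc)
```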
I'd find that surprising; at the very least that would require changing
Alternative take: The json module shouldn't be using recursion. (Which is out of scope as an rc1 release blocker resolution.)
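For illustration, the non-recursive approach can be sketched with an explicit work stack. This is not the json module's actual implementation: scalar leaves are delegated to `json.dumps`, and tuples are unsupported because they serve as internal markers here.

```python
import json

_LITERAL = object()  # marker tag for pre-rendered output fragments


def dumps_iterative(obj):
    """Serialize nested dicts/lists with an explicit stack instead of
    recursion, so nesting depth is bounded by heap memory, not the C
    stack.  Illustrative sketch only."""
    out = []
    stack = [obj]
    while stack:
        item = stack.pop()
        if isinstance(item, tuple) and item and item[0] is _LITERAL:
            out.append(item[1])  # pre-rendered punctuation or key
        elif isinstance(item, dict):
            parts = [(_LITERAL, "{")]
            for i, (k, v) in enumerate(item.items()):
                if i:
                    parts.append((_LITERAL, ", "))
                parts.append((_LITERAL, json.dumps(str(k)) + ": "))
                parts.append(v)
            parts.append((_LITERAL, "}"))
            stack.extend(reversed(parts))  # reversed so pops come out in order
        elif isinstance(item, list):
            parts = [(_LITERAL, "[")]
            for i, v in enumerate(item):
                if i:
                    parts.append((_LITERAL, ", "))
                parts.append(v)
            parts.append((_LITERAL, "]"))
            stack.extend(reversed(parts))
        else:
            out.append(json.dumps(item))  # scalar leaf
    return "".join(out)


# A 100,000-deep dict that defeats the recursive encoder:
d = {}
for _ in range(100_000):
    d = {"k": d}
print(dumps_iterative(d)[:24], "...")
```

The output matches `json.dumps` defaults on plain dict/list/scalar data, but the depth is limited only by available memory.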
Don't presume this. System, policy, and container limits can constrain the maximum ulimit. And people dealing with untrusted data, as is common when encoding or decoding JSON, generally do not have test cases to determine what limits they require to avoid crashing. People reasonably expect that Python will never crash (segfault) the process due to a stack overflow. Another lack of control: threads are frequently spawned by extension modules or embedding languages (C/C++/Rust), where the C stack size is configured by that code at thread creation time and can be a lot smaller than the ulimit default main-thread stack size, for a variety of valid reasons. This can be pretty far removed from the control of the Python application user.
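The per-thread stack control described above is visible from pure Python as well: `threading.stack_size()` sets the C stack size used for threads created after the call. A minimal sketch running a deep computation on a thread with an explicitly chosen stack (the helper name and sizes are illustrative):

```python
import sys
import threading


def deep(n):
    """Plain recursive counter, just to consume recursion depth."""
    if n == 0:
        return 0
    return deep(n - 1) + 1


def run_in_thread(func, stack_bytes=32 * 1024 * 1024):
    """Run func() in a worker thread with a chosen C stack size.
    stack_size() only affects threads created *after* the call (sketch)."""
    result = []
    old = threading.stack_size(stack_bytes)
    try:
        t = threading.Thread(target=lambda: result.append(func()))
        t.start()
        t.join()
    finally:
        threading.stack_size(old)  # restore for later thread creation
    return result[0]


sys.setrecursionlimit(30_000)
print(run_in_thread(lambda: deep(20_000)))
```

An embedding application doing the equivalent in C at thread creation time can hand the interpreter far less stack than this, which is the scenario the comment warns about.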
I am not asking to remove this limit. I think it should be made configurable, at least for professional users. Deep recursion is not an error if you have a large stack size; conversely, with a very small stack size, the current limit can still cause a segfault. Currently, the only workaround is editing the source code and recompiling. That's too much even for professional users.
Yep, understood. I do think that's a stack / recursion limit related feature request on its own that we should track and decide in its own issue. (Feel free to file one.)
I tend to think it's a bug fix. If it's treated as a feature request and accepted, then we could run such programs on 3.10, 3.11, and 3.13, which makes it very weird not to be able to run them on 3.12.
…_EvalFrameDefault()` (GH-107535) * Set C recursion limit to 1500, set cost of eval loop to 2 frames, and compiler multiply to 2.
…PyEval_EvalFrameDefault()` (pythonGH-107535) * Set C recursion limit to 1500, set cost of eval loop to 2 frames, and compiler multiply to 2. (cherry picked from commit fa45958) Co-authored-by: Mark Shannon <mark@hotpy.org>
…_PyEval_EvalFrameDefault()` (GH-107535) (#107618) GH-107263: Increase C stack limit for most functions, except `_PyEval_EvalFrameDefault()` (GH-107535) * Set C recursion limit to 1500, set cost of eval loop to 2 frames, and compiler multiply to 2. (cherry picked from commit fa45958) Co-authored-by: Mark Shannon <mark@hotpy.org>
Is there anything left to do?
Assuming this is all fixed now. |
This has broken basic abilities of

```python
# fib.py
import sys
sys.setrecursionlimit(2000)

from functools import cache

@cache
def fib(n):
    if n < 1: return 0
    if n == 1: return 1
    return fib(n - 1) + fib(n - 2)

print(fib(500))
```

Now
is quite quick (removing
By the way, increasing
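For reference, the same value is reachable without any recursion at all; an iterative sketch that sidesteps both the Python and C recursion limits:

```python
def fib_iter(n):
    """Iterative Fibonacci: constant stack depth regardless of n."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


print(fib_iter(500))
```

This is a workaround, not a defense of the new limit: the recursive version above is idiomatic Python that worked on 3.11.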
you still need to fix
This was reported here: https://discuss.python.org/t/has-sys-setrecursionlimit-behaviour-changed-in-python-3-12b/30205
The following program works fine on 3.11, but crashes with RecursionError on 3.12:
I confirmed this bisects to #96510
Linked PRs

- #107535: GH-107263: Increase C stack limit for most functions, except `_PyEval_EvalFrameDefault()`
- #107618: backport of GH-107535 (cherry-picked from commit fa45958)