Description
Feature or enhancement
Proposal:
I was looking at several Python JSON encoders that are faster than Python's json
module. One thing they do not do is checking for circular references. Instead they rely on RecursionError
^H^H^H^H^H^H^H custom recursion limit to do it for them:
ultrajson gives you OverflowError: Maximum recursion level reached
orjson gives you a TypeError: Recursion limit reached.
I tried comparing json.dumps()
with json.dumps(check_circular=False)
and it gave 15% on a JSON used in JSON library comparison. The result might be different for different use cases.
But I think having extra code to track references just to raise a nicer ValueError
is a bit too much. I have a patch that removes the extra code and tries to keep everything backward compatible. Could remove even more by removing markers
from c_make_encoder
and _make_iterencode
but that requires fixes to some tests and maybe it breaks something.
Anyway, open to feedback.
# Add a code block here, if required
Has this already been discussed elsewhere?
No response given
Links to previous discussion of this feature:
No response
Linked PRs
Metadata
Metadata
Assignees
Projects
Status
Activity
nineteendo commentedon May 30, 2025
@picnixz, another JSON issue.
aivarsk commentedon May 30, 2025
Something broke on WASI build, I will try to figure out
pyperformance:
nineteendo commentedon May 30, 2025
Is this the "worse" referred to in the docs?
Also, if you're running
python.exe -m pyperformance run -b "json_dumps" -r -o out.json
, that's withcheck_circular=True
.aivarsk commentedon May 30, 2025
pyperformance is for before and after my patch and it shows there is a performance improvement for check_circular=True (the default value) when the custom code is removed and RecursionError is used to signal about circular references.
Sorry, I did not understand the 'the "worse" referred to in the docs'
gpshead commentedon May 30, 2025
In general I like that this simplifies the code to maintain and as faster... BUT the one failure showing up in CI is an example of the potential danger of relying on RecursionError being raised where we did not before adds, API wise: A dying process instead of the previous nice error when CPython isn't able to detect recursion depth limits vs the underlying platform it is running on accurately. (unsurprisingly, the WASI test - the WASM VM has much tighter limits)
That doesn't necessarily block this change (the test could be adjusted to not go so deep). WASI has recursion depth limits problems in general. But it shows that it isn't entirely safe as other environments interpreters run in could hit the same situation.
gpshead commentedon May 30, 2025
i'm blindly assuming the WASI failure was a C stack overflow FWIW, I didn't dig into anything.
nineteendo commentedon May 30, 2025
Could you compare the performance of this PR with one that simply changes the default value? This is not really a fair comparison.
serhiy-storchaka commentedon Jun 2, 2025
RecursionError is raised in different case -- for too deep nested structures -- and it is just an implementation detail. It is possible to write an implementation that can serialize arbitrary nested structures.
Also, RecursionError is raised for two different reasons. In the C code it prevents the C stack overflow. This is unreliable way, but better than nothing. In the Python code it is not needed, it is raised implicitly because there is global limit for recursion calls.
Also, the circular reference can be detected as soon as it is found, while RecursionError is raised much later, wasting time and memory (and possibly writing or transmitting unnecessary data). The Python interpreter is also in vulnerable state near the end of the recursion limit -- some calls can fail with their own RecursionError.
In any case, I do not recommend to rely on RecursionError. This is an extreme case. A normal program shouldn't get to this point.
aivarsk commentedon Jun 2, 2025
I am having doubts about my patch. I looked at other "faster" JSON library implementations and they rely on a custom counter for recursion depth and stop at 1024 by default our it is changeable through API.
2 remaining items