Random segfaults on Python 3.12.10 during CI testing #134193

Open
@MilesCranmer

Description

Crash report

What happened?

We see segfaults randomly (not frequently enough to get more detailed info) when running integration tests for PySR. The following segfault was seen on Python 3.12.10.

Full stack trace of the segfault:
[2413] signal 11 (1): Segmentation fault
in expression starting at none:0
_PyObject_IS_GC at /home/runner/work/_temp/SourceCode/./Include/internal/pycore_object.h:343 [inlined]
visit_decref at /home/runner/work/_temp/SourceCode/Modules/gcmodule.c:465
dict_traverse at /home/runner/work/_temp/SourceCode/Objects/dictobject.c:3549
subtract_refs at /home/runner/work/_temp/SourceCode/Modules/gcmodule.c:491 [inlined]
deduce_unreachable at /home/runner/work/_temp/SourceCode/Modules/gcmodule.c:1116
gc_collect_main at /home/runner/work/_temp/SourceCode/Modules/gcmodule.c:1242
gc_collect_with_callback at /home/runner/work/_temp/SourceCode/Modules/gcmodule.c:1426
gc_collect_generations at /home/runner/work/_temp/SourceCode/Modules/gcmodule.c:1481 [inlined]
_Py_RunGC at /home/runner/work/_temp/SourceCode/Modules/gcmodule.c:2318
_Py_HandlePending at /home/runner/work/_temp/SourceCode/Python/ceval_gil.c:1045
_PyEval_EvalFrameDefault at /home/runner/work/_temp/SourceCode/Python/ceval.c:836
_PyEval_EvalFrame at /home/runner/work/_temp/SourceCode/./Include/internal/pycore_ceval.h:89 [inlined]
gen_send_ex2 at /home/runner/work/_temp/SourceCode/Objects/genobject.c:230 [inlined]
gen_iternext at /home/runner/work/_temp/SourceCode/Objects/genobject.c:603
builtin_all at /home/runner/work/_temp/SourceCode/Python/bltinmodule.c:322
_PyEval_EvalFrameDefault at /home/runner/work/_temp/SourceCode/Python/bytecodes.c:2913
_PyObject_VectorcallTstate at /home/runner/work/_temp/SourceCode/./Include/internal/pycore_call.h:92 [inlined]
PyObject_Vectorcall at /home/runner/work/_temp/SourceCode/Objects/call.c:325
property_descr_set at /home/runner/work/_temp/SourceCode/Objects/descrobject.c:1678
_PyObject_GenericSetAttrWithDict at /home/runner/work/_temp/SourceCode/Objects/object.c:1569 [inlined]
PyObject_GenericSetAttr at /home/runner/work/_temp/SourceCode/Objects/object.c:1636
wrap_setattr at /home/runner/work/_temp/SourceCode/Objects/typeobject.c:8050
wrapperdescr_raw_call at /home/runner/work/_temp/SourceCode/Objects/descrobject.c:537 [inlined]
wrapperdescr_call at /home/runner/work/_temp/SourceCode/Objects/descrobject.c:574
_PyObject_MakeTpCall at /home/runner/work/_temp/SourceCode/Objects/call.c:240
_PyEval_EvalFrameDefault at /home/runner/work/_temp/SourceCode/Python/bytecodes.c:2715
_PyObject_VectorcallTstate at /home/runner/work/_temp/SourceCode/./Include/internal/pycore_call.h:92 [inlined]
vectorcall_unbound at /home/runner/work/_temp/SourceCode/Objects/typeobject.c:2236 [inlined]
vectorcall_method at /home/runner/work/_temp/SourceCode/Objects/typeobject.c:2267 [inlined]
slot_tp_setattro at /home/runner/work/_temp/SourceCode/Objects/typeobject.c:8905
PyObject_SetAttr at /home/runner/work/_temp/SourceCode/Objects/object.c:1192
_PyEval_EvalFrameDefault at /home/runner/work/_temp/SourceCode/Python/bytecodes.c:1135
_PyFunction_Vectorcall at /home/runner/work/_temp/SourceCode/Objects/call.c:419 [inlined]
_PyObject_FastCallDictTstate at /home/runner/work/_temp/SourceCode/Objects/call.c:144 [inlined]
_PyObject_Call_Prepend at /home/runner/work/_temp/SourceCode/Objects/call.c:508
slot_tp_init at /home/runner/work/_temp/SourceCode/Objects/typeobject.c:9035
type_call at /home/runner/work/_temp/SourceCode/Objects/typeobject.c:1679
_PyObject_MakeTpCall at /home/runner/work/_temp/SourceCode/Objects/call.c:240
_PyEval_EvalFrameDefault at /home/runner/work/_temp/SourceCode/Python/bytecodes.c:2715
_PyObject_VectorcallTstate at /home/runner/work/_temp/SourceCode/./Include/internal/pycore_call.h:92 [inlined]
method_vectorcall at /home/runner/work/_temp/SourceCode/Objects/classobject.c:61
unknown function (ip: 0x7f18b800f0e4)
_PyEval_EvalFrameDefault at /home/runner/work/_temp/SourceCode/Python/bytecodes.c:3263
_PyObject_VectorcallTstate at /home/runner/work/_temp/SourceCode/./Include/internal/pycore_call.h:92 [inlined]
PyObject_CallOneArg at /home/runner/work/_temp/SourceCode/Objects/call.c:401
slot_tp_repr at /home/runner/work/_temp/SourceCode/Objects/typeobject.c:8720
PyObject_Str at /home/runner/work/_temp/SourceCode/Objects/object.c:630
PyFile_WriteObject at /home/runner/work/_temp/SourceCode/Objects/fileobject.c:124
builtin_print_impl at /home/runner/work/_temp/SourceCode/Python/bltinmodule.c:2057 [inlined]
builtin_print at /home/runner/work/_temp/SourceCode/Python/clinic/bltinmodule.c.h:962
cfunction_vectorcall_FASTCALL_KEYWORDS at /home/runner/work/_temp/SourceCode/Objects/methodobject.c:438
_PyObject_VectorcallTstate at /home/runner/work/_temp/SourceCode/./Include/internal/pycore_call.h:92 [inlined]
PyObject_Vectorcall at /home/runner/work/_temp/SourceCode/Objects/call.c:325
_PyEval_EvalFrameDefault at /home/runner/work/_temp/SourceCode/Python/bytecodes.c:2715
_PyObject_VectorcallTstate at /home/runner/work/_temp/SourceCode/./Include/internal/pycore_call.h:92 [inlined]
method_vectorcall at /home/runner/work/_temp/SourceCode/Objects/classobject.c:91
_PyEval_EvalFrameDefault at /home/runner/work/_temp/SourceCode/Python/bytecodes.c:3263
pyrun_file at /home/runner/work/_temp/SourceCode/Python/pythonrun.c:1674
_PyRun_SimpleFileObject at /home/runner/work/_temp/SourceCode/Python/pythonrun.c:459
_PyRun_AnyFileObject at /home/runner/work/_temp/SourceCode/Python/pythonrun.c:78
pymain_run_file_obj at /home/runner/work/_temp/SourceCode/Modules/main.c:361 [inlined]
pymain_run_file at /home/runner/work/_temp/SourceCode/Modules/main.c:380 [inlined]
pymain_run_python at /home/runner/work/_temp/SourceCode/Modules/main.c:634 [inlined]
Py_RunMain at /home/runner/work/_temp/SourceCode/Modules/main.c:714
Py_BytesMain at /home/runner/work/_temp/SourceCode/Modules/main.c:768
unknown function (ip: 0x7f18b7a2a1c9)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
_start at /opt/hostedtoolcache/Python/3.12.10/x64/bin/python (unknown line)
Allocations: 379668601 (Pool: 379663673; Big: 4928); GC: 303
/home/runner/work/_temp/b8a8bd4b-faae-4481-94e0-330e30094622.sh: line 1:  2413 Segmentation fault      (core dumped) coverage run -m pysr test main,cli,startup

PySR relies on PythonCall.jl to interface with Julia backend libraries.

This stacktrace was seen from the following GitHub action: https://github.com/MilesCranmer/PySR/blob/057e3ec5a9d87c9495f7d36965ce4cbee5ef452a/.github/workflows/CI.yml#L21-L88. I have uploaded the full logs for this action to the following gist: https://gist.github.com/MilesCranmer/1df76efecf606123dae2146e461676b1 which have further details on the system and environment.

Unfortunately I don't think this will be easily reproducible as we see it randomly, and not very frequently. We thought it might be worth reporting anyway, to see if there were any ideas for where the issue is coming from.
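
For a crash this rare, one option is to rerun the failing command in a loop until it fails and record the exit status. A minimal sketch, assuming the tests can be launched as a subprocess; the `run_until_crash` helper and the `max_runs` default are hypothetical, not part of PySR or its CI:

```python
import subprocess
import sys


def run_until_crash(cmd, max_runs=50):
    """Run `cmd` repeatedly.

    Returns (run_index, returncode) for the first non-zero exit,
    or None if every run succeeded.
    """
    for i in range(1, max_runs + 1):
        proc = subprocess.run(cmd)
        if proc.returncode != 0:
            return i, proc.returncode
    return None


if __name__ == "__main__":
    # Pass the command to repeat on the command line.
    print(run_until_crash(sys.argv[1:]))
```

With the command from the log above, this would be invoked with `coverage run -m pysr test main,cli,startup` as the arguments. On Linux, a child killed by SIGSEGV shows up as a return code of `-11`.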

The CI runs on 3.8, 3.10, and 3.12. I have not seen this segfault on 3.8 nor on 3.10.

CPython versions tested on:

3.12

Operating systems tested on:

Linux

Output from running 'python -VV' on the command line:

N/A (GitHub action)

Activity

Label added: type-crash (A hard crash of the interpreter, possibly with a core dump), on May 18, 2025

JelleZijlstra (Member) commented on May 18, 2025

The segfault is here:

2025-05-18T18:16:37.0208953Z [2413] signal 11 (1): Segmentation fault
2025-05-18T18:16:37.0209418Z in expression starting at none:0
2025-05-18T18:16:37.0265965Z _PyObject_IS_GC at /home/runner/work/_temp/SourceCode/./Include/internal/pycore_object.h:343 [inlined]
2025-05-18T18:16:37.0266702Z visit_decref at /home/runner/work/_temp/SourceCode/Modules/gcmodule.c:465
2025-05-18T18:16:37.0537116Z dict_traverse at /home/runner/work/_temp/SourceCode/Objects/dictobject.c:3549
2025-05-18T18:16:37.0559335Z subtract_refs at /home/runner/work/_temp/SourceCode/Modules/gcmodule.c:491 [inlined]
2025-05-18T18:16:37.0560272Z deduce_unreachable at /home/runner/work/_temp/SourceCode/Modules/gcmodule.c:1116
2025-05-18T18:16:37.0581180Z gc_collect_main at /home/runner/work/_temp/SourceCode/Modules/gcmodule.c:1242
2025-05-18T18:16:37.0603174Z gc_collect_with_callback at /home/runner/work/_temp/SourceCode/Modules/gcmodule.c:1426
2025-05-18T18:16:37.0625022Z gc_collect_generations at /home/runner/work/_temp/SourceCode/Modules/gcmodule.c:1481 [inlined]
2025-05-18T18:16:37.0625642Z _Py_RunGC at /home/runner/work/_temp/SourceCode/Modules/gcmodule.c:2318
2025-05-18T18:16:37.0652102Z _Py_HandlePending at /home/runner/work/_temp/SourceCode/Python/ceval_gil.c:1045

Others may have more insights but there's a good chance this is a bug in some extension module you're using, not in Python itself.
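
Because the crash occurs while the collector is traversing a dict, a refcounting bug in an extension module may only show up long after the faulty call. One way to narrow the gap is to make the cyclic GC run much more often during tests. A minimal sketch, assuming the test entry point can be wrapped in a callable; `aggressive_gc` and `run_tests` are hypothetical names, and `run_tests` is only a stand-in workload:

```python
import gc
from contextlib import contextmanager


@contextmanager
def aggressive_gc(threshold0=1):
    """Temporarily lower the GC thresholds so collections run very often."""
    old = gc.get_threshold()
    gc.set_threshold(threshold0, 1, 1)
    try:
        yield
    finally:
        # Restore the thresholds that were active before entering.
        gc.set_threshold(*old)


def run_tests():
    # Hypothetical stand-in for the real test entry point.
    data = [{"i": i} for i in range(1000)]
    return len(data)


if __name__ == "__main__":
    with aggressive_gc():
        print(run_tests())
```

This slows the run down considerably, and it does not guarantee that the crash becomes reproducible. It only increases how often `gc_collect_main` gets a chance to visit a corrupted object.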

MilesCranmer (Author) commented on May 18, 2025

I don't doubt it, but I don't know where. Is it from the CPython API usage of juliacall, numpy, sklearn, or pandas, or is it some interaction between combinations of these? I truly don't know. Any tips on getting more info out would be much appreciated, as would any insight based on the stack trace itself.
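
One standard-library option for getting more information out of a crash like this is `faulthandler`, which dumps the Python-level traceback of every thread when a fatal signal arrives. A minimal sketch; the `enable_crash_dumps` helper and the log file name are arbitrary choices, and how this interacts with any signal handlers installed by the Julia runtime has not been verified here:

```python
import faulthandler


def enable_crash_dumps(path="faulthandler.log"):
    """Dump Python tracebacks for all threads to `path` on fatal signals.

    The returned file object must stay open for as long as the handler
    is enabled, because faulthandler writes to its file descriptor.
    """
    log = open(path, "w")
    faulthandler.enable(file=log, all_threads=True)
    return log
```

The same handler can be switched on without code changes by running `python -X faulthandler` or setting `PYTHONFAULTHANDLER=1`, in which case the traceback goes to stderr.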

MilesCranmer (Author) commented on May 18, 2025

Potentially related to my similar issue here: #113591. There are two differences:

  1. that issue was an EXCEPTION_ACCESS_VIOLATION rather than a segmentation fault, and
  2. that one had a Julia-side GC stack trace rather than the Python-side GC stack trace shown here.

The similarities are:

  1. This seems to be a GC-related hard crash.
  2. This only started showing up on Python 3.12, but not in earlier versions.

picnixz (Member) commented on May 18, 2025

Access violations on Windows are roughly the equivalent of SIGSEGV on Linux, so the two crashes could share a cause.
As for the Julia stack trace, the difference could come from how the stack trace is printed, which may depend on the OS (maybe Linux is able to recover more frames than Windows, or vice versa).
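
The correspondence between the two crash types shows up in how a parent process observes them. A minimal sketch, assuming the tests are launched through `subprocess`; the `classify_exit` helper is hypothetical. On POSIX, a child killed by a signal reports a negative return code, while on Windows an access violation surfaces as the NTSTATUS value `0xC0000005`, which some tools print as a signed 32-bit number.

```python
import signal

ACCESS_VIOLATION = 0xC0000005  # Windows NTSTATUS STATUS_ACCESS_VIOLATION


def classify_exit(returncode):
    """Give a short description of a subprocess return code."""
    if returncode == 0:
        return "ok"
    # Mask to 32 bits so the signed form (-1073741819) is recognized too.
    if (returncode & 0xFFFFFFFF) == ACCESS_VIOLATION:
        return "access violation"
    if returncode < 0:
        # POSIX: a negative value means "terminated by signal -returncode".
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exit code {returncode}"
```

For example, `classify_exit(-11)` describes the Linux segfault reported in this issue, and `classify_exit(0xC0000005)` describes the Windows crash from #113591.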

Label added: pending (The issue will be closed if no feedback is provided), on May 18, 2025

MilesCranmer (Author) commented on May 18, 2025

Thanks, that is interesting. Indeed perhaps they have a similar root cause.

One other clarification is that #113591 was observed in 2023 when PySR used pyjulia as its interface, whereas the error reported in this thread was observed in the present day, when PySR runs on juliacall. pyjulia and juliacall are entirely different packages that interface Python and Julia. They have separate codebases. But in both cases, the GC-related crash started appearing on Python 3.12.

      Random segfaults on Python 3.12.10 during CI testing · Issue #134193 · python/cpython