Description
Bug report
Bug description:
Hello. While working on a small joke program, I found a possible memory corruption issue (it could also be a threading issue?) when using the Python C API in a debug only build to quickly create, execute python code, and then destroy 463 sub interpreters. Before I post the code sample and the debug output I'm using a somewhat unique build environment for a Windows developer.
- clang++ 18.1.7 for x86_64-pc-windows-msvc
- Visual Studio Build Tools 2022 17.11.0
- CMake 3.30
- Python 3.12.5
- ninja 1.12.1
When running the code sample I've attached at the bottom of this post, I am unable to get the exact same output each time, though the traceback does fire in the same location (Due to the size of the traceback I've not attached it, as it's about 10 MB of text for each thread). Additionally, I sometimes have to run the executable several times to get the error to occur. Lastly, release builds do not exhibit any thread crashes or issues as the debug assertions never fire or execute.
The error output seems to also halt in some cases, either because of stack failure or some other issue I was unable to determine and seemed to possibly be outside the scope of the issue presented here. I have the entire error output from one execution where I was able to save the output.
Debug memory block at address p=00000267272D6030: API 'p'
16718206241729413120 bytes originally requested
The 7 pad bytes at p-7 are not all FORBIDDENBYTE (0xfd):
at p-7: 0xa0 *** OUCH
at p-6: 0x00 *** OUCH
at p-5: 0x00Hello, world! From Thread 31
Hello, world! From Thread 87
*** OUCH
at p-4: 0xfd
Hello, world! From Thread 43
Hello, world! From Thread 279
Generating thread state 314
at p-3: 0xfd
at p-2: 0xfd
Hello, world! From Thread 168
Generating thread state 315
at p-1: 0xfd
Because memory is corrupted at the start, the count of bytes requested
may be bogus, and checking the trailing pad bytes may segfault.
Generating thread state 316
Generating thread state 317
The 8 pad bytes at tail=E8030267272D6030 are
The output cut off after this, as the entire program crashed, taking my terminal with it 😅
You'll find the MRE code below. I've also added a minimal version of CMakeLists.txt file I used so anyone can recreate the build with the code below (Any warnings, or additional settings I have do not affect whether the error occurs or not). The code appears to breaks inside of _PyObject_DebugDumpAddress
, based on what debugging I was able to do with WinDbg.
Important
std::jthread
calls .join()
on destruction, so all threads auto-join once the std::vector
goes out of scope.
Additionally this code exhibits the same behavior regardless of whether it is a thread_local
or declared within the lambda passed to std::thread
main.cxx
#include <vector>
#include <thread>
#include <cstdlib>
#include <print>
#include <Python.h>
namespace {
static thread_local inline PyThreadState* state = nullptr;
static inline constexpr auto MAX_STATES = 463;
static inline constexpr auto config = PyInterpreterConfig {
.use_main_obmalloc = 0,
.allow_fork = 0,
.allow_exec = 0,
.allow_threads = 0,
.allow_daemon_threads = 0,
.check_multi_interp_extensions = 1,
.gil = PyInterpreterConfig_OWN_GIL,
};
} /* nameless namespace */
void execute () {
std::vector<std::jthread> tasks { };
tasks.reserve(MAX_STATES);
for (auto count = 0zu; count < tasks.capacity(); count++) {
std::println("Generating thread state {}", count);
tasks.emplace_back([count] {
if (auto status = Py_NewInterpreterFromConfig(&state, &config); PyStatus_IsError(status)) {
std::println("Failed to initialize thread state {}", count);
return;
}
auto text = std::format(R"(print("Hello, world! From Thread {}"))", count);
auto globals = PyDict_New();
auto code = Py_CompileString(text.data(), __FILE__, Py_eval_input);
auto result = PyEval_EvalCode(code, globals, globals);
Py_DecRef(result);
Py_DecRef(code);
Py_DecRef(globals);
Py_EndInterpreter(state);
state = nullptr;
});
}
}
int main() {
PyConfig config {};
PyConfig_InitIsolatedConfig(&config);
if (auto status = Py_InitializeFromConfig(&config); PyStatus_IsError(status)) {
std::println("Failed to initialize with isolated config: {}", status.err_msg);
return EXIT_FAILURE;
}
PyConfig_Clear(&config);
execute();
Py_Finalize();
}
CMakeLists.txt
cmake_minimum_required(VERSION 3.30)
project(463-interpreters LANGUAGES C CXX)
find_package(Python 3.12 REQUIRED COMPONENTS Development.Embed)
set(CMAKE_MSVC_RUNTIME_LIBRARY "MultiThreaded$<$<CONFIG:Debug>:Debug>")
add_executable(${PROJECT_NAME})
target_sources(${PROJECT_NAME} PRIVATE main.cxx)
target_compile_features(${PROJECT_NAME} PRIVATE cxx_std_23)
target_precompile_headers(${PROJECT_NAME} PRIVATE <Python.h>)
target_link_libraries(${PROJECT_NAME} PRIVATE Python::Python)
Command to build + run
$ cmake -Bbuild -S. -G "Ninja"
$ cmake --build build && .\build\463-interpreters.exe
CPython versions tested on:
3.12
Operating systems tested on:
Windows
Metadata
Metadata
Assignees
Projects
Status
Activity
picnixz commentedon Aug 19, 2024
Can you reproduce this behaviour on Linux? or MacOS? Or what happens if you sleep a bit before spawning the sub-interpreters? I'm not an expert in threads but I'd say that interpreters are spawning too fast for the OS to follow up =/ Can you check whether the same occurs with HEAD instead of 3.12.5? (maybe there are some bugfixes that are not yet available)
ZeroIntensity commentedon Aug 19, 2024
At a glance, this could possibly be related to a memory allocation failure or something similar.
These lines need a
NULL
check if an error occurred, otherwise there will be aNULL
pointer dereference in the call toPy_DecRef
.bruxisma commentedon Aug 19, 2024
bruxisma commentedon Aug 19, 2024
Sleeping might fix it, but might also hide the issue. The goal here was just to run a bunch of interpreters, not sleep for the sake of working around possible synchronization issues
This is not how threads or operating systems work. There is a synchronization issue or an ABA problem with the debug memory assertions in the C API's internals
I will also work on giving this a go.
ZeroIntensity commentedon Aug 19, 2024
Oh! TIL. I still would suggest adding a
NULL
check -- an error indicator could be set, which would break subsequent calls toPyErr_Occurred
.bruxisma commentedon Aug 19, 2024
The error is outside of the Python C API as one might usually interact with it. i.e., only on subinterpreter initialization (i.e., creating a
PyThreadState
) or on subinterpreter destruction (Py_EndInterpreter
) does the error occur (typically the latter).PyErr_Occurred
is not being called anywhere in the MRE, nor in the actual traceback for the failure.ZeroIntensity commentedon Aug 19, 2024
Right, but it could cause memory corruption somewhere, and cause the error somewhere else down the line. Regardless, I wasn't able to reproduce this on Linux, so this might be Windows-specific.
bruxisma commentedon Aug 20, 2024
@picnixz I can confirm the behavior still exists on Windows on
HEAD
. Same errors can occurPyErr_Occurred
only performs a read on the current thread state object. If an error is present when it shouldn't be that still points to a memory corruption error occurring between the destruction and initialization of aPyThreadState
.bruxisma commentedon Aug 20, 2024
TBBle commentedon Aug 20, 2024
A quick look suggests this TSan failure is a local data-race to posixmodule.c: There appears to be no protection from multiple threads importing posixmodule simultaneously, and hence multiple threads trying to simultaneously qsort in-place one of the three static tables.
This module is used on Windows, but I'm not sure if any of the macros enabling those
qsort
calls (HAVE_FPATHCONF
,HAVE_PATHCONF
,HAVE_CONFSTR
, orHAVE_SYSCONF
) are enabled in the Windows build. My guess would be no...I don't know if overlapping qsorts on the same array will actually corrupt memory, but I doubt there's any guarantee it won't. (
qsort_r
doesn't help, that makes the comparison function thread-safe, not the array being sorted, according to the manpage.)So probably a separate issue, and a more-focused/reliable MRE could be made for it.
87 remaining items