
Creating an ungodly number of subinterpreters in a short amount of time causes memory debug assertions. #123134

Closed · @bruxisma

Description

Bug report

Bug description:

Hello. While working on a small joke program, I found a possible memory corruption issue (it could also be a threading issue?) when using the Python C API in a debug-only build to rapidly create 463 subinterpreters, execute Python code in each, and then destroy them. Before I post the code sample and the debug output, note that I'm using a somewhat unusual build environment for a Windows developer:

  • clang++ 18.1.7 for x86_64-pc-windows-msvc
  • Visual Studio Build Tools 2022 17.11.0
  • CMake 3.30
  • Python 3.12.5
  • ninja 1.12.1

When running the code sample attached at the bottom of this post, I am unable to get the exact same output each time, though the traceback does fire in the same location (due to its size I've not attached it; it's about 10 MB of text for each thread). Additionally, I sometimes have to run the executable several times before the error occurs. Lastly, release builds do not exhibit any thread crashes or issues, as the debug assertions never fire or execute.

The error output also halts in some cases, either because of a stack failure or some other issue I was unable to determine, and which seemed to be outside the scope of the issue presented here. I have the entire error output from one execution where I was able to save it.

Debug memory block at address p=00000267272D6030: API 'p'
    16718206241729413120 bytes originally requested
    The 7 pad bytes at p-7 are not all FORBIDDENBYTE (0xfd):
        at p-7: 0xa0 *** OUCH
        at p-6: 0x00 *** OUCH
        at p-5: 0x00Hello, world! From Thread 31
Hello, world! From Thread 87
 *** OUCH
        at p-4: 0xfd
Hello, world! From Thread 43
Hello, world! From Thread 279
Generating thread state 314
        at p-3: 0xfd
        at p-2: 0xfd
Hello, world! From Thread 168
Generating thread state 315
        at p-1: 0xfd
    Because memory is corrupted at the start, the count of bytes requested
       may be bogus, and checking the trailing pad bytes may segfault.
Generating thread state 316
Generating thread state 317
    The 8 pad bytes at tail=E8030267272D6030 are

The output cut off after this, as the entire program crashed, taking my terminal with it 😅

You'll find the MRE code below. I've also added a minimal version of the CMakeLists.txt file I used so anyone can recreate the build with the code below (any warnings or additional settings I have do not affect whether the error occurs). Based on what debugging I was able to do with WinDbg, the code appears to break inside _PyObject_DebugDumpAddress.

Important

std::jthread calls .join() on destruction, so all threads auto-join once the std::vector goes out of scope.

Additionally, this code exhibits the same behavior regardless of whether state is a thread_local or is declared within the lambda passed to std::jthread.
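
For reference, a minimal sketch (not part of the MRE) of the auto-join behavior the note above relies on:

#include <thread>

int main() {
  {
    std::jthread t([] { /* work */ });
  } // ~jthread() calls request_stop() and then join(), so the thread is
    // guaranteed to have finished before this scope is left.
}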

main.cxx

#include <vector>
#include <thread>
#include <cstdlib>
#include <format> // for std::format
#include <print>

#include <Python.h>

namespace {

// One Python thread state per worker thread; reset after teardown.
static thread_local inline PyThreadState* state = nullptr;
static inline constexpr auto MAX_STATES = 463;
// Fully isolated subinterpreters: no shared obmalloc, and each one
// owns its own GIL.
static inline constexpr auto config = PyInterpreterConfig {
  .use_main_obmalloc = 0,
  .allow_fork = 0,
  .allow_exec = 0,
  .allow_threads = 0,
  .allow_daemon_threads = 0,
  .check_multi_interp_extensions = 1,
  .gil = PyInterpreterConfig_OWN_GIL,
};

} /* nameless namespace */

void execute () {
  std::vector<std::jthread> tasks { };
  tasks.reserve(MAX_STATES);
  for (auto count = 0zu; count < tasks.capacity(); count++) {
    std::println("Generating thread state {}", count);
    tasks.emplace_back([count] {
      // Create a fresh subinterpreter (and its thread state) on this thread.
      if (auto status = Py_NewInterpreterFromConfig(&state, &config); PyStatus_IsError(status)) {
        std::println("Failed to initialize thread state {}", count);
        return;
      }
      // Compile and run a one-line print() expression, then tear the
      // subinterpreter back down.
      auto text = std::format(R"(print("Hello, world! From Thread {}"))", count);
      auto globals = PyDict_New();
      auto code = Py_CompileString(text.data(), __FILE__, Py_eval_input);
      auto result = PyEval_EvalCode(code, globals, globals);
      Py_DecRef(result);
      Py_DecRef(code);
      Py_DecRef(globals);
      Py_EndInterpreter(state);
      state = nullptr;
    });
  }
  // ~jthread() joins, so execute() returns only after every thread finishes.
}

int main() {
  PyConfig config {};
  PyConfig_InitIsolatedConfig(&config);
  if (auto status = Py_InitializeFromConfig(&config); PyStatus_IsError(status)) {
    std::println("Failed to initialize with isolated config: {}", status.err_msg);
    return EXIT_FAILURE;
  }
  PyConfig_Clear(&config);
  execute();
  Py_Finalize();
}

CMakeLists.txt

cmake_minimum_required(VERSION 3.30)
project(463-interpreters LANGUAGES C CXX)

find_package(Python 3.12 REQUIRED COMPONENTS Development.Embed)

set(CMAKE_MSVC_RUNTIME_LIBRARY "MultiThreaded$<$<CONFIG:Debug>:Debug>")

add_executable(${PROJECT_NAME})
target_sources(${PROJECT_NAME} PRIVATE main.cxx)
target_compile_features(${PROJECT_NAME} PRIVATE cxx_std_23)
target_precompile_headers(${PROJECT_NAME} PRIVATE <Python.h>)
target_link_libraries(${PROJECT_NAME} PRIVATE Python::Python)

Command to build + run

$ cmake -Bbuild -S. -G "Ninja"
$ cmake --build build && .\build\463-interpreters.exe
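
Note: Ninja is a single-config generator, so the MSVC debug runtime (and the debug assertions this issue depends on) presumably only get selected when a debug configuration is requested explicitly, e.g.:

$ cmake -Bbuild -S. -G "Ninja" -DCMAKE_BUILD_TYPE=Debug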

CPython versions tested on:

3.12

Operating systems tested on:

Windows

Activity

picnixz (Member) commented on Aug 19, 2024

Can you reproduce this behaviour on Linux or macOS? Or what happens if you sleep a bit before spawning the sub-interpreters? I'm not an expert in threads, but I'd say the interpreters are spawning too fast for the OS to follow up =/ Can you check whether the same occurs with HEAD instead of 3.12.5? (Maybe there are some bugfixes that are not yet available.)

ZeroIntensity (Member) commented on Aug 19, 2024

At a glance, this could possibly be related to a memory allocation failure or something similar.

  auto globals = PyDict_New();
  auto code = Py_CompileString(text.data(), __FILE__, Py_eval_input);
  auto result = PyEval_EvalCode(code, globals, globals);

These lines need a NULL check in case an error occurred; otherwise there will be a NULL pointer dereference in the call to Py_DecRef.

bruxisma (Author) commented on Aug 19, 2024

Can you reproduce this behaviour on Linux or macOS?

I will work on getting the Linux/macOS behaviors checked when I can. My WSL2 environment is woefully out of date 😅. macOS might be more difficult, as I'm unaware of how the _d version of the Python.framework might be linked against.

Or what happens if you sleep a bit before spawning the sub-interpreters?

Sleeping might fix it, but it might also hide the issue. The goal here was just to run a bunch of interpreters, not to sleep for the sake of working around possible synchronization issues.

I'm not an expert in threads but I'd say that interpreters are spawning too fast for the OS to follow up =/

This is not how threads or operating systems work. There is a synchronization issue or an ABA problem in the debug memory assertions in the C API's internals.

Can you check whether the same occurs with HEAD instead of 3.12.5? (maybe there are some bugfixes that are not yet available)

I will also work on giving this a go.

ZeroIntensity (Member) commented on Aug 19, 2024

Py_DecRef is the function version of Py_XDECREF

Oh! TIL. I still would suggest adding a NULL check -- an error indicator could be set, which would break subsequent calls to PyErr_Occurred.
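
For illustration, a hedged sketch of the checking in question, using the MRE's names (the PyErr_Print-based handling is one option among several, not a prescription):

auto globals = PyDict_New();
if (globals == nullptr) {
  PyErr_Print();                 // report and clear the pending error
  Py_EndInterpreter(state);
  return;
}
auto code = Py_CompileString(text.data(), __FILE__, Py_eval_input);
if (code == nullptr) {
  PyErr_Print();
  Py_DecRef(globals);
  Py_EndInterpreter(state);
  return;
}
auto result = PyEval_EvalCode(code, globals, globals);
if (result == nullptr) {
  PyErr_Print();                 // don't leave a stale error indicator set
}
Py_DecRef(result);               // tolerates NULL, like Py_XDECREF
Py_DecRef(code);
Py_DecRef(globals);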

bruxisma (Author) commented on Aug 19, 2024

The error is outside of the Python C API as one might usually interact with it; i.e., it occurs only on subinterpreter initialization (creating a PyThreadState) or on subinterpreter destruction (Py_EndInterpreter), typically the latter. PyErr_Occurred is not called anywhere in the MRE, nor does it appear in the actual traceback for the failure.

ZeroIntensity (Member) commented on Aug 19, 2024

Right, but it could cause memory corruption somewhere, and cause the error somewhere else down the line. Regardless, I wasn't able to reproduce this on Linux, so this might be Windows-specific.

bruxisma (Author) commented on Aug 20, 2024

@picnixz I can confirm the behavior still exists on Windows on HEAD. The same errors can occur.

Right, but it could cause memory corruption somewhere, and cause the error somewhere else down the line. Regardless, I wasn't able to reproduce this on Linux, so this might be Windows-specific.

PyErr_Occurred only performs a read on the current thread state object. If an error is present when it shouldn't be, that still points to a memory corruption error occurring between the destruction and initialization of a PyThreadState.

TBBle commented on Aug 20, 2024

However, when executing this code under TSan, I got the following data race error that seems to indicate the same kind of memory corruption COULD be taking place.

A quick look suggests this TSan failure is a local data race in posixmodule.c: there appears to be no protection against multiple threads importing posixmodule simultaneously, and hence multiple threads simultaneously qsort-ing one of the three static tables in place.

This module is used on Windows, but I'm not sure if any of the macros enabling those qsort calls (HAVE_FPATHCONF, HAVE_PATHCONF, HAVE_CONFSTR, or HAVE_SYSCONF) are enabled in the Windows build. My guess would be no...

I don't know if overlapping qsorts on the same array will actually corrupt memory, but I doubt there's any guarantee it won't. (qsort_r doesn't help; it makes the comparison function thread-safe, not the array being sorted, according to the manpage.)

So this is probably a separate issue, and a more focused/reliable MRE could be made for it.
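
For illustration, a minimal sketch (hypothetical code, not CPython's) of the kind of race described above: two threads qsort-ing the same static table in place, which TSan reports as a data race:

#include <cstdlib>
#include <iterator>
#include <thread>

// Hypothetical stand-in for one of posixmodule.c's static tables.
static int table[] = {5, 3, 9, 1, 7};

static int cmp (const void* a, const void* b) {
  return *static_cast<const int*>(a) - *static_cast<const int*>(b);
}

int main() {
  // Two "imports" racing to sort the same array in place; qsort makes no
  // thread-safety guarantee for the array being sorted.
  std::thread t1([] { std::qsort(table, std::size(table), sizeof(int), cmp); });
  std::thread t2([] { std::qsort(table, std::size(table), sizeof(int), cmp); });
  t1.join();
  t2.join();
}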

87 remaining items


