Bug report
Bug description:
test_sys fails while running the test suite of Python 3.15.0 when --with-pydebug and --enable-optimizations are both enabled.
The full test suite fails consistently, but when I run the test individually, it passes. I do not know if this is expected or not, as I understand that combining a debug build with optimizations is somewhat contradictory. I included the test header, as well as the test summary, for additional info.
== CPython 3.15.0a0 (heads/main-dirty:9983c7d4416, May 19 2025, 09:41:00) [GCC 13.3.0]
== Linux-6.11.0-25-generic-x86_64-with-glibc2.39 little-endian
== Python build: debug PGO
== cwd: /home/badger/oss/cpython/build/test_python_worker_177894æ
== CPU count: 8
== encodings: locale=UTF-8 FS=utf-8
== resources: all test resources are disabled, use -u option to unskip tests
Using random seed: 3038718328
...
0:32:13 load avg: 1.39 [396/491] test_sys
test test_sys failed -- Traceback (most recent call last):
File "/home/badger/oss/cpython/Lib/test/test_sys.py", line 1156, in test_getallocatedblocks
self.assertLess(a, sys.gettotalrefcount())
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 698068 not less than 663517
0:32:18 load avg: 1.36 [396/491/1] test_sys failed (1 failure)
...
== Tests result: FAILURE ==
22 tests skipped:
test.test_asyncio.test_windows_events
test.test_asyncio.test_windows_utils test.test_gdb.test_backtrace
test.test_gdb.test_cfunction test.test_gdb.test_cfunction_full
test.test_gdb.test_misc test.test_gdb.test_pretty_print
test_android test_apple test_dbm_gnu test_dbm_ndbm test_devpoll
test_free_threading test_kqueue test_launcher test_msvcrt
test_startfile test_winapi test_winconsoleio test_winreg test_wmi
test_zstd
11 tests skipped (resource denied):
test_curses test_peg_generator test_pyrepl test_smtpnet
test_socketserver test_tkinter test_ttk test_urllib2net
test_urllibnet test_winsound test_zipfile64
1 test failed:
test_sys
457 tests OK.
Total duration: 39 min 17 sec
Total tests: run=46,135 failures=1 skipped=2,164
Total test files: run=480/491 failed=1 skipped=22 resource_denied=11
Result: FAILURE
I don't suspect my local environment to be the problem, since the test suite passes when I remove the "--enable-optimizations" flag and build with only "--with-pydebug". The value of "a" seems to change each time I run the suite, so I included the random seed.
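For context, the failing assertion compares two interpreter-wide counters. Below is a minimal sketch of that comparison, assuming (from the test name and the traceback) that "a" holds the result of sys.getallocatedblocks(). The helper name compare_blocks_to_refcount is hypothetical. sys.gettotalrefcount() only exists in debug builds, so the sketch returns None elsewhere.

```python
import sys


def compare_blocks_to_refcount():
    """Return (allocated_blocks, total_refcount), or None on non-debug builds."""
    # sys.gettotalrefcount() is only present when CPython is built --with-pydebug.
    gettotalrefcount = getattr(sys, "gettotalrefcount", None)
    if gettotalrefcount is None:
        return None
    allocated = sys.getallocatedblocks()
    return allocated, gettotalrefcount()


result = compare_blocks_to_refcount()
if result is None:
    print("not a debug build: sys.gettotalrefcount() is unavailable")
else:
    allocated, total = result
    # The test expects allocated < total; the report shows this can be violated.
    print(f"allocated={allocated} total_refcount={total} ok={allocated < total}")
```

Both numbers depend on everything that ran earlier in the same process, which may be why the outcome differs between a sequential full run and an isolated run.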
Build setup:
Tested on Python 3.15.0.
Distributor ID: Ubuntu
Description: Ubuntu 24.04.2 LTS
Release: 24.04
Codename: noble
CPU: product: Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz
Configuration I used
./configure --with-pydebug --enable-optimizations
CPython versions tested on:
3.15, CPython main branch
Operating systems tested on:
Linux
Fearless-Badger commented on May 19, 2025
After investigating further, I have found that when running the tests with
./python -m test -j3
the test suite will pass. It is only when I run the test suite with
./python -m test
that "test_sys" will fail. Basically, "test_sys.test_getallocatedblocks()" only fails when run sequentially, and as part of the full suite.
Title changed from: Python 3.15.0: test_sys fails with both "--with-pydebug" and "--enable-optimizations"
Title changed to: Python 3.15.0: test_sys.test_getallocatedblocks() fails with both "--with-pydebug" and "--enable-optimizations"
tpburns commented on May 19, 2025
I had experienced the same error with a slightly different configuration, and have been experimenting to reproduce and maybe isolate it. I have now observed the bug with a configure using only the --with-pydebug flag, though similarly only in the full test suite. See below (I cleaned up the outputs a bit) and
This was on Ubuntu 24.04.2 LTS, Core™ i7-1195G7 × 8 processor, Python 3.15.0a0 (main),
vstinner commented on May 19, 2025
I can reproduce the issue with:
Title changed from: Python 3.15.0: test_sys.test_getallocatedblocks() fails with both "--with-pydebug" and "--enable-optimizations"
Title changed to: Python 3.15.0: test_sys.test_getallocatedblocks() fails when tests are run sequentially
vstinner commented on May 19, 2025
The test_sys.test_getallocatedblocks() failure is related to test_collections.test_odd_sizes():
vstinner commented on May 19, 2025
According to git bisect, it started to fail at commit 2498c22.
vstinner commented on May 19, 2025
The following change introduced the regression:
Title changed from: Python 3.15.0: test_sys.test_getallocatedblocks() fails when tests are run sequentially
Title changed to: Python 3.15.0: test_sys.test_getallocatedblocks() fails if run after test_collections.test_odd_sizes()
pythongh-134248 test_getallocatedblocks pre-check to ignore immortali…
tpburns commented on May 28, 2025
Another bisect showed that 6f1d448 was the first commit in which a large n in test_odd_sizes (now test_large_size after a recent refactor) causes the assert in test_getallocatedblocks to fail. Here's the story as far as I can tell:
The namedtuple's __new__ method is implemented by passing a lambda function to eval, with the input string including all of the names. During parsing each name is interned.
When immortalization was implemented in ea2c001, all interned strings were immortalized, but two references were kept for inclusion in the total reference count.
cpython/Objects/unicodeobject.c
Lines 14638 to 14644 in ea2c001
cpython/Objects/unicodeobject.c
Lines 15272 to 15277 in 6f1d448
namedtuple
Therefore, the large number n of names passed to namedtuple are immortalized, and getallocatedblocks will include those blocks, but the current implementation doesn't show any references for them in gettotalrefcount.
I'm arranging a PR which brings the check in test_getallocatedblocks more in line with its original intent by removing the size of the blocks used on immortalized strings. This same technique is already used by refleak.py (see #122420) so it seems appropriate here as well.
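The interning step in this explanation can be observed from Python. The following is a minimal sketch, not taken from the test suite: the field-name prefix is an arbitrary, hypothetical choice, and the sketch only shows the effect (an equal string ends up interned once the namedtuple exists), not which code path interned it or whether the string is immortal.

```python
import sys
from collections import namedtuple

# Build two equal but distinct string objects at runtime, so that neither is
# interned by the compiler. The prefix is arbitrary (hypothetical name).
first = "".join(["gh134248_demo_field_", str(0)])
second = "".join(["gh134248_demo_field_", str(0)])
assert first == second and first is not second

Demo = namedtuple("Demo", [first])

# sys.intern() returns the already-interned string equal to its argument,
# or interns the argument itself if no such string exists yet.
canonical = sys.intern(second)

# If creating Demo interned an equal string, "second" is not the canonical copy.
print(canonical is not second)
# On current CPython the canonical copy is expected to be the stored field name.
print(canonical is Demo._fields[0])
```

With a large number of field names, each one contributes an interned string in this way, which is the accumulation the analysis above describes.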
There are other ways this could be addressed, separately or simultaneously. I want to specifically link the discussion in #130384 to adjust how sys_getallocatedblocks_impl works. It's my understanding that immortalization is per interpreter unless ThreadState is explicitly shared, so such a change would open up the possibility of isolating either of the two tests in its own interpreter, giving a more thorough decoupling of these tests.
I'd very much welcome insight from anyone who worked on these tests or immortalization previously!
Edited to add a correction to (3): the linked code above accounts for all of the references prior to interning, but intern_common is where the two references associated with the dict are handled, both in the total and in the count on the string object:
cpython/Objects/unicodeobject.c
Lines 15518 to 15527 in 046670c
The call to immortalize_interned (where the loop is) happens afterwards, which is why its accounting changed in that commit, but the behavior difference was introduced by the two decrements in intern_common.