Closed
Bug report
Bug description:
Python 3.13.3 (main, Apr 9 2025, 07:44:25) [GCC 14.2.1 20250207]
import difflib
s1='TZiSLutKO5xRiAkkw1ZGkpZsq4'
s2='hTkmKeyY0WYoEqn7xD6jDRwRU4quqozyh8WFwkYY82h9wVv93iUzijw4Q8JYh4l496RD20dsTmy1T0Tl5D1sLRetaW2PP75f9fLeSCllRmISdDFLb3QazkubtOAjZ95a5Ril7NdVIX8hJWlJgwmhd7FGlO5aQQQbLeQcSEFqmiDZOnBWoAisj9YeKHiihm2QzAsdZAN78CO8tXEHfKjCOoZWQo513tEJ4b26BRItK'
m = difflib.SequenceMatcher(None, s1, s2).find_longest_match(ahi=len(s1),bhi=len(s2))
assert s1.find('tK') == 6
assert len(s2) == 233
assert s2.find('tK') == 231
assert m == difflib.Match(3,100,1)
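A quick way to see what the matcher is doing with this input is to rerun the repro with and without autojunk and inspect the matcher's bpopular attribute, which holds the characters of the second sequence that the autojunk heuristic treats as noise. The helper popular_chars is hypothetical and not part of difflib. It is a sketch that assumes CPython applies the rule "len(b) >= 200 and an item occurs more than len(b) // 100 + 1 times". For this 233-character s2, that threshold is 3 occurrences.

```python
import difflib
from collections import Counter

s1 = 'TZiSLutKO5xRiAkkw1ZGkpZsq4'
s2 = 'hTkmKeyY0WYoEqn7xD6jDRwRU4quqozyh8WFwkYY82h9wVv93iUzijw4Q8JYh4l496RD20dsTmy1T0Tl5D1sLRetaW2PP75f9fLeSCllRmISdDFLb3QazkubtOAjZ95a5Ril7NdVIX8hJWlJgwmhd7FGlO5aQQQbLeQcSEFqmiDZOnBWoAisj9YeKHiihm2QzAsdZAN78CO8tXEHfKjCOoZWQo513tEJ4b26BRItK'


def popular_chars(b, autojunk=True):
    # Hypothetical helper: assumes difflib marks an item "popular" when
    # len(b) >= 200 and the item occurs more than len(b) // 100 + 1 times.
    n = len(b)
    if not autojunk or n < 200:
        return set()
    threshold = n // 100 + 1  # for len(s2) == 233 this is 3
    return {ch for ch, count in Counter(b).items() if count > threshold}


# Default matcher: autojunk=True, so popular characters of s2 are left out
# of the index that find_longest_match searches.
default = difflib.SequenceMatcher(None, s1, s2)
m_default = default.find_longest_match(0, len(s1), 0, len(s2))

# Heuristic disabled: every character of s2 is indexed.
exact = difflib.SequenceMatcher(None, s1, s2, autojunk=False)
m_exact = exact.find_longest_match(0, len(s1), 0, len(s2))

print(m_default)                 # the issue reports Match(a=3, b=100, size=1)
print(m_exact)                   # the thread reports Match(a=6, b=231, size=2)
print(sorted(default.bpopular))  # characters of s2 dropped by the heuristic
print(popular_chars(s2) == default.bpopular)
```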
CPython versions tested on:
3.13
Operating systems tested on:
Linux
picnixz commented on May 16, 2025
What's your expected result and what's the result that is found?
unisgn commented on May 16, 2025
Since 'tK' occurs in both s1 and s2, I would at least expect:
assert m.size >= len('tK')
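One heuristic-free way to check this expectation is a textbook dynamic-programming search for the longest common substring. The function longest_common_substring is a hypothetical helper written for illustration, not part of difflib. It compares every pair of positions, so it takes O(len(s1) * len(s2)) time but cannot skip any character. Ties are broken the same way find_longest_match documents: earliest start in the first sequence, then earliest start in the second.

```python
s1 = 'TZiSLutKO5xRiAkkw1ZGkpZsq4'
s2 = 'hTkmKeyY0WYoEqn7xD6jDRwRU4quqozyh8WFwkYY82h9wVv93iUzijw4Q8JYh4l496RD20dsTmy1T0Tl5D1sLRetaW2PP75f9fLeSCllRmISdDFLb3QazkubtOAjZ95a5Ril7NdVIX8hJWlJgwmhd7FGlO5aQQQbLeQcSEFqmiDZOnBWoAisj9YeKHiihm2QzAsdZAN78CO8tXEHfKjCOoZWQo513tEJ4b26BRItK'


def longest_common_substring(a, b):
    """Return (i, j, size) of a longest common substring of a and b.

    Hypothetical helper: a plain O(len(a) * len(b)) dynamic program with
    no junk or popularity heuristics.
    """
    best_i = best_j = best_size = 0
    prev = [0] * (len(b) + 1)  # prev[j]: run length ending at a[i-2] and b[j-1]
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)  # cur[j]: run length ending at a[i-1] and b[j-1]
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                # Strict ">" keeps the first maximal run found, i.e. the one
                # starting earliest in a, then earliest in b.
                if cur[j] > best_size:
                    best_size = cur[j]
                    best_i, best_j = i - best_size, j - best_size
        prev = cur
    return best_i, best_j, best_size


i, j, size = longest_common_substring(s1, s2)
print((i, j, size), repr(s1[i:i + size]))
```

For these inputs it finds the run 'tK' at s1[6] and s2[231], which is what the assertion above asks for. The cost is that it always inspects every pair of positions, the kind of work the autojunk heuristic trades away for speed.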
ZeroIntensity commented on May 16, 2025
Oh how I love fuzzer-generated repros
tim-one commented on May 16, 2025
Pass autojunk=False to the SequenceMatcher constructor. Then you'll get Match(a=6, b=231, size=2) as the result.
As is, some characters show up so often in s2 that, by default (autojunk=True), the matcher considers them to be noise characters, and saves enormous amounts of time by not looking for matches containing them. 't' is in that set. Turn off autojunk to make it look at every possibility. Less surprising that way, but may take a lot longer.
unisgn commented on May 16, 2025