Open
Description
Bug report
Bug description:
I'm seeing many emails with headers broken across multiple lines. When this happens to Message-Id, the parsed header value will include a leading space:
from email.parser import Parser
from email.policy import default as default_policy
from textwrap import dedent

# The Message-ID value is folded: its continuation line starts with a space.
eml = dedent("""\
From: "Foo Bar" <foo@bar.com>
To: Baz Baq <baz@baq.com>
Date: Thu, 22 May 2025 16:15:44 +0000
Message-ID:
 <BY5PR09MB5490557EF2B28EC8E707108FBF99A@BY5PR09MB5490.namprd09.prod.outlook.com>
Content-Type: text/plain

Hello
""")
msg = Parser(policy=default_policy).parsestr(eml)
print(f"|{msg['Message-ID']}|") # prints | <BY5PR09MB5490557EF2B28EC8E707108FBF99A@BY5PR09MB5490.namprd09.prod.outlook.com>|
CPython versions tested on:
3.11
Operating systems tested on:
Linux
Activity
Peopl3s commented on May 28, 2025
I think this is expected behavior according to the email message format standard (RFC 5322). When a header is split across multiple lines (a technique called "folding", used to keep lines within the recommended 78 characters), each continuation line starts with whitespace.
The email parser preserves all the original formatting, including folding whitespace, because it's designed to be a faithful representation of the original message. It's up to the application to clean up the values if needed.
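As a minimal sketch of that kind of application-side cleanup, the snippet below strips surrounding whitespace from the parsed Message-ID. The helper name `clean_message_id` and the sample addresses are hypothetical, and it assumes that stripping whitespace is all the cleanup this header needs.

```python
from email import message_from_string
from email.policy import default


def clean_message_id(msg):
    """Return the Message-ID without surrounding whitespace, or None if absent.

    Hypothetical helper: assumes .strip() is sufficient cleanup for this header.
    """
    value = msg["Message-ID"]
    return None if value is None else str(value).strip()


# A folded Message-ID header: the value sits on a continuation line
# that begins with a space.
raw = (
    "From: foo@example.com\n"
    "Message-ID:\n"
    " <abc123@example.com>\n"
    "\n"
    "Hello\n"
)

msg = message_from_string(raw, policy=default)
message_id = clean_message_id(msg)
print(f"|{message_id}|")
```

The helper gives the same result whether or not the parser already removed the leading space, so it should be harmless to leave in place on Python versions that do not show the problem.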
jakajancar commented on May 28, 2025
Could the MessageIDHeader have a property, similar to .addresses, that returns the cleaned-up value? (Whatever that is in this case; I'm not sure if it's just .strip() or more.)
StanFromIreland commented on May 28, 2025
Per RFC 5322 the current behavior is correct:
From my understanding of the above quote this should be closed.
medmunds commented on May 28, 2025
This is a duplicate of gh-124452, and has been fixed in Python 3.12.8 and 3.13.1. This issue can be closed as a duplicate.
That's not correct, at least not with a modern policy like email.policy.default. The whole point of Python's email package is that it understands RFC 5322 (and other complicated email specs) so the caller doesn't need to.
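As a small illustration of what a modern policy does for the caller, the sketch below parses a message with a folded Subject and reads the values back through the message API. The sample message is made up for illustration. It uses only `email.message_from_string`, `email.policy.default`, and the `.addresses` attribute of address headers.

```python
from email import message_from_string
from email.policy import default

# Made-up message whose Subject is folded across two lines.
raw = (
    "To: Baz Baq <baz@baq.com>\n"
    "Subject: Hello\n"
    " world\n"
    "\n"
    "Body\n"
)

msg = message_from_string(raw, policy=default)

# With email.policy.default, header values come back unfolded and parsed.
subject = str(msg["Subject"])
recipient = msg["To"].addresses[0]

print(subject)
print(recipient.display_name, recipient.addr_spec)
```

Here the caller never handles the fold itself: the Subject is returned as a single line, and the To header is exposed as structured address objects.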
It's true that the parser's internal representation does preserve the original formatting, so that re-serializing it with an email.generator (usually) results in the original message. But the exposed API on the parsed message (e.g., msg["Message-ID"] above) is intended to provide values that are fully unfolded, decoded, and otherwise cleaned up. (And in fact, when parsing with a modern policy, there is currently no public API to access the original, raw header values.)
jakajancar commented on May 29, 2025
Awesome, thank you! I believe we can close this then.