
MessageIDHeader does not trim whitespace #134812

Open
@jakajancar

Description

Bug report

Bug description:

I'm seeing many emails whose headers are folded across multiple lines. When this happens to Message-ID, the parsed header value includes a leading space:

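from email.parser import Parser
from email.policy import default as default_policy
from textwrap import dedent

eml = dedent("""\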
    From: "Foo Bar" <foo@bar.com>
    To: Baz Baq <baz@baq.com>
    Date: Thu, 22 May 2025 16:15:44 +0000
    Message-ID:
     <BY5PR09MB5490557EF2B28EC8E707108FBF99A@BY5PR09MB5490.namprd09.prod.outlook.com>
    Content-Type: text/plain

    Hello
""")
msg = Parser(policy=default_policy).parsestr(eml)
print(f"|{msg['Message-ID']}|") # prints | <BY5PR09MB5490557EF2B28EC8E707108FBF99A@BY5PR09MB5490.namprd09.prod.outlook.com>|

CPython versions tested on:

3.11

Operating systems tested on:

Linux

Activity

Peopl3s commented on May 28, 2025

I think this is expected behavior according to the email message format standard (RFC 5322). When a header is folded across multiple lines (done to keep lines under the recommended 78 characters), each continuation line starts with whitespace.
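To see folding from the generating side, here is a minimal sketch using the standard library's `email.policy.default`, whose `max_line_length` is 78. The long `Subject` value is made up for illustration; the point is that every continuation line begins with whitespace, which is what a parser later has to unfold.

```python
from email.policy import default

# An illustrative long header value (made up for this example).
subject = " ".join(["folding"] * 20)

# policy.fold() serializes the header as "Name: value" line(s),
# wrapped to policy.max_line_length (78 for email.policy.default).
folded = default.fold("Subject", subject)
lines = folded.splitlines()

for line in lines:
    # Continuation lines start with a space: that is the fold marker.
    print(repr(line))
```

Unfolding reverses this by deleting the line breaks, not the whitespace that follows them.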

The email parser preserves all the original formatting, including folding whitespace, because it's designed to be a faithful representation of the original message. It's up to the application to clean up the values if needed.

jakajancar commented on May 28, 2025

Author

Could the MessageIDHeader have a property, similar to .addresses, that returns the cleaned-up value? (Whatever that means in this case; I'm not sure whether it's just .strip() or more.)
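Until such a property exists, an application-side workaround is easy to sketch. The helper below, `message_id_stripped`, is hypothetical and not part of the `email` package; it assumes plain `str.strip()` is enough, which covers folding whitespace but would not remove a comment (CFWS) around the identifier.

```python
from email.parser import Parser
from email.policy import default as default_policy


def message_id_stripped(msg):
    """Return the Message-ID as a str with surrounding whitespace removed.

    Hypothetical helper for illustration; not part of the email package.
    Returns None when the message has no Message-ID header.
    """
    value = msg["Message-ID"]
    if value is None:
        return None
    # str() gives the header's decoded text; strip() drops any leading
    # folding whitespace left behind on affected Python versions.
    return str(value).strip()


raw = "Message-ID:\n <abc@example.com>\n\nHello\n"
msg = Parser(policy=default_policy).parsestr(raw)
print(f"|{message_id_stripped(msg)}|")
```

Because `strip()` is a no-op on an already clean value, the same call behaves identically on fixed and unfixed interpreters.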

StanFromIreland commented on May 28, 2025

Member

Per RFC 5322 the current behavior is correct:

> Unfolding is accomplished by simply removing any CRLF that is immediately followed by WSP.

From my understanding of the above quote, this should be closed.
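The quoted unfolding rule can be sketched directly with a regular expression. Note that it keeps the whitespace after the line break, so the unfolded Message-ID value still starts with a space, just as `Subject: Hello` has a space after the colon. Whether that space belongs to the value is decided by the field's own grammar: RFC 5322 section 3.6.4 defines `msg-id = [CFWS] "<" id-left "@" id-right ">" [CFWS]`, which suggests the leading whitespace is optional CFWS around the identifier rather than part of it. The `unfold` function below is illustrative only.

```python
import re

# RFC 5322 section 2.2.3: remove each CRLF that is immediately followed
# by WSP (space or horizontal tab). The WSP character itself is kept.
_FOLD = re.compile(r"\r\n(?=[ \t])")


def unfold(header_text):
    """Apply the RFC 5322 unfolding rule to raw header text (illustrative)."""
    return _FOLD.sub("", header_text)


raw = "Message-ID:\r\n <abc@example.com>"
line = unfold(raw)
name, _, value = line.partition(":")
print(repr(line))   # 'Message-ID: <abc@example.com>'
print(repr(value))  # ' <abc@example.com>'
```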

medmunds commented on May 28, 2025

Contributor

This is a duplicate of gh-124452, and has been fixed in Python 3.12.8 and 3.13.1. This issue can be closed as a duplicate.
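If code has to know whether the running interpreter includes that fix, a version check based on the releases named here is one option. The helper `has_gh124452_fix` is hypothetical, and it assumes that every release after 3.13.1 (including later minor versions) also carries the fix.

```python
import sys


def has_gh124452_fix(version_info=sys.version_info):
    """Guess whether this Python includes the gh-124452 header fix.

    Hypothetical helper based only on the versions named in this thread
    (3.12.8 and 3.13.1); later releases are assumed to include it.
    """
    version = tuple(version_info[:3])
    if version[:2] == (3, 12):
        return version >= (3, 12, 8)
    return version >= (3, 13, 1)


print(has_gh124452_fix())
```

In practice, stripping the value unconditionally is simpler, since it is harmless on fixed versions.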

> The email parser preserves all the original formatting, including folding whitespace, because it's designed to be a faithful representation of the original message. It's up to the application to clean up the values if needed.

That's not correct, at least not with a modern policy like email.policy.default. The whole point of Python's email package is that it understands RFC 5322 (and other complicated email specs) so the caller doesn't need to.

It's true that the parser's internal representation does preserve the original formatting, so that re-serializing it with an email.generator (usually) results in the original message. But the exposed API on the parsed message (e.g., msg["Message-ID"] above) is intended to provide values that are fully unfolded, decoded, and otherwise cleaned-up. (And in fact, when parsing with a modern policy, there is currently no public API to access the original, raw header values.)
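The difference between the legacy and modern policies can be seen with a folded `Subject` header, chosen here instead of Message-ID so the result does not depend on whether the gh-124452 fix is present. This is a small sketch using only the standard `email.message_from_string` and `email.policy` APIs.

```python
import email
from email import policy

raw = "Subject: Hello\n world\n\nbody\n"

legacy = email.message_from_string(raw, policy=policy.compat32)
modern = email.message_from_string(raw, policy=policy.default)

# compat32 returns the stored value with its folding line break intact.
print(repr(legacy["Subject"]))       # 'Hello\n world'
# policy.default returns a header object whose text is unfolded.
print(repr(str(modern["Subject"])))  # 'Hello world'
```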

jakajancar commented on May 29, 2025

Author

Awesome, thank you! I believe we can close this then.


Metadata

Assignees

No one assigned

    Labels

    stdlib (Python modules in the Lib dir), topic-email, type-bug (An unexpected behavior, bug, or error)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
