Document base85 and Ascii85 in the base64 module #134201

Closed
@macdjord

Description

In the base64 standard library module, the functions relating to base64, base32, and base16 encoding and decoding are well documented: the documentation links to RFC 4648, which lays out those formats in the usual exhaustive detail of a formal standard, and where our functions deviate from those standards or accept arguments that might cause them to deviate, these are clearly labelled and the deviations explained. The Z85 functions similarly link to a formal specification, which, while less exhaustive, is still perfectly clear.

For the base85 and Ascii85 functions, however, there is no such clarity. The documentation simply describes them as "de-facto standards", but makes no effort to explain them or to link to any source that does. Nor can the standard be considered common knowledge. The first sentence of the Wikipedia page starts with "Ascii85, also called Base85, is a form of binary-to-text encoding...", implying the two are one and the same, yet the module contains separate functions for them, so they must do different things. Which one implements the encoding described there? Indeed, searching for the two terms turns up various slightly different descriptions of the encoding, and there is currently no way to definitively identify which variant the b85 and a85 functions actually implement, short of reading the source code.
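A quick check illustrates that the two families are not interchangeable. This is only a sketch using `base64.a85encode` and `base64.b85encode` as they exist in the standard library; the sample inputs are arbitrary choices, not taken from any specification.

```python
import base64

sample = b"hello"  # arbitrary sample input

ascii85 = base64.a85encode(sample)
base85 = base64.b85encode(sample)

# Same input and same output length, but the two functions use
# different alphabets, so the encoded bytes differ.
print(ascii85, base85)

# Four null bytes: Ascii85 has a one-character shorthand, base85 does not.
nulls = b"\x00" * 4
ascii85_nulls = base64.a85encode(nulls)  # b'z'
base85_nulls = base64.b85encode(nulls)   # b'00000'
print(ascii85_nulls, base85_nulls)
```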

Ideally, the documentation would link to standards which fully specify these encodings. Unfortunately, the Ascii85 specification is buried in the middle of the Adobe PostScript language reference, while the closest thing to a spec for base85 I can find is RFC 1924, which is a) an April Fools' Day joke, and b) covers one very specific use case (encoding IPv6 addresses), whereas the implemented version is much more general.

Now, fully specifying these encodings, including the relevant math, would be beyond the scope of Python's documentation. However, the documentation can and should be expanded to at least inform the reader of certain basic details:

  • a85encode:
    • Encodes each 4 arbitrary bytes as 5 printable ASCII characters. However, as a special case, a sequence of 4 null bytes encodes to a single 'z', and, optionally, a sequence of 4 0x20 bytes (ASCII space) encodes to a single 'y'.
    • Encoded alphabet is ASCII 33 ('!') through ASCII 117 ('u'), plus 'z' and 'y' as mentioned above; the output may also contain '~' and/or '\n' depending on the wrapcol and adobe arguments.
    • If pad is true, the input is padded with null bytes to make its length a multiple of 4; decoding the resulting value will return the input with padding included. If pad is false, the input is still padded (the encoding algorithm only operates on 4-byte words), but the resulting value will be modified in order to indicate the amount of padding required, and decoding it will return the input exactly, with padding omitted. In neither case are any padding characters added to the output (as they would be in base32 or base64).
    • The minimum line length for wrapcol is 2 if adobe is true and 1 otherwise; any smaller value, other than 0, will be treated as that minimum.
    • The newlines added by wrapcol will never break up the "<~" and "~>" framing markers added by adobe.
  • a85decode:
    • If adobe is true, the input may begin with the leading "<~" framing marker, which is removed if present before the framed value is decoded, but it must end with the trailing "~>" framing marker, which is also removed; if the trailing marker is absent, ValueError is raised.
      • The check for the framing markers is done before removing whitespace as specified by ignorechars. Thus there must not be any leading whitespace before "<~" or trailing whitespace after "~>", nor can either marker contain whitespace.
  • b85encode:
    • Encodes each 4 arbitrary bytes as 5 printable ASCII characters.
    • Encoded alphabet is "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!#$%&()*+-;<=>?@^_`{|}~"
    • If pad is true, the input is padded with null bytes to make its length a multiple of 4; decoding the resulting value will return the input with padding included. If pad is false, the input is still padded (the encoding algorithm only operates on 4-byte words), but the resulting value will be modified in order to indicate the amount of padding required, and decoding it will return the input exactly, with padding omitted. In neither case are any padding characters added to the output (as they would be in base32 or base64).
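
The behaviours listed above can be checked with a short script. This is a sketch, not a specification: it uses only the existing `base64` functions and their documented keyword arguments, and the sample inputs (`b"ab"`, `b"hello"`, `bytes(range(40))`) and variable names are arbitrary choices made for illustration.

```python
import base64

# --- Padding (the same rules apply to a85encode and b85encode) ---
short = b"ab"  # 2 bytes, so 2 null bytes of padding are needed internally

unpadded = base64.a85encode(short)          # truncated final group
padded = base64.a85encode(short, pad=True)  # full 5-character group

roundtrip_unpadded = base64.a85decode(unpadded)  # the original input
roundtrip_padded = base64.a85decode(padded)      # input plus null padding

b85_unpadded = base64.b85decode(base64.b85encode(short))
b85_padded = base64.b85decode(base64.b85encode(short, pad=True))

# --- Special cases in Ascii85 ---
null_word = base64.a85encode(b"\x00" * 4)                 # b'z'
space_word = base64.a85encode(b" " * 4, foldspaces=True)  # b'y'

# --- Adobe framing ---
framed = base64.a85encode(b"hello", adobe=True)
unframed = base64.a85decode(framed, adobe=True)

# The leading "<~" is optional when decoding...
no_leading = base64.a85decode(framed[2:], adobe=True)

# ...but the trailing "~>" is required.
try:
    base64.a85decode(framed[:-2], adobe=True)
    missing_trailer_rejected = False
except ValueError:
    missing_trailer_rejected = True

# --- Line wrapping ---
payload = bytes(range(40))
wrapped = base64.a85encode(payload, wrapcol=10)
wrapped_lines = wrapped.split(b"\n")
# Newlines are in the default ignorechars, so decoding needs no extra step.
unwrapped = base64.a85decode(wrapped)
```

Note that the unpadded output is shorter than a full group rather than carrying explicit padding characters, which is how the decoder recovers the original length.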

Activity

biniona (Contributor) commented on May 19, 2025

At sprints - I can take a crack at this!

added a commit that references this issue on May 19, 2025

gh-134201: Expand explanation of Base85 encodings in base64 docs (#13…

66aaad6
added a commit that references this issue on May 19, 2025

pythongh-134201: Expand explanation of Base85 encodings in base64 docs (

added 2 commits that reference this issue on May 20, 2025

[3.14] gh-134201: Expand explanation of Base85 encodings in base64 do…

e20f05f

[3.13] gh-134201: Expand explanation of Base85 encodings in base64 do…

edbde92

hugovk (Member) commented on May 20, 2025

Thanks all! 🎉


Labels: docs (Documentation in the Doc dir)
Repository: python/cpython
