Skip to content

gh-72680: Fix false positives when using zipfile.is_zipfile() #134250

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 21, 2025

Conversation

thatch
Copy link
Contributor

@thatch thatch commented May 19, 2025

Rebased @jjolly

Fix zipfile validation issue by ... providing more validation!

Originally, zipfile.is_zipfile() only checked the End Central Directory
signature. If the signature could be found in the last 64k of the file,
success! This produced false positives on any file with 'PK\x05\x06' in the
last 64k of the file - including PDFs and PNGs.

This is now corrected by actually validating the Central Directory location
and size based on the information provided by the End Central Directory
along with verifying the Central Directory signature of the first entry.

This should be sufficient for the vast number of zipfiles, but more could be
done to absolutely validate the zipfile content - such as validating all
local file headers and Central Directory entries.

jjolly and others added 2 commits May 19, 2025 10:27
The zipfile.is_zipfile function would only search for the EndOfZipfile
section header. This failed to correctly identify non-zipfiles that
contained this header. Now the zipfile.is_zipfile function verifies
the first central directory entry.

Changes:
* Extended zipfile.is_zipfile to verify zipfile catalog
* Added tests to validate failure of binary non-zipfiles
@thatch thatch force-pushed the is_zipfile_verify branch from 3e51cc4 to c981205 Compare May 19, 2025 17:27
@gpshead gpshead requested a review from jaraco May 19, 2025 18:38
@gpshead gpshead self-assigned this May 19, 2025
@gpshead gpshead added the sprint label May 19, 2025
@gpshead gpshead added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label May 19, 2025
gpshead
@github-project-automation github-project-automation bot moved this from Todo to In Progress in Sprint 2024 May 19, 2025
@gpshead gpshead added the needs backport to 3.14 bugs and security fixes label May 19, 2025
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by 4bfec3e 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F134250%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label May 19, 2025
StanFromIreland
Comment on lines 1 to 13
Improve Zip file validation in :func:`zipfile.is_zipfile`.

Before this change :func:`zipfile.is_zipfile` only checked the End Central Directory
signature. If the signature could be found in the last 64k of the file,
success! This produced false positives on any file with ``'PK\x05\x06'`` in the
last 64k of the file - including PDFs and PNGs.

This is now corrected by actually validating the Central Directory location
and size based on the information provided by the End of Central Directory
along with verifying the Central Directory signature of the first entry.

This should be sufficient for the vast number of Zip files with fewer
false positives.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very beefy NEWS, could it be condensed. We generally keep them quite brief, details go in issues or docs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to move most of the NEWS entry into the commit message, and keep the NEWS entry short.

@gpshead gpshead removed this from Sprint 2024 May 21, 2025
@gpshead gpshead merged commit 1298511 into python:main May 21, 2025
39 checks passed
@miss-islington-app
Copy link

Thanks @gpshead for merging it 🌮🎉.. I'm working now to backport this PR to: 3.14.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request May 21, 2025
…ythonGH-134250)

bpo-28494: Improve zipfile.is_zipfile reliability

The zipfile.is_zipfile function would only search for the EndOfZipfile
section header. This failed to correctly identify non-zipfiles that
contained this header. Now the zipfile.is_zipfile function verifies
the first central directory entry.

Changes:
* Extended zipfile.is_zipfile to verify zipfile catalog
* Added tests to validate failure of binary non-zipfiles
* Reuse 'concat' handling for is_zipfile
(cherry picked from commit 1298511)

Co-authored-by: Tim Hatch <timhatch@netflix.com>
Co-authored-by: John Jolly <john.jolly@gmail.com>
@bedevere-app
Copy link

bedevere-app bot commented May 21, 2025

3.14 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.14 bugs and security fixes label May 21, 2025
gpshead pushed a commit that referenced this pull request May 21, 2025
#134401)

GH-134250)

bpo-28494: Improve zipfile.is_zipfile reliability

The zipfile.is_zipfile function would only search for the EndOfZipfile
section header. This failed to correctly identify non-zipfiles that
contained this header. Now the zipfile.is_zipfile function verifies
the first central directory entry.

Changes:
* Extended zipfile.is_zipfile to verify zipfile catalog
* Added tests to validate failure of binary non-zipfiles
* Reuse 'concat' handling for is_zipfile
(cherry picked from commit 1298511)

Co-authored-by: Tim Hatch <timhatch@netflix.com>
Co-authored-by: John Jolly <john.jolly@gmail.com>
lkollar pushed a commit to lkollar/cpython that referenced this pull request May 26, 2025
…ythonGH-134250)

bpo-28494: Improve zipfile.is_zipfile reliability

The zipfile.is_zipfile function would only search for the EndOfZipfile
section header. This failed to correctly identify non-zipfiles that
contained this header. Now the zipfile.is_zipfile function verifies
the first central directory entry.

Changes:
* Extended zipfile.is_zipfile to verify zipfile catalog
* Added tests to validate failure of binary non-zipfiles
* Reuse 'concat' handling for is_zipfile

Co-authored-by: John Jolly <john.jolly@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants

Follow Lee on X/Twitter - Father, Husband, Serial builder creating AI, crypto, games & web tools. We are friends :) AI Will Come To Life!

Check out: eBank.nz (Art Generator) | Netwrck.com (AI Tools) | Text-Generator.io (AI API) | BitBank.nz (Crypto AI) | ReadingTime (Kids Reading) | RewordGame | BigMultiplayerChess | WebFiddle | How.nz | Helix AI Assistant