-
-
Notifications
You must be signed in to change notification settings - Fork 32.2k
gh-72680: Fix false positives when using zipfile.is_zipfile() #134250
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The zipfile.is_zipfile function would only search for the EndOfZipfile section header. This failed to correctly identify non-zipfiles that contained this header. Now the zipfile.is_zipfile function verifies the first central directory entry. Changes: * Extended zipfile.is_zipfile to verify zipfile catalog * Added tests to validate failure of binary non-zipfiles
🤖 New build scheduled with the buildbot fleet by 4bfec3e 🤖 Results will be shown at: https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F134250%2Fmerge If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again. |
Improve Zip file validation in :func:`zipfile.is_zipfile`. | ||
|
||
Before this change :func:`zipfile.is_zipfile` only checked the End Central Directory | ||
signature. If the signature could be found in the last 64k of the file, | ||
success! This produced false positives on any file with ``'PK\x05\x06'`` in the | ||
last 64k of the file - including PDFs and PNGs. | ||
|
||
This is now corrected by actually validating the Central Directory location | ||
and size based on the information provided by the End of Central Directory | ||
along with verifying the Central Directory signature of the first entry. | ||
|
||
This should be sufficient for the vast number of Zip files with fewer | ||
false positives. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very beefy NEWS, could it be condensed. We generally keep them quite brief, details go in issues or docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to move most of the NEWS entry into the commit message, and keep the NEWS entry short.
Thanks @gpshead for merging it 🌮🎉.. I'm working now to backport this PR to: 3.14. |
…ythonGH-134250) bpo-28494: Improve zipfile.is_zipfile reliability The zipfile.is_zipfile function would only search for the EndOfZipfile section header. This failed to correctly identify non-zipfiles that contained this header. Now the zipfile.is_zipfile function verifies the first central directory entry. Changes: * Extended zipfile.is_zipfile to verify zipfile catalog * Added tests to validate failure of binary non-zipfiles * Reuse 'concat' handling for is_zipfile (cherry picked from commit 1298511) Co-authored-by: Tim Hatch <timhatch@netflix.com> Co-authored-by: John Jolly <john.jolly@gmail.com>
#134401) GH-134250) bpo-28494: Improve zipfile.is_zipfile reliability The zipfile.is_zipfile function would only search for the EndOfZipfile section header. This failed to correctly identify non-zipfiles that contained this header. Now the zipfile.is_zipfile function verifies the first central directory entry. Changes: * Extended zipfile.is_zipfile to verify zipfile catalog * Added tests to validate failure of binary non-zipfiles * Reuse 'concat' handling for is_zipfile (cherry picked from commit 1298511) Co-authored-by: Tim Hatch <timhatch@netflix.com> Co-authored-by: John Jolly <john.jolly@gmail.com>
…ythonGH-134250) bpo-28494: Improve zipfile.is_zipfile reliability The zipfile.is_zipfile function would only search for the EndOfZipfile section header. This failed to correctly identify non-zipfiles that contained this header. Now the zipfile.is_zipfile function verifies the first central directory entry. Changes: * Extended zipfile.is_zipfile to verify zipfile catalog * Added tests to validate failure of binary non-zipfiles * Reuse 'concat' handling for is_zipfile Co-authored-by: John Jolly <john.jolly@gmail.com>
Rebased @jjolly
Fix zipfile validation issue by ... providing more validation!
Originally, zipfile.is_zipfile() only checked the End Central Directory
signature. If the signature could be found in the last 64k of the file,
success! This produced false positives on any file with 'PK\x05\x06' in the
last 64k of the file - including PDFs and PNGs.
This is now corrected by actually validating the Central Directory location
and size based on the information provided by the End Central Directory
along with verifying the Central Directory signature of the first entry.
This should be sufficient for the vast number of zipfiles, but more could be
done to absolutely validate the zipfile content - such as validating all
local file headers and Central Directory entries.