Description
Feature or enhancement
Proposal:
Good afternoon,
The dbm module, and by extension shelve as well, don't provide any way to reclaim free space when lots of deletions from the database happen. This applies to all of the dbm submodules (dbm.dumb, dbm.sqlite, dbm.ndbm, dbm.gnu).
This can lead to hundreds of GB of wasted space when using them to store complex objects, such as when using them as a persistent cache.
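A minimal sketch of the problem with dbm.dumb, using only the standard library. The function name, item count and payload size are arbitrary choices for illustration; it writes entries, deletes all of them, and compares the size of the .dat file before and after the deletions.

```python
import dbm.dumb
import os
import tempfile


def dumb_sizes_after_delete(n_items=200, payload_size=4096):
    """Return (size_full, size_after_delete, remaining_keys) for a dbm.dumb DB.

    Hypothetical helper for illustration only; sizes are in bytes and refer
    to the ".dat" data file that dbm.dumb creates next to the index.
    """
    with tempfile.TemporaryDirectory() as tmp:
        base = os.path.join(tmp, "cache")
        with dbm.dumb.open(base, "c") as db:
            # Fill the database with fixed-size payloads.
            for i in range(n_items):
                db[f"key{i}"] = b"x" * payload_size
            db.sync()
            size_full = os.path.getsize(base + ".dat")

            # Delete every entry; only the index is updated.
            for key in list(db.keys()):
                del db[key]
            db.sync()
            remaining = len(db)
            size_after_delete = os.path.getsize(base + ".dat")
    return size_full, size_after_delete, remaining


if __name__ == "__main__":
    full, after, remaining = dumb_sizes_after_delete()
    print(f"keys left: {remaining}, .dat before: {full} B, after: {after} B")
```

The two sizes should come out equal even though no keys remain, which is the wasted space this proposal is about.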
Most of the underlying libraries, however, support ways to reclaim space on demand:
- VACUUM in sqlite3
- gdbm_reorganize for gnu
- None for ndbm
- None for dumb, but this is simple to implement and I would be happy to contribute: copy the used parts of the binary file in place and update the index. The advantage is that this won't use more disk space while vacuuming, but if the program is interrupted during the vacuum, the DB will be corrupted (note: this is already the case for many dbm.dumb operations).
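For illustration, the in-place idea for dbm.dumb could look roughly like the sketch below. This is not the code from the PR: compact_in_place and the key -> (offset, size) index layout are hypothetical, and any block alignment the real file format uses is ignored. Live records are visited in ascending offset order and slid toward the start of the file, so a write never lands on a record that has not been moved yet, and the tail is truncated at the end.

```python
import io


def compact_in_place(f, index):
    """Slide live records toward the start of ``f`` and truncate the tail.

    ``f`` is a seekable binary file opened for reading and writing.
    ``index`` maps key -> (offset, size) for every live record (assumed,
    simplified layout). Returns a new index with the updated offsets.
    """
    new_index = {}
    write_pos = 0
    # Ascending offsets guarantee write_pos <= pos for every record.
    for key, (pos, size) in sorted(index.items(), key=lambda kv: kv[1][0]):
        if pos != write_pos:
            f.seek(pos)
            data = f.read(size)  # read the whole record before overwriting
            f.seek(write_pos)
            f.write(data)
        new_index[key] = (write_pos, size)
        write_pos += size
    # Everything past the last live record is reclaimable.
    f.truncate(write_pos)
    return new_index


if __name__ == "__main__":
    # Three live records separated by dead space (the X and Y bytes).
    f = io.BytesIO(b"aaaaXXXXbbbbYYcc")
    index = {"a": (0, 4), "b": (8, 4), "c": (14, 2)}
    print(compact_in_place(f, index), f.getvalue())
```

As noted above, an interruption between moving a record and saving the updated index would leave the two out of sync, which is the corruption risk of doing this without a temporary copy.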
Additionally, I would like to update the documentation to highlight the disadvantages of dbm.dumb. For now they are only comments in the source code and are hidden from developers reading the doc:
- Lack of support for any concurrency
- Slowness linearly proportional to index size
- Never reclaims the space of deleted items (this will hopefully be fixed by the PR, in which case it won't be included).
Has this already been discussed elsewhere?
I have already discussed this feature proposal on Discourse
Links to previous discussion of this feature:
https://discuss.python.org/t/dbm-module-add-vacuuming/91507
Activity
serhiy-storchaka commented on May 15, 2025
Why not call it reorganize(), for compatibility with gdbm?
Andrea-Oliveri commented on May 15, 2025
Thank you @serhiy-storchaka, perfectly valid question whose logic I can't confute. I have updated the pull request changing the method names accordingly 😅
Andrea-Oliveri commented on May 27, 2025
Good afternoon,
gentle ping for updates on this issue.
I think @erlend-aasland wanted to have a look at this too.
Thank you 😃.
gh-134004: Added the reorganize() methods to dbm.sqlite, dbm.dumb and…