DBM Module Vacuuming #134004

Closed
@Andrea-Oliveri

Description

Feature or enhancement

Proposal:

Good afternoon,
The dbm module, and by extension shelve as well, doesn't provide any way to reclaim free space after many deletions from the database. This applies to all of the dbm submodules (dbm.dumb, dbm.sqlite3, dbm.ndbm, dbm.gnu).

This can lead to hundreds of GB of wasted space when using them to store complex objects, such as when using them as a persistent cache.

Most of the underlying libraries, however, support ways to reclaim space on demand:

  • VACUUM in sqlite3
  • gdbm_reorganize for gnu
  • None for ndbm
  • None for dumb, but this is simple to implement and I would be happy to contribute it: copy the used parts of the binary file in place and update the index. The advantage is that vacuuming needs no extra disk space; the drawback is that the DB will be corrupted if the program is interrupted mid-vacuum (note: this is already the case for many dbm.dumb operations).
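
The in-place idea for dbm.dumb can be illustrated with a toy sketch. This is not the code from the pull request: it works on an in-memory bytearray instead of a file, the function name `compact_in_place` and the `key -> (offset, size)` index layout are assumptions made for illustration, and it ignores the block alignment that dbm.dumb uses on disk.

```python
def compact_in_place(data: bytearray, index: dict) -> int:
    """Slide live records toward the start of `data` and update `index`.

    `index` maps key -> (offset, size). Returns the number of bytes freed.
    Hypothetical helper, shown only to illustrate the approach.
    """
    write_pos = 0
    # Visit records in storage order so each record only ever moves left,
    # onto space that is already free or already relocated.
    for key, (offset, size) in sorted(index.items(), key=lambda kv: kv[1][0]):
        if offset != write_pos:
            data[write_pos:write_pos + size] = data[offset:offset + size]
            index[key] = (write_pos, size)
        write_pos += size
    freed = len(data) - write_pos
    del data[write_pos:]  # stands in for truncating the data file
    return freed


# "XXXX" and "YY" stand for the space left behind by deleted records.
data = bytearray(b"aaaaXXXXbbbbbbYYcc")
index = {"a": (0, 4), "b": (8, 6), "c": (16, 2)}
freed = compact_in_place(data, index)
```

Because records are overwritten before the index is saved, an interruption partway through would leave the index pointing at stale offsets, which is the corruption risk mentioned above.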

Additionally, I would like to update the documentation to highlight the disadvantages of dbm.dumb. For now they appear only as comments in the source code, so developers reading the docs never see them:

  • Lack of support for any concurrency
  • Slowness linearly proportional to index size
  • Never reclaims the space of deleted items (this will hopefully be fixed by the PR, in which case it would not need to be documented; otherwise it should be listed too).
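
The last point can be checked with a short experiment. The sketch below assumes the `.dat` file suffix that dbm.dumb uses for its data file; the helper name, item count, and payload size are arbitrary choices for illustration.

```python
import dbm.dumb
import os
import tempfile


def dat_sizes(n_items: int = 100, payload: bytes = b"x" * 1024):
    """Return the .dat file size after inserting and after deleting everything."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "cache")
        # Fill the database.
        with dbm.dumb.open(path, "c") as db:
            for i in range(n_items):
                db[str(i)] = payload
        size_full = os.path.getsize(path + ".dat")
        # Delete every key again; only the index is updated.
        with dbm.dumb.open(path, "w") as db:
            for i in range(n_items):
                del db[str(i)]
        size_after_delete = os.path.getsize(path + ".dat")
    return size_full, size_after_delete


size_full, size_after_delete = dat_sizes()
print(size_full, size_after_delete)
```

Both numbers should come out equal: deleting keys removes them from the index, but the data file keeps its full size.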

Has this already been discussed elsewhere?

I have already discussed this feature proposal on Discourse

Links to previous discussion of this feature:

https://discuss.python.org/t/dbm-module-add-vacuuming/91507

Linked PRs

Activity

serhiy-storchaka (Member) commented on May 15, 2025

Why not call it reorganize(), for compatibility with gdbm?

Andrea-Oliveri (Contributor, Author) commented on May 15, 2025

Thank you @serhiy-storchaka, a perfectly valid question whose logic I can't refute. I have updated the pull request, renaming the methods accordingly 😅
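
For readers who want to try the renamed method, here is a minimal sketch. The thread does not say which Python release ships `reorganize()` on the non-gdbm backends, so the hypothetical helper `reorganize_if_supported` checks for the method at runtime instead of assuming it exists.

```python
import dbm.dumb
import os
import tempfile


def reorganize_if_supported(db) -> bool:
    """Call db.reorganize() when the backend offers it; report whether it ran."""
    reorganize = getattr(db, "reorganize", None)
    if callable(reorganize):
        reorganize()
        return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    with dbm.dumb.open(os.path.join(tmp, "cache"), "c") as db:
        db["key"] = b"value"
        del db["key"]
        # True on interpreters that include the new method, False otherwise.
        reclaimed = reorganize_if_supported(db)
```

Keeping the gdbm name means the same call works on a dbm.gnu object, which already exposes `reorganize()`.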

Andrea-Oliveri (Contributor, Author) commented on May 27, 2025

Good afternoon,
gentle ping for updates on this issue.
I think @erlend-aasland wanted to have a look at this too.
Thank you 😃.

added a commit that references this issue on Jun 1, 2025

gh-134004: Added the reorganize() methods to dbm.sqlite, dbm.dumb and…

f806463
linked a pull request that will close this issue: gh-134004: Dbm vacuuming #134028, on Jun 1, 2025

Labels: stdlib (Python modules in the Lib dir), type-feature (A feature request or enhancement)