os.path.realpath() produces unexpected results when run on a path created using the /proc/self/fd/ mechanism #99390

Not planned
@sirosen

Description

Bug report

(Editor's note: the description below suggests that realpath() is unaffected, but that is not correct. See barneygale's comment of Feb 10, 2024 below.)

If a path under /proc/self/fd/ that refers to an anonymous file descriptor is passed to pathlib.Path, the resolve() method produces a result which is no longer a real and valid path to the file.
MRE, including os.path.realpath for comparison:

import os, pathlib
fd = os.memfd_create("myfd")
print(pathlib.Path(f"/proc/self/fd/{fd}").resolve())
# /memfd:myfd (deleted)
print(os.path.realpath("/proc/self/fd/4"))
# /proc/227671/fd/4

This can occur easily if another program uses the /proc/self/fd/ mechanism to pass temporary files to a python application.
For example, using Zsh on Linux:

$ python -c 'import pathlib, sys; print(pathlib.Path(sys.argv[1]).resolve())' <(echo 'hi')
/proc/234888/fd/pipe:[331484707]

I looked at pathlib.py, and in 3.11 it looks like it primarily relies on realpath, so I'm not clear on where or how the discrepancy gets introduced. I'm sure there's something about pathlib that I don't understand which explains it.

Your environment

  • CPython versions tested on: 3.10, 3.11
  • Operating system and architecture: Linux

Activity

eryksun commented on Nov 11, 2022

Contributor

The proc filesystem on Linux supports symlinks to anonymous files. These symlinks can be traversed even though the target path that's returned by readlink() doesn't exist, such as "pipe:[331484707]" or, for a deleted file, "/path/to/filename (deleted)".
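To make that behaviour concrete, here is a small Linux-only sketch. The helper name describe_fd_link is made up for illustration; it only uses os.readlink() and os.path.exists(), and it compares the link itself with the path you get by taking the readlink() text literally, which is roughly what realpath() does.

```python
import os


def describe_fd_link(fd):
    """Return (link, target_text, link_is_usable, target_exists) for an open fd.

    Hypothetical helper for illustration; requires a Linux-style /proc.
    """
    link = f"/proc/self/fd/{fd}"
    # The text stored in the symlink, e.g. "pipe:[...]" for a pipe.
    target_text = os.readlink(link)
    # exists() calls stat(), which traverses the proc symlink successfully.
    link_is_usable = os.path.exists(link)
    # Taking the text literally means joining it onto the link's directory.
    candidate = os.path.join(os.path.dirname(link), target_text)
    target_exists = os.path.exists(candidate)
    return link, target_text, link_is_usable, target_exists


if os.path.isdir("/proc/self/fd"):  # skip quietly on systems without /proc
    read_end, write_end = os.pipe()
    try:
        print(describe_fd_link(read_end))
    finally:
        os.close(read_end)
        os.close(write_end)
```

For a pipe, the link is usable while the path built from the readlink() text is not, which is the mismatch reported in this issue.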

You can try a workaround based on passing strict=True. If resolving strictly fails but stat() succeeds, resolve the parent path strictly and join the unresolved symlink to the result. Of course, it isn't strictly a 'real' path, but that's not possible in this case.
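A minimal sketch of that workaround follows. The function name resolve_usable is hypothetical, and the fallback only guards the final path component, so treat it as a starting point rather than a complete solution.

```python
import os
import pathlib


def resolve_usable(path):
    """Resolve strictly; fall back to 'strict parent + unresolved name'.

    Hypothetical helper sketching the strict=True workaround.
    """
    p = pathlib.Path(path)
    try:
        return p.resolve(strict=True)
    except OSError:
        # Strict resolution failed. If stat() also fails, the path really is
        # unusable, so let that error propagate to the caller.
        p.stat()
        # stat() worked, so the last component is a traversable link whose
        # readlink() text is not a real path. Keep it unresolved.
        return p.parent.resolve(strict=True) / p.name


if os.path.isdir("/proc/self/fd"):  # Linux-only demonstration
    read_end, write_end = os.pipe()
    try:
        # Prints something of the form /proc/<pid>/fd/<n>
        print(resolve_usable(f"/proc/self/fd/{read_end}"))
    finally:
        os.close(read_end)
        os.close(write_end)
```

The result can still be passed to open() or stat(), unlike the output of a non-strict resolve() on the same path.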

sirosen commented on Nov 12, 2022

Contributor · Author

Thanks for answering, and for suggesting workarounds for this scenario!
For now, I've put in place a regex match for /proc/(self|\d+)/fd/\d+ since I'm not aware of other cases, but I'll also give thought to the strict=True usage. (My regex match doesn't, for example, handle a symlink to one of these files.)

This makes me see this not as a bug in pathlib, but it's simultaneously not what I expected as a user.
The basic form of usage is wanting to both resolve the path and also read a file. Something like

path = pathlib.Path(filename).resolve()
print(f"contents of {path}:")
with open(path) as fp:
    print(fp.read())

That works in most scenarios.
It's pretty surprising that open(path) could fail in that example where open(filename) would have worked.

Is there any possible enhancement here to pathlib? I would expect most users of resolve() are treating it as a way of getting a fully normalized path to a file which can be opened and used. Any other usage is likely more niche. At the same time, I can't think of a backwards compatible way of introducing my desired behavior other than to add a new method or option.

barneygale commented on Feb 10, 2024

Contributor

I think there's a typo in the repro code - the fd value should be included in the realpath() call:

import os, pathlib
fd = os.memfd_create("myfd")
print(pathlib.Path(f"/proc/self/fd/{fd}").resolve())
# /memfd:myfd (deleted)
print(os.path.realpath(f"/proc/self/fd/{fd}"))
# /memfd:myfd (deleted)

With the corrected code I've found resolve() and realpath() to be consistent from 3.8 through 3.13.

If there's a bug here, it's in realpath() and not pathlib, so I'm removing the topic-pathlib label and re-titling the bug.

barneygale changed the title from "pathlib.Path.resolve produces unexpected results when run on a path created using the /proc/self/fd/ mechanism" to "os.path.realpath() produces unexpected results when run on a path created using the /proc/self/fd/ mechanism" on Feb 10, 2024

sirosen commented on Feb 15, 2024

Contributor · Author

Yep, that's a good catch! The MRE is wrong. I can't say offhand how that got mixed up on my end but please pay it no mind.

Even having learned more from this thread, I still think this is a bit of a "gotcha" about realpath(). But it's not clear what kind of enhancement (behavior-wise or docs) would help.

In the simplest, most naive terms, what I'd like is a variant of realpath() which follows symlinks, but will stop short of resolving a usable path to an unusable one (like these /proc/ items); perhaps defined by stat working on the path. os.path.realpath2() or os.path.realpath(..., do_the_thing_from_issue_99390=True) seem like unacceptable solutions for that even if it were easy to implement.
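Purely as an illustration of the idea, and not a proposal for the stdlib, a hop-by-hop version might look like the sketch below. The name realpath_while_usable and the max_hops parameter are invented here. It only guards the final component (parent directories go through the ordinary realpath()), and it normalizes the input lexically with abspath(), which real realpath() does not do.

```python
import errno
import os


def realpath_while_usable(path, max_hops=40):
    """Follow symlinks on the last component, stopping before an unusable hop.

    Hypothetical sketch: a hop is followed only if stat() on the literal
    target succeeds and refers to the same file as the link itself.
    """
    current = os.path.abspath(path)  # simplification: lexical normalization
    for _ in range(max_hops):
        parent, name = os.path.split(current)
        if name in ("", ".", ".."):
            # Nothing sensible to guard here; defer to the ordinary realpath().
            return os.path.realpath(current)
        parent_real = os.path.realpath(parent)
        current = os.path.join(parent_real, name)
        if not os.path.islink(current):
            return current
        # join() discards parent_real when the link text is absolute.
        target = os.path.join(parent_real, os.readlink(current))
        try:
            same = os.path.samestat(os.stat(current), os.stat(target))
        except OSError:
            same = False
        if not same:
            # The next hop would turn a usable path into an unusable one
            # (or the link is dangling), so stop at the link itself.
            return current
        current = target
    raise OSError(errno.ELOOP, os.strerror(errno.ELOOP), os.fspath(path))
```

For ordinary symlink chains this should agree with os.path.realpath(); for the /proc/self/fd/ cases in this issue it returns the /proc/<pid>/fd/<n> path instead of the readlink() text.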

I'm coming around to the thought -- given that I was mistaken about os.path.realpath() being different from pathlib.Path.resolve() -- that this should be closed with no action. I'll refrain from closing right away, but I don't want to take up people's time with a non-actionable enhancement request.

Metadata

Assignees

No one assigned

    Labels

    stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
