Description
Bug report
(editor's note: the description below indicates that realpath()
is unaffected, but that's not correct - see comment)
If a file descriptor is created in /proc/self/fd/ and then passed to pathlib.Path, the resolve() method produces a result which is no longer a real and valid path to the file.
MRE, including os.path.realpath for comparison:
import os, pathlib
fd = os.memfd_create("myfd")
print(pathlib.Path(f"/proc/self/fd/{fd}").resolve())
# /memfd:myfd (deleted)
print(os.path.realpath("/proc/self/fd/4"))
# /proc/227671/fd/4
This can occur easily if another program uses the /proc/self/fd/ mechanism to pass temporary files to a Python application.
For example, using ZSH on Linux:
$ python -c 'import pathlib, sys; print(pathlib.Path(sys.argv[1]).resolve())' <(echo 'hi')
/proc/234888/fd/pipe:[331484707]
I looked at pathlib.py and in 3.11 it looks like it's primarily relying on realpath, so I'm not clear on where or how the discrepancy gets introduced. I'm sure there's something about pathlib I don't understand which explains it.
Your environment
- CPython versions tested on: 3.10, 3.11
- Operating system and architecture: Linux
Activity
eryksun commented on Nov 11, 2022
The proc filesystem on Linux supports symlinks to anonymous files. These symlinks can be traversed even though the target path that's returned by readlink() doesn't exist, such as "pipe:[331484707]" or, for a deleted file, "/path/to/filename (deleted)".
You can try a workaround based on passing strict=True. If resolving strictly fails but stat() succeeds, resolve the parent path strictly and join the unresolved symlink to the result. Of course, it isn't strictly a 'real' path, but that's not possible in this case.
sirosen commented on Nov 12, 2022
Thanks for answering, and for suggesting workarounds for this scenario!
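(editor's note: the strict=True workaround suggested above could be sketched roughly as follows. The helper name resolve_usable is hypothetical and not part of pathlib. The sketch assumes that a failing strict resolve combined with a succeeding stat() indicates one of these special symlinks.)

```python
import pathlib


def resolve_usable(path):
    """Resolve `path`, but keep a final symlink whose target cannot be resolved.

    Hypothetical helper for illustration; not part of pathlib.
    """
    p = pathlib.Path(path)
    try:
        return p.resolve(strict=True)
    except OSError:
        # Strict resolution failed. stat() follows the link itself, so it
        # still succeeds for /proc/<pid>/fd/<n> entries. For a path that is
        # really missing, stat() raises and the error propagates to the caller.
        p.stat()
        # Resolve only the parent and re-attach the unresolved final component.
        return p.parent.resolve(strict=True) / p.name
```

For an ordinary existing file this returns the same result as resolve(). For a /proc/self/fd/N entry pointing at a pipe or deleted file, it returns a path under the resolved /proc/<pid>/fd directory that can still be opened.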
For now, I've put in place a regex match for /proc/(self|\d+)/fd/\d+ since I'm not aware of other cases, but I'll also give thought to the strict=True usage. (My regex match doesn't, for example, handle a symlink to one of these files.)
This makes me see this not as a bug in pathlib, but it's simultaneously not what I expected as a user.
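(editor's note: a regex check of the kind mentioned above might look like the sketch below. The function names are hypothetical, and matching the whole string with fullmatch() is an assumption, since the comment does not say how the pattern is applied.)

```python
import pathlib
import re

# Pattern quoted in the comment above. Requiring it to match the whole
# string (fullmatch) is an assumption of this sketch.
PROC_FD_PATTERN = re.compile(r"/proc/(self|\d+)/fd/\d+")


def is_proc_fd_path(path):
    """Return True if `path` literally names a /proc/.../fd/N entry."""
    return PROC_FD_PATTERN.fullmatch(str(path)) is not None


def resolve_unless_proc_fd(path):
    """Skip resolve() for /proc fd paths, resolve everything else."""
    p = pathlib.Path(path)
    return p if is_proc_fd_path(p) else p.resolve()
```

As the comment notes, this only recognizes paths written in that form; a symlink that points at one of these entries is not detected.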
The basic form of usage is wanting to both resolve the path and also read a file. Something like
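(editor's note: the snippet that followed here in the original comment is missing from this copy. A minimal sketch of the pattern being described, using the names filename and path that the comment refers to and a hypothetical read_resolved wrapper, might be:)

```python
import pathlib


def read_resolved(filename):
    # Normalize the path first, then open the normalized path.
    path = pathlib.Path(filename).resolve()
    with open(path) as f:
        return f.read()
```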
That works in most scenarios.
It's pretty surprising that open(path) could fail in that example where open(filename) would have worked.
Is there any possible enhancement here to pathlib? I would expect most users of resolve() are treating it as a way of getting a fully normalized path to a file which can be opened and used. Any other usage is likely more niche. At the same time, I can't think of a backwards compatible way of introducing my desired behavior other than to add a new method or option.
barneygale commented on Feb 10, 2024
I think there's a typo in the repro code - the fd value should be included in the realpath() call.
With the corrected code I've found resolve() and realpath() to be consistent from 3.8 through 3.13.
If there's a bug here, it's in realpath() and not pathlib, so I'm removing the topic-pathlib label and re-titling the bug.
(title changed from "pathlib.Path.resolve produces unexpected results when run on a path created using the /proc/self/fd/ mechanism" to "os.path.realpath() produces unexpected results when run on a path created using the /proc/self/fd/ mechanism")
sirosen commented on Feb 15, 2024
Yep, that's a good catch! The MRE is wrong. I can't say offhand how that got mixed up on my end but please pay it no mind.
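(editor's note: a corrected version of the MRE, with the fd value interpolated into both calls, might look like the sketch below. It assumes Linux with /proc mounted and os.memfd_create available; the compare_resolution helper is hypothetical.)

```python
import os
import pathlib


def compare_resolution(fd):
    """Return (Path.resolve() result, os.path.realpath() result) for an fd."""
    link = f"/proc/self/fd/{fd}"
    return str(pathlib.Path(link).resolve()), os.path.realpath(link)


if __name__ == "__main__" and hasattr(os, "memfd_create"):
    fd = os.memfd_create("myfd")
    try:
        resolved, real = compare_resolution(fd)
        print(resolved)
        print(real)
    finally:
        os.close(fd)
```

With both calls given the same path, the two printed lines can be compared directly.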
Even having learned more from this thread, I still think this is a bit of a "gotcha" about realpath(). But it's not clear what kind of enhancement (behavior-wise or docs) would help.
In the simplest, most naive terms, what I'd like is a variant of realpath() which follows symlinks, but will stop short of resolving a usable path to an unusable one (like these /proc/ items); perhaps defined by stat working on the path. os.path.realpath2() or os.path.realpath(..., do_the_thing_from_issue_99390=True) seem like unacceptable solutions for that even if it were easy to implement.
I'm coming around to the thought -- given that I was mistaken about os.path.realpath() being different from pathlib.Path.resolve() -- that this should be closed with no action. I'll refrain from closing right away, but I don't want to take up people's time with a non-actionable enhancement request.