Directory listing in SimpleHTTPRequestHandler does not work well in non-UTF-8 locale #133889

Closed
@serhiy-storchaka

Description

For a directory, SimpleHTTPRequestHandler generates an index.html page containing a list of files. It uses the filesystem encoding for the page, which is reasonable because file names are encoded with that encoding. The problem is that the directory path, which is included in the title, can contain the query part of the URL, which may not be encodable with the filesystem encoding.
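The failure can be reproduced outside the server with a minimal sketch. It assumes the title is built roughly as described above (unquote the request path, escape it, then encode the page); the variable names and the `try_encode` helper are illustrative and are not the handler's actual code. The encoding `koi8-u` stands in for the filesystem encoding of the `uk_UA` locale.

```python
import html
import urllib.parse

# The request used by the failing test: %bb is not valid UTF-8 on its own.
request_path = '/?x=%bb'

# urllib.parse.unquote() decodes as UTF-8 with errors='replace' by default,
# so the undecodable byte becomes U+FFFD (REPLACEMENT CHARACTER).
displaypath = html.escape(urllib.parse.unquote(request_path), quote=False)
title = f'Directory listing for {displaypath}'


def try_encode(text, enc):
    """Illustrative helper: return the encoded bytes or the exception raised."""
    try:
        return text.encode(enc, 'surrogateescape')
    except UnicodeEncodeError as exc:
        return exc


# KOI8-U has no mapping for U+FFFD, and 'surrogateescape' only handles
# lone surrogates (U+DC80..U+DCFF), so this encoding attempt fails.
koi8_result = try_encode(title, 'koi8-u')

# UTF-8 can represent U+FFFD, so the same title encodes without error.
utf8_result = try_encode(title, 'utf-8')

print(type(koi8_result).__name__, type(utf8_result).__name__)
```

The `surrogateescape` error handler does not help here because U+FFFD is an ordinary character rather than an escaped byte.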

This causes a test failure when running in a non-UTF-8 locale:

$ LC_ALL=uk_UA ./python -m test -vuall test_httpservers -m test_undecodable_parameter
...
test_undecodable_parameter (test.test_httpservers.SimpleHTTPServerTestCase.test_undecodable_parameter) ... ----------------------------------------
Exception occurred during processing of request from ('127.0.0.1', 48062)
Traceback (most recent call last):
  File "/home/serhiy/py/cpython/Lib/socketserver.py", line 318, in _handle_request_noblock
    self.process_request(request, client_address)
    ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/serhiy/py/cpython/Lib/socketserver.py", line 349, in process_request
    self.finish_request(request, client_address)
    ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/serhiy/py/cpython/Lib/socketserver.py", line 362, in finish_request
    self.RequestHandlerClass(request, client_address, self)
    ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/serhiy/py/cpython/Lib/http/server.py", line 721, in __init__
    super().__init__(*args, **kwargs)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/home/serhiy/py/cpython/Lib/socketserver.py", line 766, in __init__
    self.handle()
    ~~~~~~~~~~~^^
  File "/home/serhiy/py/cpython/Lib/http/server.py", line 485, in handle
    self.handle_one_request()
    ~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/home/serhiy/py/cpython/Lib/http/server.py", line 473, in handle_one_request
    method()
    ~~~~~~^^
  File "/home/serhiy/py/cpython/Lib/http/server.py", line 725, in do_GET
    f = self.send_head()
  File "/home/serhiy/py/cpython/Lib/http/server.py", line 769, in send_head
    return self.list_directory(path)
           ~~~~~~~~~~~~~~~~~~~^^^^^^
  File "/home/serhiy/py/cpython/Lib/http/server.py", line 874, in list_directory
    encoded = '\n'.join(r).encode(enc, 'surrogateescape')
  File "/home/serhiy/py/cpython/Lib/encodings/koi8_u.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\ufffd' in position 178: character maps to <undefined>
encoding with 'koi8-u' codec failed
----------------------------------------
ERROR

======================================================================
ERROR: test_undecodable_parameter (test.test_httpservers.SimpleHTTPServerTestCase.test_undecodable_parameter)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/serhiy/py/cpython/Lib/test/test_httpservers.py", line 559, in test_undecodable_parameter
    response = self.request(self.base_url + '/?x=%bb').read()
               ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/serhiy/py/cpython/Lib/test/test_httpservers.py", line 131, in request
    return self.connection.getresponse()
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/home/serhiy/py/cpython/Lib/http/client.py", line 1430, in getresponse
    response.begin()
    ~~~~~~~~~~~~~~^^
  File "/home/serhiy/py/cpython/Lib/http/client.py", line 331, in begin
    version, status, reason = self._read_status()
                              ~~~~~~~~~~~~~~~~~^^
  File "/home/serhiy/py/cpython/Lib/http/client.py", line 300, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
                             " response")
http.client.RemoteDisconnected: Remote end closed connection without response

----------------------------------------------------------------------

I suspect that there may also be issues if some files in the directory have non-decodable names or if the path of the directory is non-decodable, but I have not tested this yet.
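For context on the non-decodable case, here is a small sketch of how `surrogateescape` treats a file name whose bytes are not valid in the chosen encoding. It only illustrates the codec behavior, using UTF-8 and a made-up file name; it does not test what the handler itself does with such names.

```python
# A hypothetical file name containing a Latin-1 byte that is invalid in UTF-8.
raw_name = b'caf\xe9.txt'

# Decoding with 'surrogateescape' maps the bad byte to a lone surrogate.
name = raw_name.decode('utf-8', 'surrogateescape')

# Encoding back with the same encoding and handler restores the original bytes.
round_tripped = name.encode('utf-8', 'surrogateescape')

# A strict encode of the same string fails because of the lone surrogate.
try:
    name.encode('utf-8')
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

print(ascii(name), round_tripped, strict_ok)
```

The round trip restores the original bytes only when decoding and encoding use the same encoding, so whether the generated page displays such names correctly is a separate question.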

Activity

added the labels 3.13 (bugs and security fixes), 3.14 (bugs and security fixes), and 3.15 (new features, bugs and security fixes) on May 11, 2025
StanFromIreland (Contributor) commented on May 13, 2025

There is no good way to guarantee that it will work with a simple locale, so why not encode in UTF-8 instead? It is becoming the default in Python anyway, and I believe it is the default in the majority of web browsers. We could make using the system encoding optional and default to the web standard.
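A rough sketch of what this suggestion could look like follows. The `build_listing` function and its `use_fs_encoding` flag are hypothetical names invented for illustration; this is not the change that was merged, nor an existing `http.server` API.

```python
import html
import sys


def build_listing(title, names, use_fs_encoding=False):
    """Hypothetical helper: return (content_type, body_bytes) for a listing page.

    UTF-8 is the default; the filesystem encoding is an opt-in.
    """
    enc = sys.getfilesystemencoding() if use_fs_encoding else 'utf-8'
    lines = [
        '<!DOCTYPE HTML>',
        '<html lang="en">',
        '<head>',
        f'<meta charset="{enc}">',
        f'<title>{html.escape(title, quote=False)}</title>',
        '</head>',
        '<body>',
        '<ul>',
    ]
    lines.extend(f'<li>{html.escape(name, quote=False)}</li>' for name in names)
    lines.extend(['</ul>', '</body>', '</html>'])
    # 'surrogateescape' writes escaped bytes from undecodable names back as-is.
    body = '\n'.join(lines).encode(enc, 'surrogateescape')
    return f'text/html; charset={enc}', body


# The title that fails under KOI8-U encodes without error under UTF-8.
content_type, body = build_listing(
    'Directory listing for /?x=\ufffd', ['файл.txt'])
print(content_type, len(body))
```

One trade-off in this sketch: with UTF-8 output, a file name that was undecodable in the filesystem encoding would be written as raw bytes that are not valid UTF-8, so the page could still display it incorrectly.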

added a commit that references this issue on May 16, 2025

pythongh-133889: Improve tests for SimpleHTTPRequestHandler

added a commit that references this issue on May 17, 2025

gh-133889: Improve tests for SimpleHTTPRequestHandler (GH-134102)

fcaf009
added a commit that references this issue on May 17, 2025

pythongh-133889: Improve tests for SimpleHTTPRequestHandler (pythonGH…


Metadata

Labels

3.13: bugs and security fixes
3.14: bugs and security fixes
3.15: new features, bugs and security fixes
stdlib: Python modules in the Lib dir
type-bug: An unexpected behavior, bug, or error
