
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2022-01-02.18:34:12.724>
labels = ['type-feature', 'library', '3.11']
title = 'add pathlib.Path.walk method'
updated_at = <Date 2022-01-08.12:58:28.255>
user = 'https://github.com/Ovsyanka83'

bugs.python.org fields:

activity = <Date 2022-01-08.12:58:28.255>
actor = 'Ovsyanka'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2022-01-02.18:34:12.724>
creator = 'Ovsyanka'
dependencies = []
files = []
hgrepos = []
issue_num = 46227
keywords = ['patch']
message_count = 4.0
messages = ['409511', '409514', '409994', '410098']
nosy_count = 4.0
nosy_names = ['eric.araujo', 'barneygale', 'ajoino', 'Ovsyanka']
pr_nums = ['30340']
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue46227'
versions = ['Python 3.11']

Pathlib is great, yet every time I have to parse a bunch of files, I have to use os.walk and join paths by hand. That's not a lot of code but I feel like pathlib should have higher-level abstractions for all path-related functionality of os. I propose we add a Path.walk method that could look like this:

def walk(self, topdown=True, onerror=None, followlinks=False):
    # Delegate the traversal to os.walk (through pathlib's internal accessor)
    for root, dirs, files in self._accessor.walk(
        self,
        topdown=topdown,
        onerror=onerror,
        followlinks=followlinks,
    ):
        root_path = Path(root)
        # Turn the str names returned by os.walk into Path objects
        yield (
            root_path,
            [root_path._make_child_relpath(dir_) for dir_ in dirs],
            [root_path._make_child_relpath(file) for file in files],
        )

Note: this version does not handle the case where top does not exist (neither does os.walk, which simply returns an empty generator).
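The method above relies on pathlib internals (self._accessor and _make_child_relpath), so it only works inside the pathlib module itself. As a sketch that runs on its own, here is the same idea written as a free function (the name path_walk is hypothetical) that calls os.walk directly and builds child paths with the / operator. It also makes the missing-directory behavior from the note visible: nothing is yielded, and the error only surfaces if an onerror callback is passed.

```python
import os
from pathlib import Path
from typing import Callable, Iterator, Optional


def path_walk(
    top: Path,
    topdown: bool = True,
    onerror: Optional[Callable[[OSError], None]] = None,
    followlinks: bool = False,
) -> Iterator[tuple[Path, list[Path], list[Path]]]:
    """Hypothetical helper: wrap os.walk and yield Path objects instead of strings."""
    for root, dirs, files in os.walk(
        top, topdown=topdown, onerror=onerror, followlinks=followlinks
    ):
        root_path = Path(root)
        # These are new lists: removing entries from them does NOT prune os.walk.
        yield (
            root_path,
            [root_path / name for name in dirs],
            [root_path / name for name in files],
        )


# For a missing directory the generator is simply empty; the error is only
# reported through the onerror callback.
errors: list[OSError] = []
missing = list(path_walk(Path("surely-missing-dir"), onerror=errors.append))
```

One consequence of this design: because the yielded lists are fresh lists of Path objects, editing them in place does not prune the traversal the way editing dirnames in place does with os.walk.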

Some might suggest using Path.glob instead, but I found it less convenient for some use cases and generally slower (about 2.7 times slower):

>>> timeit("list(Path('Lib').walk())", number=100, globals=globals())
1.9074640140170231
>>> timeit("list(Path('Lib').glob('**/*'))", number=100, globals=globals())
5.14890358998673
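To reproduce this kind of comparison elsewhere, here is a self-contained sketch. It builds a synthetic tree in a temporary directory (the tree shape and the helper names make_tree, walk_paths and glob_paths are hypothetical, not part of the proposal), checks that both approaches find the same entries, and times them. Absolute numbers and the ratio will depend on the filesystem, the OS and the Python version.

```python
import os
import tempfile
from pathlib import Path
from timeit import timeit


def make_tree(top: Path, width: int = 5, depth: int = 3) -> None:
    """Hypothetical test tree: every directory gets `width` files, and all but
    the deepest level also get `width` subdirectories."""
    for i in range(width):
        (top / f"file{i}.txt").write_text("x")
    if depth > 0:
        for i in range(width):
            sub = top / f"dir{i}"
            sub.mkdir()
            make_tree(sub, width, depth - 1)


def walk_paths(top: Path) -> list[Path]:
    """Every file and directory below top, collected by wrapping os.walk."""
    found: list[Path] = []
    for root, dirs, files in os.walk(top):
        root_path = Path(root)
        found.extend(root_path / name for name in dirs)
        found.extend(root_path / name for name in files)
    return found


def glob_paths(top: Path) -> list[Path]:
    """The same entries, collected with a recursive glob."""
    return list(top.glob("**/*"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        top = Path(tmp)
        make_tree(top)
        # Both approaches should report exactly the same entries.
        assert set(walk_paths(top)) == set(glob_paths(top))
        print("os.walk:", timeit(lambda: walk_paths(top), number=100))
        print("glob:   ", timeit(lambda: glob_paths(top), number=100))
```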

The idea is interesting, and I agree that glob with a catch-all '**/*' wildcard is not a great solution. There is discussion on the PR about adding walk vs. extending iterdir; could you post a message on discuss.python.org and summarize the discussion? (Pull requests on the CPython repo are only used to discuss implementation, not for debating ideas or proposing features.)

Thanks for the tip! Hopefully, I created it correctly:
https://discuss.python.org/t/add-pathlib-path-walk-method/

It is currently under review.

The discussion ended with the following decisions:

  • The original solution wrapped os.walk, but we decided that reimplementing it would be both faster and better in the long term (although I could not come up with an implementation significantly better than os.walk's, so the new one is 99% the same).
  • The original solution yielded tuple[Path, list[Path], list[Path]], but we decided to stay true to os.walk and yield tuple[Path, list[str], list[str]]. This is more efficient and should cause a little less confusion for users.
  • The original solution took the same arguments as os.walk, but that interface was too convoluted, so we decided to rename the arguments to snake_case, and I decided to remove the topdown argument by splitting walk into two methods: walk() and walk_bottom_up(). In other words, we now have two methods instead of one method with two modes of operation. The method names are open for discussion: I am not very happy with them myself, but they are the best I have found.
  • I have been away for quite some time, so the pull request became stale and outdated; I'm opening a new one.
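The decisions above (a reimplementation rather than a wrapper, str names instead of Path children, snake_case arguments, and separate top-down and bottom-up entry points) can be sketched as two free functions built on os.scandir. This is a hypothetical illustration, not the code in the pull request: the names walk, walk_bottom_up and _scan, and the choice to classify symlinks to directories as files when follow_symlinks is false, are assumptions of this sketch.

```python
import os
from pathlib import Path
from typing import Callable, Iterator, Optional

OnError = Optional[Callable[[OSError], None]]
WalkItem = tuple[Path, list[str], list[str]]


def _scan(path: Path, follow_symlinks: bool) -> tuple[list[str], list[str]]:
    """List one directory, splitting entry names into (dirnames, filenames)."""
    dirnames: list[str] = []
    filenames: list[str] = []
    with os.scandir(path) as entries:
        for entry in entries:
            try:
                # With follow_symlinks=False a symlink to a directory counts as a file.
                is_dir = entry.is_dir(follow_symlinks=follow_symlinks)
            except OSError:
                is_dir = False
            (dirnames if is_dir else filenames).append(entry.name)
    return dirnames, filenames


def walk(top: Path, on_error: OnError = None,
         follow_symlinks: bool = False) -> Iterator[WalkItem]:
    """Top-down walk; callers may prune by editing the yielded dirnames in place."""
    stack = [top]
    while stack:
        path = stack.pop()
        try:
            dirnames, filenames = _scan(path, follow_symlinks)
        except OSError as error:
            if on_error is not None:
                on_error(error)
            continue
        yield path, dirnames, filenames
        # dirnames is read only after the caller resumes us, so in-place edits prune.
        # Push in reverse so children are visited in listing order.
        stack.extend(path / name for name in reversed(dirnames))


def walk_bottom_up(top: Path, on_error: OnError = None,
                   follow_symlinks: bool = False) -> Iterator[WalkItem]:
    """Bottom-up walk: each directory is yielded after all of its subdirectories."""
    try:
        dirnames, filenames = _scan(top, follow_symlinks)
    except OSError as error:
        if on_error is not None:
            on_error(error)
        return
    for name in dirnames:
        yield from walk_bottom_up(top / name, on_error, follow_symlinks)
    yield top, dirnames, filenames
```

Yielding plain str names keeps the per-entry cost low and preserves os.walk's pruning idiom (dirs[:] = [...] in the top-down variant). Two caveats of this sketch: walk_bottom_up is recursive for brevity, so extremely deep trees could hit the recursion limit, and follow_symlinks=True will loop forever on cyclic symlinks, just as os.walk with followlinks=True can.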

    docs: use 'recursively' in the description of rglob, and mention globs in the os equivalences #94954

    GH-92517 broke tests on the wasm32-wasi platform:

    ======================================================================
    FAIL: test_walk_symlink_location (test.test_pathlib.WalkTests.test_walk_symlink_location)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/Lib/test/test_pathlib.py", line 2626, in test_walk_symlink_location
        self.assertIn("link", files)
    AssertionError: 'link' not found in ['tmp3']
    ----------------------------------------------------------------------
    Ran 458 tests in 1.229s
    FAILED (failures=1, skipped=182)
    test test_pathlib failed