The pathlib module was introduced in Python 3.4 (see PEP 428). More accurately, pathlib was a third-party module that was added to the standard library (i.e., the packages that come with all installs of Python, unless excluded, e.g., to fit the library on smaller devices). It aims to provide a friendlier, object-oriented approach to file system handling than os.path, which relies on a string representation.
Introduction
Pathlib’s architecture defines a PurePath (an object I’ve never found a need for) and its descendants. The primary entry point for me is Path which, depending on your system (i.e., Windows/Linux), will instantiate a WindowsPath or PosixPath. These are concrete implementations with methods allowing for interaction with the file system itself. Their ‘pure’ counterparts (i.e., PurePosixPath and PureWindowsPath) cannot interact with the file system (no IO operations), but they are still useful for building and manipulating paths, for example handling Windows-style paths on a Linux machine.
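To make the ‘pure’ classes a little more concrete, here is a minimal sketch. The path strings are made up for illustration; nothing is read from disk, which is why a Windows-style path can be handled on any operating system.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Pure paths only manipulate the path text, so a Windows-style path
# can be built and inspected on Linux or macOS (and vice versa).
win = PureWindowsPath('C:/Users/example') / 'data' / 'stopwords.txt'
posix = PurePosixPath('/home/example') / 'data' / 'stopwords.txt'

print(win)             # C:\Users\example\data\stopwords.txt
print(win.drive)       # C:
print(win.as_posix())  # C:/Users/example/data/stopwords.txt
print(posix.parent)    # /home/example/data

# Pure paths have no I/O methods such as exists() or read_text()
print(hasattr(win, 'exists'))  # False
```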
Why use a pathlib.Path object rather than os’s methods? Let’s take a look at a few examples before exploring the library itself. In the following code block, we’ll first create a base path, identify a file in a subdirectory, check if it exists, and then read it.
    import os
    from pathlib import Path

    # construct a path to a data subdirectory in an OS-independent way
    os_path = 'data'  # just a string
    pl_path = Path('data')
    print(os_path, type(os_path))  # data <class 'str'>
    print(pl_path, type(pl_path))  # data <class 'pathlib.WindowsPath'> (or forward slashes and PosixPath)

    # get a file under the path called 'stopwords.txt'
    os_sw_path = os.path.join(os_path, 'nlp', 'stopwords.txt')
    pl_sw_path = pl_path / 'nlp' / 'stopwords.txt'
    print(os_sw_path, type(os_sw_path))  # data\nlp\stopwords.txt <class 'str'>
    print(pl_sw_path, type(pl_sw_path))  # data\nlp\stopwords.txt <class 'pathlib.WindowsPath'> (or forward slashes and PosixPath)

    # check if the path exists
    os_exists = os.path.exists(os_sw_path)
    pl_exists = pl_sw_path.exists()
    print(os_exists, pl_exists)  # True True

    # open the file
    with open(os_sw_path, encoding='utf8') as fh:
        fh.read()
    with open(pl_sw_path, encoding='utf8') as fh:
        fh.read()
There are two key takeaways:
- The method/function names in the two libraries are very similar.
- pathlib uses a class, whereas os.path performs string manipulation.
On the latter point, the repeated calls to os.path are replaced by methods on the Path object. An equivalent mapping is provided in the docs.
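A few entries of that mapping can be sketched side by side. The path below is only an illustrative string, and a PurePosixPath is used so the printed values are the same on any system.

```python
import os.path
from pathlib import PurePosixPath

raw = 'data/nlp/stopwords.txt'  # illustrative path; nothing is read from disk
p = PurePosixPath(raw)

# os.path function (left) and the pathlib counterpart (right)
print(os.path.basename(raw), p.name)         # stopwords.txt stopwords.txt
print(os.path.dirname(raw), p.parent)        # data/nlp data/nlp
print(os.path.splitext(raw)[1], p.suffix)    # .txt .txt
print(os.path.isabs(raw), p.is_absolute())   # False False
```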
Useful Functions
I’ll start by showing some of the equivalents of os.path functions, then turn to more unique elements which might be of some use.
Most of my use cases involve identifying a directory and then reading or writing files. Let’s begin by reading the files from a user-given directory. Using a library like click, we can stipulate that the argument be given as a pathlib.Path, and include Path as the type hint.
    from pathlib import Path

    import click  # command line arguments
    import cv2  # OpenCV, just for illustration


    @click.command()  # use click to read command line arguments
    @click.argument('indir', type=click.Path(path_type=Path, file_okay=False, exists=True))
    @click.argument('outdir', type=click.Path(path_type=Path))
    def do_something(indir: Path, outdir: Path):
        if not indir.exists() or indir.is_file():  # this is already checked by `click`, but as an example
            raise ValueError('Indir does not exist. Expected directory containing files.')
        # iterate through all files in directory, and read text from all non-'.txt' files
        for file in indir.iterdir():
            if file.suffix == '.txt':  # suffix = extension
                continue
            text = file.read_text()  # no need for `with open` block
        # iterate through all csv files in a directory
        for file in indir.glob('*.csv'):  # get only csv files
            # csv_to_excel is a placeholder function that is not defined here
            csv_to_excel(indir / f'{file.stem}.xlsx')  # stem gets everything but suffix/extension
        # iterate through all png files in directory and subdirectories
        for file in indir.glob('**/*.png'):  # get all PNG files, even in subdirectories
            # certain libraries, like OpenCV, may require explicit string conversion, though this is rare
            cv2.imread(str(file))
        outdir.mkdir(exist_ok=True, parents=True)  # ensure outdir exists, and even create its parents if they don't exist
        with open(outdir / 'output.txt', 'w') as out:
            out.write('Niin metsä vastaa, kuin sinne huudetaan.')
        # use parentheses to refer to the object
        (outdir / 'output2.txt').write_text(
            'Success is not final.\nFailure is not fatal.\nIt is courage to continue that counts.'
        )


    if __name__ == '__main__':
        do_something()
Here, we show a lot of the path manipulation methods of a Path:
- Path.iterdir: iterate through all entries (files and subdirectories) in a directory
- Path.glob: use ‘glob’ syntax where you create a ‘pattern’ to find matching paths
  - ‘*’ symbolizes any number of any characters (*.txt gets all files ending in .txt but not not_me.txtr, which would require *.txt*)
  - Recursive subdirectories can be obtained by **/*
    - Complete directory information is contained within the returned Path object
    - Alternatively use Path.rglob('*.py'), which places a **/ before the provided parameter
- Check if a Path is a file (.is_file()) or a directory (.is_dir()).
- .mkdir() to make the current directory (using exist_ok=True to allow the directory to already exist)
  - .touch() is the file equivalent of .mkdir()
  - The opposite is intuitively .rmdir() (which only removes empty directories; files are removed with .unlink())
- If you want to call a method on the Path returned by, e.g., outdir / 'filename.txt', you’ll need to put parentheses around it.
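The methods in this list can be tried safely in a throwaway directory. The following is a small sketch; the file and directory names (a.txt, sub, deeper, and so on) are invented for the example and everything is created inside a temporary directory.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # mkdir with parents=True also creates 'sub' on the way to 'deeper'
    (root / 'sub' / 'deeper').mkdir(parents=True, exist_ok=True)
    # touch creates empty files
    (root / 'a.txt').touch()
    (root / 'sub' / 'b.txt').touch()
    (root / 'sub' / 'deeper' / 'c.py').touch()

    entries = sorted(p.name for p in root.iterdir())         # direct children only
    top_level = sorted(p.name for p in root.glob('*.txt'))   # this directory only
    nested = sorted(p.name for p in root.glob('**/*.txt'))   # including subdirectories
    via_rglob = sorted(p.name for p in root.rglob('*.txt'))  # same idea via rglob

    # parentheses are needed to call a method on the result of `/`
    is_file = (root / 'a.txt').is_file()
    is_dir = (root / 'sub').is_dir()

    # unlink removes a file; rmdir removes an (empty) directory
    (root / 'sub' / 'deeper' / 'c.py').unlink()
    (root / 'sub' / 'deeper').rmdir()
    deeper_gone = not (root / 'sub' / 'deeper').exists()

print(entries)    # ['a.txt', 'sub']
print(top_level)  # ['a.txt']
print(nested)     # ['a.txt', 'b.txt']
print(via_rglob)  # ['a.txt', 'b.txt']
```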
I’ve also shown some other capabilities of pathlib.Path including write_text and read_text. These essentially wrap open with mode set to 'w' and 'r', respectively.
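As a rough sketch of that equivalence, the snippet below writes and reads the same content both ways. The file names are hypothetical and live in a temporary directory.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    via_method = Path(tmp) / 'via_method.txt'
    via_open = Path(tmp) / 'via_open.txt'
    message = 'Success is not final.\nFailure is not fatal.\n'

    # write_text opens the file, writes the string, and closes it again
    via_method.write_text(message, encoding='utf8')

    # the explicit version of the same operation
    with open(via_open, 'w', encoding='utf8') as fh:
        fh.write(message)

    # read_text mirrors an explicit open(...).read()
    content_method = via_method.read_text(encoding='utf8')
    with open(via_open, encoding='utf8') as fh:
        content_open = fh.read()

print(content_method == content_open == message)  # True
```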
Another useful component for apps is locating the user’s ‘home’ directory to store, e.g., configuration information. This can be done with Path('~').expanduser().
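A short sketch of how this might look in practice. The .myapp directory and config.toml file are hypothetical names, and nothing is created on disk. Path.home() is a standard-library shortcut that should give the same directory.

```python
from pathlib import Path

home_expanded = Path('~').expanduser()  # expand '~' to the user's home directory
home_direct = Path.home()               # shortcut for the same location

# hypothetical location for an app's configuration file (not created here)
config_file = home_expanded / '.myapp' / 'config.toml'

print(home_expanded == home_direct)
print(config_file.name)  # config.toml
```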
Parting Shots
In my own work, I’ve found pathlib to be a useful replacement for os.path, providing a more readable, object-oriented, and intuitive (e.g., in building a path) interface than the string-based functions in os.path. Some of the names are sufficiently different from os.path to cause a little frustration (e.g., is_file rather than isfile), and it takes some remembering that members can return either strings (e.g., Path.stem) or more pathlib.Path objects (e.g., Path.iterdir). Since switching, I have never missed anything from os.path, so give it a try too!