It seems easy. I need a package, say pandas, so I run pip install pandas. Then I get access to the library with a simple import at the top of my file: import pandas as pd. But how does Python determine where the package is located?
First, Python checks whether the module is already in sys.modules, a dictionary that caches every module imported so far in the current session (at startup it already holds the handful of modules the interpreter loaded for itself). Once a package is successfully imported, it is inserted into this dictionary. Each key in sys.modules is a module name as a string, and each value is the module object itself.
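To make the caching behaviour concrete, here is a minimal sketch that watches a module enter sys.modules. The choice of colorsys is only an assumption of convenience: it is a small standard-library module that is normally not loaded at startup, and any other module would do.

```python
import importlib
import sys

# Assumption for illustration: a small stdlib module not normally loaded at startup.
name = "colorsys"
sys.modules.pop(name, None)  # start from a clean state in case it was imported earlier

before = name in sys.modules
module = importlib.import_module(name)  # same effect as `import colorsys`
after = name in sys.modules

# The key is the module name (a string); the value is the module object.
same_object = sys.modules[name] is module

# A second import is answered from the cache, so we get the very same object back.
again = importlib.import_module(name)

print(before, after, same_object, again is module)
```

The first import populates the cache, and the repeated import returns the cached object rather than loading the file again.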
If the desired import is not in sys.modules, Python searches the directories (and zip archives) listed in sys.path, in order. To see the contents of sys.path, you can run python -c "import sys; print(sys.path)". This command returns a list starting with the empty string and ending with the site-packages directory. The empty string ('') at the beginning means that Python will first search in the current directory for imports, before moving on to the standard library and finally to site-packages (where pip install-ed packages are placed). This means that if you have a pandas.py file in your current directory and try to run import pandas as pd, the name pd will point to your local pandas.py rather than the pip-installed pandas.
>>> python -c "import pandas as pd; print(pd.DataFrame)"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'pandas'
>>> pip install pandas
Collecting pandas... [skipping install output]
>>> python -c "import pandas as pd; print(pd.DataFrame)"
<class 'pandas.core.frame.DataFrame'>
>>> touch pandas.py  # create file in current directory
# Python attempts to load `pandas.py`, not the pip-installed pandas
>>> python -c "import pandas as pd; print(pd.DataFrame)"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: module 'pandas' has no attribute 'DataFrame'
>>> cd ..  # pandas.py no longer in current directory
>>> python -c "import pandas as pd; print(pd.DataFrame)"
<class 'pandas.core.frame.DataFrame'>
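The "first match on the search path wins" rule can also be checked without touching a real pandas install. The sketch below is an illustration under stated assumptions: shadow_demo is a hypothetical module name, the two temporary directories merely stand in for the current directory and site-packages, and importlib.machinery.PathFinder is used so that an explicit search path can be passed in place of sys.path.

```python
import tempfile
from importlib.machinery import PathFinder
from pathlib import Path


def first_match(name, search_path):
    """Return the file that `import name` would load for this search path, or None."""
    spec = PathFinder.find_spec(name, search_path)
    return None if spec is None else spec.origin


with tempfile.TemporaryDirectory() as tmp:
    local_dir = Path(tmp) / "project"       # stands in for the current directory
    site_dir = Path(tmp) / "site-packages"  # stands in for the real site-packages
    for directory in (local_dir, site_dir):
        directory.mkdir()
        (directory / "shadow_demo.py").write_text("")  # hypothetical module

    # Same module name in both places: whichever directory comes first wins.
    found = first_match("shadow_demo", [str(local_dir), str(site_dir)])
    local_wins = Path(found).parent == local_dir

    reversed_found = first_match("shadow_demo", [str(site_dir), str(local_dir)])
    site_wins = Path(reversed_found).parent == site_dir

    # A name that exists nowhere on the path yields no spec at all.
    missing = first_match("shadow_demo_missing", [str(local_dir)])

print(local_wins, site_wins, missing)
```

When an import behaves strangely, printing the imported module's __file__ attribute is a quick way to see which file actually won.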
We can also customize sys.path by setting the PYTHONPATH environment variable. Directories listed there are searched immediately after the current directory. Let’s suppose that we have a project structure with a src/ directory containing our package mypackage. We want to import mypackage. While we could just cd src, let’s instead add src to our PYTHONPATH, telling Python to look there. (To make this more robust, we could use the full path to the src directory.)
- PowerShell:
$env:PYTHONPATH='src'
- Command prompt:
set PYTHONPATH=src
- Linux/Mac:
export PYTHONPATH=src
After adding the directory to the Python path, we can run the command to print out sys.path:
>>> python -c "import sys; print(sys.path)"
['', ...]
>>> $env:PYTHONPATH='src'
>>> python -c "import sys; print(sys.path)"
['', 'src', ...]
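The effect of PYTHONPATH on an actual import can be sketched in a shell-independent way by launching a fresh interpreter with and without the variable set. This is only an illustration: the src/mypackage layout is rebuilt inside a temporary directory, the helper try_import is a hypothetical name, and the full path to src is used, as the parenthetical above suggests.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

CHILD_CODE = "import mypackage; print(mypackage.__file__)"


def try_import(cwd, env):
    """Run a fresh interpreter in `cwd`; return (import succeeded, printed path)."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        cwd=cwd, env=env, capture_output=True, text=True,
    )
    return result.returncode == 0, result.stdout.strip()


with tempfile.TemporaryDirectory() as tmp:
    # Recreate the layout from the text: src/ containing the package mypackage.
    src = Path(tmp) / "src"
    (src / "mypackage").mkdir(parents=True)
    (src / "mypackage" / "__init__.py").write_text("")

    # Without PYTHONPATH, running from the project root cannot see src/.
    base_env = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}
    found_without, _ = try_import(tmp, base_env)

    # With PYTHONPATH pointing at src/, the same command succeeds.
    env_with = dict(base_env, PYTHONPATH=str(src))
    found_with, location = try_import(tmp, env_with)
    loaded_from_src = (
        found_with and Path(location).resolve().parent.parent == src.resolve()
    )

print(found_without, found_with, loaded_from_src)
```

Passing the environment explicitly to the child process keeps the change scoped to that one command, in the same way the shell variable only affects the current session.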
How is this useful? Not only does it provide some insight into how the otherwise mysterious import statement functions, but it also provides a ready-made April Fools’ prank: drop an empty pandas.py file (or equivalent) wherever a co-worker might run some shared code. Even better, have it print out a greeting. And then you can explain how the whole import process works.