It seems easy. I need a package, say pandas, so I run pip install pandas. Then, at the top of my file, I get access to the library with a simple import: import pandas as pd. But how does Python determine where the package is located?
First, Python checks whether the module is already in sys.modules, a dictionary in the sys module that caches every module imported so far. At startup it already holds the parts of the standard library that the interpreter loaded while initializing. Each time a module is successfully imported, it is inserted into this dictionary. The keys of sys.modules are module names (strings), and the values are the module objects themselves.
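Because the cache is consulted first, anything already sitting in sys.modules wins, even if no file by that name exists anywhere. A minimal sketch of that behavior; `not_a_real_package` and its `greeting` attribute are made-up names used only for illustration:

```python
import sys
import types

import json  # any module will do; importing it records it in the cache

# After a successful import, the module object is stored under its name
print("json" in sys.modules)        # True
print(sys.modules["json"] is json)  # True

# Plant a hypothetical module object directly in the cache...
fake = types.ModuleType("not_a_real_package")
fake.greeting = "hello from sys.modules"
sys.modules["not_a_real_package"] = fake

# ...and the import statement returns it without searching sys.path at all
import not_a_real_package

print(not_a_real_package.greeting)  # hello from sys.modules
```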
If the desired import is not in sys.modules (and is not one of the modules compiled into the interpreter itself, such as sys), Python searches the directories listed in sys.path, in order. To see the contents of sys.path, run python -c "import sys; print(sys.path)". This prints a list that starts with the empty string and typically ends with the site-packages directory. The empty string ('') at the beginning means that Python searches the current directory first, then the standard library directories, and finally site-packages (where pip install-ed packages are placed). (When you run a script with python script.py, that first entry is the script's directory instead.) As a consequence, if you have a pandas.py file in your current directory and run import pandas as pd, Python imports your local pandas.py rather than the pip-installed pandas.
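The "first match in sys.path wins" rule can be reproduced without touching pandas. A minimal sketch, assuming two throwaway temporary directories that each contain a file for a hypothetical module named `shadowdemo`; whichever directory appears earlier in sys.path supplies the import:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Two throwaway directories, each holding its own copy of a hypothetical module
first = Path(tempfile.mkdtemp())
second = Path(tempfile.mkdtemp())
(first / "shadowdemo.py").write_text("WHERE = 'first'\n")
(second / "shadowdemo.py").write_text("WHERE = 'second'\n")

# Put both at the front of the search path, `first` ahead of `second`
sys.path[:0] = [str(first), str(second)]
importlib.invalidate_caches()  # the files were created after startup

import shadowdemo

print(shadowdemo.WHERE)     # first
print(shadowdemo.__file__)  # .../shadowdemo.py inside `first`
```

Swapping the order of the two entries would make `WHERE` equal `'second'`, but only in a fresh interpreter: once imported, the module is served from sys.modules.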
>>> python -c "import pandas as pd; print(pd.DataFrame)"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'pandas'
>>> pip install pandas
Collecting pandas... [skipping install output]
>>> python -c "import pandas as pd; print(pd.DataFrame)"
<class 'pandas.core.frame.DataFrame'>
>>> touch pandas.py  # create an empty pandas.py in the current directory
# Python now finds the local `pandas.py` first, not the pip-installed pandas
>>> python -c "import pandas as pd; print(pd.DataFrame)"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: module 'pandas' has no attribute 'DataFrame'
>>> cd ..  # pandas.py is no longer in the current directory
>>> python -c "import pandas as pd; print(pd.DataFrame)"
<class 'pandas.core.frame.DataFrame'>
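When an import resolves to the wrong file, as with the stray pandas.py above, it helps to ask Python where it would load a module from. A minimal sketch using the standard library's `importlib.util.find_spec`; the helper name `where_is` is hypothetical. Note that for a module already in sys.modules, find_spec reports the cached module's spec rather than searching again:

```python
import importlib.util
from typing import Optional


def where_is(name: str) -> Optional[str]:
    """Return where Python would load the top-level module `name` from.

    Returns None when the module cannot be found. Modules compiled into
    the interpreter report "built-in" instead of a file path.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


print(where_is("json"))  # a path ending in json/__init__.py
print(where_is("sys"))   # built-in
print(where_is("surely_not_installed_anywhere"))  # None
```

Calling `where_is("pandas")` from the directory containing the empty pandas.py should point at that local file rather than at site-packages.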
We can also customize sys.path by setting the PYTHONPATH environment variable. The directories it lists are inserted into sys.path immediately after the current directory, so they are searched before the standard library and site-packages. Let’s suppose that we have a project with a src/ directory containing our package, mypackage, and we want to import mypackage from the project root. While we could just cd src, let’s instead add src to our PYTHONPATH, telling Python to look there. (To make this more robust, we could use the full path to the src directory.)
- PowerShell: $env:PYTHONPATH='src'
- Command Prompt: set PYTHONPATH=src
- Linux/Mac: export PYTHONPATH=src
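The src/ scenario can be reproduced end to end. A minimal sketch, assuming a throwaway project root in a temporary directory with the layout `src/mypackage/__init__.py`; the `hello` function inside mypackage is hypothetical. It starts a fresh interpreter from the project root twice, once without PYTHONPATH and once with PYTHONPATH=src:

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Throwaway project root with the layout src/mypackage/__init__.py
root = Path(tempfile.mkdtemp())
pkg = root / "src" / "mypackage"
pkg.mkdir(parents=True)
(pkg / "__init__.py").write_text("def hello():\n    return 'hello from mypackage'\n")

code = "import mypackage; print(mypackage.hello())"


def run(extra_env):
    """Run `code` in a fresh interpreter started from the project root."""
    env = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}
    env.update(extra_env)
    return subprocess.run(
        [sys.executable, "-c", code],
        cwd=root, env=env, capture_output=True, text=True,
    )


plain = run({})
with_path = run({"PYTHONPATH": "src"})  # relative to cwd, like the commands above

print(plain.returncode)          # non-zero: ModuleNotFoundError
print(with_path.stdout.strip())  # hello from mypackage
```

Passing an absolute path such as `str(root / "src")` instead of the relative "src" would keep working even if the interpreter were started from another directory, which is the robustness point made above.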
After adding the directory to PYTHONPATH, we can print sys.path again to see the change:
>>> python -c "import sys; print(sys.path)"
['', ...]
>>> $env:PYTHONPATH='src'
>>> python -c "import sys; print(sys.path)"
['', 'src', ...]
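PYTHONPATH can also hold several directories, separated by `os.pathsep` (`;` on Windows, `:` on Linux/Mac); they are added to sys.path in the order listed, right after the current-directory entry. A minimal sketch that checks this in a fresh interpreter; the two temporary directories are stand-ins for real project folders:

```python
import json
import os
import subprocess
import sys
import tempfile

# Two stand-in directories for PYTHONPATH
a = tempfile.mkdtemp()
b = tempfile.mkdtemp()

env = dict(os.environ, PYTHONPATH=os.pathsep.join([a, b]))
# Python 3.11+ would skip the '' entry if PYTHONSAFEPATH were set
env.pop("PYTHONSAFEPATH", None)

result = subprocess.run(
    [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
    env=env, capture_output=True, text=True, check=True,
)
paths = json.loads(result.stdout)

# Index 0 is the current-directory entry ('' when using -c)
print(repr(paths[0]))  # ''
print(paths[1:3])      # the two directories, in the order they were listed
```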
How is this useful? Not only does it provide some insight into how the otherwise mysterious import statement works, but it also provides a ready-made April Fools' prank: drop an empty pandas.py file (or equivalent) wherever a co-worker might run some shared code… Even better, have it print out a greeting. Then you can explain how the whole import process works.