I have had a few conversations lately about Python packaging, particularly around structuring the import statements to access the various modules of a package. This took a lot of investigation and experimentation when I was organizing the leiap package. Still, I have not seen a good guide to best practices in various scenarios, so I thought I would share my thoughts here.
The key to designing how the user will interact with the modules is the package’s __init__.py file. This will define what gets brought into the namespace with the import statement.
It is usually a good idea to split code into smaller modules for a couple of reasons. Primarily, modules can contain all of the code related to a particular coherent topic (e.g., all of the I/O functionality) without being cluttered by code related to something completely different (e.g., plotting). For this reason, it is common to see large classes get a dedicated module (e.g., geodataframe.py within geopandas). Secondarily, dividing code into appropriate logical units makes it easier to read and easier to understand.
However, a good module structure for the developer may or may not be a good module structure for the user. In some cases, the user might not need to know that there are various modules underlying the package. In other cases, there might be good reasons that the user should explicitly ask only for the modules they need. That is what I want to explore here: what are the different use cases and what approach do they call for from the package developer.
Python packages come in a variety of structures, but let’s create a simple demo one here that we can use in all the examples.
/src
    /example_pkg
        __init__.py
        foo.py
        bar.py
        baz.py
    setup.py
    README.md
    LICENSE
It is composed of three modules: foo.py, bar.py, and baz.py, each of which has a single function that prints the name of the module where the function resides.
# foo.py
def foo_func():
    print('this is a foo function')

# bar.py
def bar_func():
    print('this is a bar function')

# baz.py
def baz_func():
    print('this is a baz function')
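To follow along without installing anything, one option is to write the demo package to a temporary directory and put that directory on sys.path. The sketch below is only a convenience for experimenting. It makes three assumptions: the temporary directory stands in for /src, __init__.py starts out empty (the scenarios in this post differ only in what goes into that file), and setup.py, README.md, and LICENSE are skipped.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source for each file in the demo package. __init__.py is left empty here.
FILES = {
    "example_pkg/__init__.py": "",
    "example_pkg/foo.py": "def foo_func():\n    print('this is a foo function')\n",
    "example_pkg/bar.py": "def bar_func():\n    print('this is a bar function')\n",
    "example_pkg/baz.py": "def baz_func():\n    print('this is a baz function')\n",
}

src = Path(tempfile.mkdtemp()) / "src"    # throwaway stand-in for /src
for rel_path, source in FILES.items():
    target = src / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(source)

sys.path.insert(0, str(src))              # used here instead of installing
importlib.invalidate_caches()

import example_pkg
from example_pkg import foo

foo.foo_func()                            # prints: this is a foo function

# With an empty __init__.py, nothing from the modules is lifted to the top level.
print(hasattr(example_pkg, "foo_func"))   # False
```

With an empty __init__.py the functions are only reachable through their modules. Each scenario below changes that by editing __init__.py.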
Now is a good time to acknowledge that talking about import statements and package structures can be pretty hard to follow, especially in text. To help make things a bit clearer, let’s think about a Python package as a grocery store and your users as the shoppers. As the developer, you are the store owner and manager. Your job is to figure out how to set up your store so that you serve your customers best. The structure of your __init__.py file will determine that setup. Below, I’ll walk through three alternative ways to set up that file: the general store, the convenience store, and the online store.
In the first scenario, the general store, the user gets access to everything right away on import example_pkg. In their code, they only need to type the package name and the class, function, or other object they want, regardless of what module of the source code it lives in.
This scenario is like an old-timey general store. Once the customer walks in the door, they can see all the goods placed with minimal fuss in bins and shelves around the store.
# __init__.py
from .foo import *
from .bar import *
from .baz import *
# User code
import example_pkg

example_pkg.foo_func()
example_pkg.bar_func()
example_pkg.baz_func()
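Advantages:
- Tab-completion shows the user everything the package offers with just example_pkg.<TAB>. Tab-completion is like the general store grocer who knows exactly where everything is and is happy to help.
- When you add new functions or classes to a module, you do not need to update any import statements; they will automatically be included. In the general store, there is no fancy signage to change. Just put a new item on the shelf.

Disadvantages:
- All functions and classes must be uniquely named across the package (i.e., there cannot be functions called save() in both the foo and bar modules). You don’t want to confuse your customers by putting apples in two different bins.
- Keeping internal helpers away from the user takes extra effort, such as prefixing their names with an underscore so that import * skips them (e.g., _function_name()). Most general stores don’t have a big storage area where things like brooms and mops are kept; those items are visible to the customer. Even if it is unlikely that they would pick up a broom and start sweeping your floors, you might not want them to. In that case, you have to take extra steps to hide those supplies from view.

Recommendations:
- Use for general-purpose packages (e.g., pandas or numpy). This is the “general” part of general store.
- The leiap package is another example where this approach fits.

Real-world examples:
- pandas
- numpy (with additional complexity)
- seaborn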
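Two of the general store trade-offs, duplicate names and underscore-prefixed helpers, can be checked directly. The sketch below builds a hypothetical package in a temporary directory. The name general_pkg, the two save() functions, and the _sweep_floor() helper are illustrative assumptions and are not part of example_pkg.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package used only for this illustration.
FILES = {
    # General store: star-import everything from every module.
    "general_pkg/__init__.py": "from .foo import *\nfrom .bar import *\n",
    "general_pkg/foo.py": (
        "def save():\n"
        "    return 'foo.save'\n"
        "\n"
        "def _sweep_floor():\n"
        "    return 'internal helper'\n"
    ),
    "general_pkg/bar.py": (
        "def save():\n"
        "    return 'bar.save'\n"
    ),
}

root = Path(tempfile.mkdtemp())
for rel_path, source in FILES.items():
    target = root / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(source)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import general_pkg

# Both modules define save(); the later star-import silently wins.
print(general_pkg.save())                      # bar.save

# Names starting with an underscore are skipped by "import *"
# (as long as the module does not define __all__).
print(hasattr(general_pkg, "_sweep_floor"))    # False

# The helper is still reachable through its module if someone goes looking.
print(general_pkg.foo._sweep_floor())          # internal helper

# Roughly what tab-completion would offer at the top level.
print([name for name in dir(general_pkg) if not name.startswith("_")])
```

Note that the name clash raises no error or warning: foo's save() is simply overwritten at the top level. This is why unique names matter with this layout.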
By far the easiest to read and understand is a variation on the general store scenario that I call the convenience store. Instead of from .module import *, you can specify exactly what to import with from .module import func within __init__.py.
The convenience store shares a lot of traits with the general store. It has a relatively limited selection of goods which can be replaced at any time with minimal hassle. The customer doesn’t need a lot of signage to find what they need because most of the goods are easily in view. The biggest difference is that a convenience store has a bit more order. The empty boxes, brooms, and mops are all kept out of view of the customer and only the products for sale are on the shelves.
# __init__.py
from .foo import foo_func
from .bar import bar_func
from .baz import baz_func
# User code
import example_pkg

example_pkg.foo_func()
example_pkg.bar_func()
example_pkg.baz_func()
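Shares all of the advantages of the general store, and adds:
- Only the names you explicitly import appear in the package’s top-level namespace, so internal helpers stay out of view without extra effort.

Disadvantages:
- __init__.py can end up very cluttered if there are many modules with many functions. Like the general store, a convenience store that is too cluttered will be harder for customers to navigate.
- When you add a new function to a module, you have to remember to add it to the __init__.py file too. Modern IDEs can help detect missed imports, but it is still easy to forget. Your convenience store has some minimal signage and price tags. You have to remember to update these when you change what is on the shelf.

I would add the following to the recommendations from the general store:
- Especially useful when a module consists mostly of a single class (e.g., from geopandas.geodataframe import GeoDataFrame)

Real-world examples:
- geopandas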
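The "remember to update __init__.py" point is easy to see in practice. In this sketch, a hypothetical package called convenience_pkg exposes foo_func explicitly, while a later addition, new_func, has not been listed yet. Both the package name and new_func are assumptions made up for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package used only for this illustration.
FILES = {
    # Convenience store: list exactly what goes on the shelf.
    "convenience_pkg/__init__.py": "from .foo import foo_func\n",
    "convenience_pkg/foo.py": (
        "def foo_func():\n"
        "    return 'this is a foo function'\n"
        "\n"
        "def new_func():\n"
        "    return 'added later, not yet listed in __init__.py'\n"
    ),
}

root = Path(tempfile.mkdtemp())
for rel_path, source in FILES.items():
    target = root / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(source)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import convenience_pkg

# The explicitly imported name is available at the top level.
print(convenience_pkg.foo_func())              # this is a foo function

# The new function is not exposed until __init__.py is updated.
print(hasattr(convenience_pkg, "new_func"))    # False

# It can still be reached through its module in the meantime.
print(convenience_pkg.foo.new_func())
```

The same mechanism that keeps helpers off the shelf also hides new products until you add the import, so the control comes with some upkeep.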
Anyone who has bought groceries online knows that ordering the right product can take some effort on the part of the customer. You have to search for the product, choose a brand, choose the desired size, etc. All of these steps, however, allow you to buy exactly what you want from a nearly limitless stockroom.
For Python packages, it is sometimes more prudent to eschew the convenience of simply importing the entire package and instead force the user to be more explicit about which pieces are being imported. This allows you as the developer to include a lot more pieces in the package without overwhelming the user.
# __init__.py
import example_pkg.foo
import example_pkg.bar
import example_pkg.baz
There are (at least) three different methods that a user could adopt in this case.
import example_pkg

example_pkg.foo.foo_func()
example_pkg.bar.bar_func()
example_pkg.baz.baz_func()
or
from example_pkg import foo, bar, baz

foo.foo_func()
bar.bar_func()
baz.baz_func()
or
import example_pkg.foo as ex_foo
import example_pkg.bar as ex_bar
import example_pkg.baz as ex_baz

ex_foo.foo_func()
ex_bar.bar_func()
ex_baz.baz_func()
Advantages:
- Simplifies the __init__.py file. It only needs to be updated when a new module is added. Updating your online store is relatively painless. You only need to change a setting in your product database.
- Users can pick the import style that suits their code, including short aliases (e.g., import matplotlib.pyplot as plt). While online grocery shopping can be a big pain at first, if you save your shopping list for the future, your shopping can be done a lot quicker.
- Function and class names only need to be unique within their own module (e.g., there can be a save() in both the foo and bar modules).

Disadvantages:
- Some import styles make the code harder to follow. For example, foo.foo_func() does not indicate which package foo comes from.
- The most explicit style (import example_pkg, with no alias) can lead to long code chunks (e.g., example_pkg.foo.foo_func()) that clutter things up.

Recommendations:
- Use when a bare import example_pkg would otherwise import a LOT of objects and might be slow.

Real-world examples:
- matplotlib*
- scikit-learn*
- bokeh*
- scipy*

* These packages actually use combinations of different approaches in their __init__.py files. I include them here because to users, they are generally used à la carte (e.g., import matplotlib.pyplot as plt or import scipy.stats.kde).
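Because every call goes through a module name in this layout, two modules can each have a save() without colliding. The sketch below checks this with a hypothetical package. The name online_pkg and its two save() functions are assumptions for illustration, and the __init__.py follows the same pattern as the one shown for example_pkg.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package used only for this illustration.
FILES = {
    # Online store: import the modules, not their contents.
    "online_pkg/__init__.py": "import online_pkg.foo\nimport online_pkg.bar\n",
    "online_pkg/foo.py": "def save():\n    return 'foo.save'\n",
    "online_pkg/bar.py": "def save():\n    return 'bar.save'\n",
}

root = Path(tempfile.mkdtemp())
for rel_path, source in FILES.items():
    target = root / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(source)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import online_pkg                   # style 1: full dotted path
import online_pkg.foo as ex_foo     # style 3: aliased module
from online_pkg import bar          # style 2: module pulled in by name

print(online_pkg.foo.save())        # foo.save
print(ex_foo.save())                # foo.save
print(bar.save())                   # bar.save

# Nothing is lifted to the top level, so the two save() functions never clash.
print(hasattr(online_pkg, "save"))  # False
```

The cost is the one described above: the reader of bar.save() has to look at the import lines to learn where bar comes from.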
The three scenarios I outlined are certainly not the only possible structures for a Python package, but I hope they cover most of the cases that anyone learning this from a blog might be considering. In conclusion, I’ll return to a point I made earlier: a good module structure for the developer may or may not be a good module structure for the user. Whatever your decision, don’t forget to put yourself in the user’s shoes even, or especially, because that user is most likely to be you.
Source:
https://towardsdatascience.com/whats-init-for-me-d70a312da583