When dealing with a large Python code base managed by multiple teams, you often find that you need to be able to package and release this code independently. Most best-practices guides for releasing Python packages focus on public packages, and do not cover complex dependencies. In this post I’ll focus on how we, at Eventbrite, release our internal Python packages and avoid dependency hell while doing so. This first part will cover defining packages and their dependencies, while the second part will cover building and distributing Python wheels internally.
Python packages can be installed in your environment using pip or easy_install. Packages can contain one or more modules – their most basic structure looks like this:
example/
    __init__.py
setup.py
The package name, version and dependencies are defined in setup.py. Bear in mind that even though it is best practice for the main module and the package name to be the same, there are numerous popular packages for which it’s not the case.
setup.py vs requirements.txt
There is a bit of confusion about these two files, with some believing they are mutually exclusive. They are not: setup.py defines a package and its dependencies, while requirements.txt might be used for a top-level application, if the application itself is not a package. It is the case, however, that requirements for a codebase should be specified in just one of them.
Choosing a package name
Even if your package is only going to be used internally and you don’t ever plan to open-source it, you still need to choose a name which will not clash with names in PyPI; otherwise, you may have conflicts if your team decides to use the public package with the same name.
Versioning a package correctly is much more important than it seems at first glance. Following best practices is the only way of avoiding unresolvable dependency hell. We strive to have all of our packages follow the Semantic Versioning 2.0 specification.
Semantic versioning follows the pattern of ‹major›.‹minor›.‹patch›. Our rules on which part of the version to bump are:
- Major is bumped only for backwards incompatible changes
- Patch is bumped if and only if the release is just a fix that does not change functionality
- Everything else is a minor bump
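The three rules can be written down as a small helper. This is only a sketch: the function name bump and the change labels "breaking", "fix" and "feature" are invented for this example and are not part of any tool mentioned in this post.

```python
def bump(version, change):
    """Return the next version for a plain '<major>.<minor>.<patch>' string.

    `change` is one of the hypothetical labels 'breaking', 'fix' or 'feature'.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "breaking":
        # Backwards incompatible change: bump major, reset minor and patch.
        return "{}.0.0".format(major + 1)
    if change == "fix":
        # Pure fix with no change in functionality: bump patch only.
        return "{}.{}.{}".format(major, minor, patch + 1)
    # Everything else is a minor bump, which resets patch.
    return "{}.{}.0".format(major, minor + 1)


print(bump("1.4.2", "fix"))       # 1.4.3
print(bump("1.4.2", "feature"))   # 1.5.0
print(bump("1.4.2", "breaking"))  # 2.0.0
```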
For backwards incompatible changes, we first release a minor version with forwards compatible code, marking the code to be removed as deprecated. The next release would be a major release, which would only be removing deprecated code. Other packages / applications will not be automatically upgraded to the next major release until they’re moved to forwards compatible code. (More on that below in the dependencies section).
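One way to mark code as deprecated in the forwards compatible minor release is the standard library's warnings module. The sketch below assumes a hypothetical function old_name that is being replaced by new_name; both names are made up for illustration.

```python
import warnings


def new_name(value):
    """Replacement API, introduced in the minor release."""
    return value * 2


def old_name(value):
    """Deprecated alias, kept until the next major release removes it."""
    warnings.warn(
        "old_name() is deprecated, use new_name() instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this wrapper
    )
    return new_name(value)
```

Callers keep working on the minor release while the warning tells them what to migrate to; the following major release would then simply delete old_name.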
Versioning forks of 3rd party libraries
Generally you should try to merge your changes upstream (via pull request, patch, etc.), but sometimes it’s either not possible, or takes too long. At this point you need to be able to create your own release of the package with your changes, and to tell your forked version of the package apart from the vanilla upstream version. This is where the local version segment defined by PEP 440 comes in handy.
Let’s say you’ve forked from upstream version
1.2.3. Your patched fork version should then become
1.2.3+company, and subsequent ones
Current best practice is to define dependencies using the compatible release operator ~= (as defined in PEP 440).
- ~=X.Y.Z means any version X.Y with patch level Z or greater
- ~=X.Y means any major version X with minor version Y or greater
- (important gotcha: ~=X.Y.Z and ~=X.Y are not equivalent)
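To make the difference between the two forms concrete, here is a simplified sketch of the compatible release check. It only understands plain numeric versions (no pre-releases, post-releases or local versions), and the helper names parse and is_compatible are my own. For real use, the packaging library implements the full PEP 440 rules.

```python
def parse(version):
    """Split a plain numeric version such as '1.2.3' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_compatible(candidate, base):
    """Return True if `candidate` satisfies `~=base` (simplified PEP 440 rule)."""
    base_parts = parse(base)
    if len(base_parts) < 2:
        raise ValueError("~= needs at least two release segments")
    cand_parts = parse(candidate)
    # Pad with zeros so that 1.3 and 1.3.0 compare as equal.
    width = max(len(base_parts), len(cand_parts))
    padded_base = base_parts + (0,) * (width - len(base_parts))
    padded_cand = cand_parts + (0,) * (width - len(cand_parts))
    # Everything except the last segment of the specifier is a fixed prefix.
    prefix = base_parts[:-1]
    return padded_cand >= padded_base and padded_cand[:len(prefix)] == prefix


print(is_compatible("1.4.0", "1.3"))    # True: ~=1.3 allows newer minors
print(is_compatible("1.4.0", "1.3.0"))  # False: ~=1.3.0 pins the minor to 3
print(is_compatible("1.3.5", "1.3.0"))  # True: newer patch of 1.3
```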
For packages which strictly follow Semantic Versioning 2 as described above, you can safely define your dependencies using the
~=X.Y specifier. If you need a specific bugfix version, you can write this as
For 3rd party packages things get trickier. Many don’t follow strict semantic versioning, pushing out breaking changes even in minor releases. However, even with these packages, patch releases usually don’t have breaking changes, and often have the benefit of containing security fixes. This means that our recommended specifier for 3rd party packages is ~=X.Y.Z – allowing patch releases but not minor releases. You should always check the package’s changelog before allowing automatic patch upgrades.
Exact specifiers using the equality operator == are discouraged; however, there are two cases where you have to use it. The first is badly behaving 3rd party packages which introduce breaking changes in patch releases (or that don’t follow the major.minor.patch pattern at all). The second is your local forks of 3rd party packages: to ensure that your fork will be installed, you must specify the dependency as ==1.2.3+company. Note that you cannot use wildcards with a local specifier.
Version conflicts (aka dependency hell)
Imagine that your app requires foo==1.0.0 and bar==1.0.0, and in turn foo requires baz==1.2.0 while bar requires baz==1.3.0. This creates a version conflict, as you cannot install a version of baz that will satisfy both ==1.2.0 and ==1.3.0. This is where the compatible version operator and strict semantic versioning shine. In this case, if you have ~=1.2 and ~=1.3 requirements, version 1.3.0 satisfies both.
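The conflict can also be spotted mechanically. The sketch below uses the foo, bar and baz names from this section as hard-coded sample data and only understands exact == pins; the function name find_pin_conflicts and the shape of the input are assumptions made for this example, not how pip or pipdeptree work internally.

```python
from collections import defaultdict

# Sample data: each package maps to the exact pins it declares.
REQUIREMENTS = {
    "foo==1.0.0": ["baz==1.2.0"],
    "bar==1.0.0": ["baz==1.3.0"],
}


def find_pin_conflicts(requirements):
    """Return {package: {pinned_version: [requirers]}} for packages
    that are pinned to more than one exact version."""
    pins = defaultdict(lambda: defaultdict(list))
    for parent, deps in requirements.items():
        for dep in deps:
            name, version = dep.split("==")
            pins[name][version].append(parent)
    return {
        name: dict(versions)
        for name, versions in pins.items()
        if len(versions) > 1
    }


print(find_pin_conflicts(REQUIREMENTS))
```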
How does pip deal with version conflicts?
It doesn’t, really (it’s an open issue in the pip project). Pip will silently hide version conflicts by installing whichever version was required last. Top-level dependencies always take precedence over inner ones, but beyond the top level, it’s hard to tell which exact version will be installed in case of a conflict.
How do you find conflicts?
Pip provides a check command which will list all conflicts in your current installation/virtualenv. A third-party tool, pipdeptree, will not only list conflicts, but also provide information about the exact packages causing them (as well as full information on why each of the packages has been installed). In the aforementioned version conflict example, the output of pipdeptree would be:
Warning!!! Possibly conflicting dependencies found:
* foo==1.0.0
 - baz [required: ==1.2.0, installed: 1.3.0]
------------------------------------------------------------------------
package==1.2.3
  - foo [required: ==1.0.0, installed: 1.0.0]
    - baz [required: ==1.2.0, installed: 1.3.0]
  - bar [required: ==1.0.0, installed: 1.0.0]
    - baz [required: ==1.3.0, installed: 1.3.0]
Telling the installer where to get packages
By default, when you write package~=1.2.3, pip will look in the public Python Package Index (PyPI). There are a few ways of telling it how to get your private packages.
In setup.py you can define dependency_links, which contain links to packages which cannot be found in PyPI.
This works fine with setuptools (used when you run, for example, python setup.py install). However, getting this to work with pip is harder, and it may stop working in the future as it’s considered deprecated.
If you do want to make them work with pip, you need to pass an additional option to pip, --process-dependency-links. Then you’ll have to make sure that the egg name (#egg=package-1.2.3) matches the name and version in the requirements (setuptools is more lax about this). There is also the issue of having to specify an exact version in the dependency_links URL, negating all the benefits of the ~= operator’s flexibility.
PEP 508 Specifiers (the future)
PEP 508 introduces a URL directly in the dependency specifier to try and solve this problem.
Prior to PEP 508 you’d have
setup(
    install_requires=['package'],
    dependency_links=['git+ssh://email@example.comfirstname.lastname@example.org#egg=package-1.2.3'],
)
With PEP 508 this would become
setup(
    install_requires=['package @ git+ssh://email@example.comfirstname.lastname@example.org#egg=package-1.2.3'],
)
PEP 508 is however not fully implemented yet – pip will support this format only in version 10, which hasn’t been released yet.
There is a pip-specific solution using the --find-links option, which takes a URL or file path to an HTML file. It’s a flat file, containing links in the format:
The package URL can be a link to a wheel, egg, source tarball or git repo. So you could have a file containing multiple versions, thus regaining the flexibility to use the
⋮
<a href="git+ssh://email@example.comfirstname.lastname@example.org#egg=package-1.2.1">package-1.2.1</a>
<a href="git+ssh://email@example.comfirstname.lastname@example.org#egg=package-1.2.2">package-1.2.2</a>
<a href="git+ssh://email@example.comfirstname.lastname@example.org#egg=package-1.2.3">package-1.2.3</a>
<a href="git+ssh://email@example.comfirstname.lastname@example.org#egg=package-1.2.4">package-1.2.4</a>
<a href="git+ssh://email@example.comfirstname.lastname@example.org#egg=package-1.2.5">package-1.2.5</a>
⋮
In part two I’ll cover how to automatically generate files for use with
Private simple index
You can also create a PEP 503 conformant simple repository, which is only slightly more structured than the find-links file mentioned above. The main difference is that instead of having one file with all links, you now have a two-level structure, the first level being the package name without a version, and the second level being all the versions of the package. Again, we’ll look into this more in part two.
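As a small illustration of the two-level layout, PEP 503 specifies how project names are normalized before they become the first-level path. The sketch below applies that rule and builds the URL of a project's page; the helper name project_url and the host pypi.example.internal are placeholders made up for this example.

```python
import re


def normalize(name):
    """Normalize a project name as specified in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def project_url(base_url, name):
    """Return the URL of the page listing all versions of `name`.

    `base_url` is the root of the simple index (a placeholder here).
    """
    return "{}/{}/".format(base_url.rstrip("/"), normalize(name))


print(project_url("https://pypi.example.internal/simple/", "Something_Forked"))
# https://pypi.example.internal/simple/something-forked/
```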
Tracking versions & reproducible deploys
When releasing code to production you want to be able to tell exactly which versions are deployed, as well as be able to redeploy exactly the same versions if needed. The easiest way to achieve this is to use pip freeze in your installation/virtualenv builder, storing its output in a file. The output of freeze is a list of requirements with the exact version match operator ==, and includes all installed packages, regardless of whether they’ve been installed directly or as a dependency of another package.
pip freeze > freeze.txt
package==1.2.3
foo==1.0.1
something-forked==3.4.5+company
Recreating an identical environment:
pip install -r freeze.txt
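Stored freeze files also make it easy to see what changed between two deploys. The following is a rough sketch that only understands plain name==version lines; the function names parse_freeze and diff_freezes are invented for this example.

```python
def parse_freeze(text):
    """Map package name -> exact version from `pip freeze` style output.

    Comments, blank lines and anything that is not a plain
    `name==version` line (e.g. editable installs) are skipped.
    """
    pinned = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "-e")) or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pinned[name] = version
    return pinned


def diff_freezes(old, new):
    """Return {name: (old_version, new_version)} for packages that differ."""
    names = sorted(set(old) | set(new))
    return {n: (old.get(n), new.get(n)) for n in names if old.get(n) != new.get(n)}


previous = parse_freeze("package==1.2.3\nfoo==1.0.0\n")
current = parse_freeze("package==1.2.3\nfoo==1.0.1\nsomething-forked==3.4.5+company\n")
print(diff_freezes(previous, current))
```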
In the future it will be possible to use Pipfile.lock files (part of the Pipfile project) to get even better reproducibility, but that’s not yet available with the current version of pip.
References and further reading
- Python Packaging – https://python-packaging.readthedocs.io/
- Building and Distributing Packages with Setuptools – http://setuptools.readthedocs.io/en/latest/setuptools.html#building-and-distributing-packages-with-setuptools
- Semantic Versioning – http://semver.org/
- PEP 440 — Version Identification and Dependency Specification – https://www.python.org/dev/peps/pep-0440/
- PEP 503 — Simple Repository API – https://www.python.org/dev/peps/pep-0503/
- PEP 508 — Dependency specification for Python Software Packages – https://www.python.org/dev/peps/pep-0508/