Packaging and Releasing Private Python Code (Pt.1)

When dealing with a large Python code base managed by multiple teams, you often find that you need to be able to package and release this code independently. Most best-practices guides for releasing Python packages focus on public packages, and do not cover complex dependencies. In this post I’ll focus on how we, at Eventbrite, release our internal Python packages and avoid dependency hell while doing so. This first part will cover defining packages and their dependencies, while the second part will cover building and distributing Python wheels internally.

Python packages

Basics

Python packages can be installed in your environment using pip or easy_install. Packages can contain one or more modules – their most basic structure looks like this:

    example/
        __init__.py
    setup.py

The package name, version and dependencies are defined in setup.py. Bear in mind that even though it is best practice for the main module and the package name to be the same, there are numerous popular packages for which it’s not the case.

(Further reading on the basics of Python packaging)

setup.py vs requirements.txt

There is a bit of confusion about these two files, with some people believing they are mutually exclusive. They are not: setup.py defines a package and its dependencies, while requirements.txt might be used for a top-level application, if the application itself is not a package. It is the case, however, that the requirements for a given codebase should be specified in just one of them.

Choosing a package name

Even if your package is only going to be used internally and you don’t ever plan to open-source it, you still need to choose a name which will not clash with names in PyPI; otherwise, you may have conflicts if your team decides to use the public package with the same name.

Versioning

Versioning a package correctly is much more important than it seems at first glance. Following best practices is the only way of avoiding unresolvable dependency hell. We strive to have all of our packages follow the Semantic Versioning 2.0 specification.

Semantic versioning follows the pattern of ‹major›.‹minor›.‹patch›. Our rules on which part of the version to bump are:

  • Major is bumped only for backwards incompatible changes
  • Patch is bumped if and only if the release is just a fix that does not change functionality
  • Everything else is a minor bump

For backwards incompatible changes, we first release a minor version with forwards compatible code, marking the code to be removed as deprecated. The next release is then a major release which only removes the deprecated code. Other packages and applications will not be automatically upgraded to the next major release until they have moved to the forwards compatible code. (More on that below, in the dependencies section.)
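The bump rules above can be sketched as a small helper. This is only an illustration, not the tooling used at Eventbrite: the function name and the three change labels ("breaking", "fix", "feature") are made up for the example, and it assumes plain ‹major›.‹minor›.‹patch› versions with no pre-release or local segments.

```python
def bump_version(version, change):
    """Return the next version for a given kind of change.

    change is one of (hypothetical labels):
      "breaking" - backwards incompatible change   -> major bump
      "fix"      - fix that changes no functionality -> patch bump
      "feature"  - everything else                  -> minor bump
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "breaking":
        return f"{major + 1}.0.0"
    if change == "fix":
        return f"{major}.{minor}.{patch + 1}"
    if change == "feature":
        return f"{major}.{minor + 1}.0"
    raise ValueError(f"unknown change type: {change!r}")


print(bump_version("1.4.2", "fix"))       # 1.4.3
print(bump_version("1.4.2", "feature"))   # 1.5.0
print(bump_version("1.4.2", "breaking"))  # 2.0.0
```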

Versioning forks of 3rd party libraries

Generally you should try to merge your changes upstream (via pull request, patch, etc.), but sometimes it’s either not possible, or takes too long. At this point you need to be able to create your own release of the package with your changes, and to tell your forked version of the package from the vanilla upstream version. This is where the local version segment defined by PEP 440 comes in handy.

Let’s say you’ve forked from upstream version 1.2.3. Your patched fork version should then become 1.2.3+company, and subsequent ones 1.2.3+company.1, 1.2.3+company.2 etc.
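As a rough sketch of that naming scheme, the helper below builds the version string for each release of a fork. The function name and its parameters are hypothetical. It only formats strings and does not validate the upstream version.

```python
def fork_version(upstream_version, label, revision=0):
    """Build a PEP 440 version with a local segment for a patched fork.

    revision 0 is the first fork release (e.g. 1.2.3+company); later
    releases append the revision number (e.g. 1.2.3+company.1).
    """
    if revision < 0:
        raise ValueError("revision must be non-negative")
    local = label if revision == 0 else f"{label}.{revision}"
    return f"{upstream_version}+{local}"


print(fork_version("1.2.3", "company"))     # 1.2.3+company
print(fork_version("1.2.3", "company", 2))  # 1.2.3+company.2
```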

Defining dependencies

Current best practice is to define dependencies using the compatible release operator ~= (as defined in PEP 440).

  • ~=X.Y.Z means any version X.Y with patch level Z or greater
  • ~=X.Y means any major version X with minor version Y or greater
  • (important gotcha: ~=X.Y.0 and ~=X.Y are not equivalent).
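To make the gotcha concrete: PEP 440 defines ~= as shorthand for a >= clause combined with a prefix match on everything but the last segment. The sketch below spells out that expansion for plain numeric versions. The function name is made up for this example, and pre-release or post-release segments are deliberately not handled.

```python
def expand_compatible_release(version):
    """Rewrite the version of a ~= clause as the equivalent >= and == clauses."""
    parts = version.split(".")
    if len(parts) < 2:
        # PEP 440 does not allow ~= with a single-segment version such as "1".
        raise ValueError("~= needs at least two release segments, e.g. 1.2")
    prefix = ".".join(parts[:-1])  # everything except the last segment
    return f">={version},=={prefix}.*"


print(expand_compatible_release("1.2.3"))  # >=1.2.3,==1.2.*
print(expand_compatible_release("1.2"))    # >=1.2,==1.*
print(expand_compatible_release("1.2.0"))  # >=1.2.0,==1.2.*
```

The last two lines show why ~=X.Y.0 and ~=X.Y differ: the first stays within the X.Y series, while the second accepts any later minor release of X.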

For packages which strictly follow Semantic Versioning 2 as described above, you can safely define your dependencies using the ~=X.Y specifier. If you need a specific bugfix version, you can write this as ~=X.Y,>=X.Y.Z (the comma means that both clauses must be satisfied).

For 3rd party packages things get trickier. Many don’t follow strict semantic versioning, pushing out breaking changes even in minor releases. However, even with these packages, patch releases usually don’t have breaking changes, and often have the benefit of containing security fixes. This means that our recommended specifier for 3rd party packages is ~=X.Y.Z – allowing patch releases but not minor releases. You should always check the package’s changelog before allowing automatic patch upgrades.

Exact specifiers using the equality operator == are discouraged; however, there are two cases where you have to use them. The first is badly behaved 3rd party packages which introduce breaking changes in patch releases (or which don’t follow the major.minor.patch pattern at all). The second is your local forks of 3rd party packages: to ensure that your fork will be installed, you must specify the dependency as ==1.2.3+company. Note that you cannot use wildcards with a local specifier.
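Putting these recommendations together, a dependency list might look something like the sketch below. Every package name and version here is invented for illustration; in a real project the list would be passed as install_requires in setup.py.

```python
# Hypothetical dependency list, one entry per case discussed above.
INSTALL_REQUIRES = [
    "internal-lib~=2.3",                # internal package, strict semantic versioning
    "internal-other~=1.4,>=1.4.2",      # same, but a specific bugfix release is needed
    "thirdparty~=3.1.4",                # 3rd party: allow patch releases only
    "badly-behaved==0.9.7",             # breaks in patch releases, so pin exactly
    "something-forked==3.4.5+company",  # local fork: exact match on the local version
]

for requirement in INSTALL_REQUIRES:
    print(requirement)
```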

Version conflicts (aka dependency hell)

Imagine that your app requires foo==1.0.0 and bar==1.0.0, and in turn foo requires baz==1.2.0, while bar requires baz==1.3.0. This creates a version conflict, as you cannot install a version of baz that will satisfy both ==1.2.0 and ==1.3.0. This is where the compatible version operator and strict semantic versioning shine. In this case, if you have ~=1.2 and ~=1.3 requirements, version 1.3.0 satisfies both.
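The difference can be checked with a toy matcher. This is a simplified sketch, not how pip or setuptools evaluate specifiers: it only understands == and ~= on plain numeric versions, and the list of available baz releases is an assumption made up for the example.

```python
def parse(version):
    """Turn '1.3.0' into (1, 3, 0)."""
    return tuple(int(part) for part in version.split("."))


def satisfies(candidate, specifier):
    """Check a candidate version against a single '==V' or '~=V' specifier."""
    operator, wanted = specifier[:2], specifier[2:]
    cand, want = parse(candidate), parse(wanted)
    # Pad with zeros so that 1.2 and 1.2.0 compare as equal.
    width = max(len(cand), len(want))
    cand_padded = cand + (0,) * (width - len(cand))
    want_padded = want + (0,) * (width - len(want))
    if operator == "==":
        return cand_padded == want_padded
    if operator == "~=":
        if len(want) < 2:
            raise ValueError("~= needs at least two release segments")
        prefix = want[:-1]  # all but the last segment must match exactly
        return cand_padded[:len(prefix)] == prefix and cand_padded >= want_padded
    raise ValueError(f"unsupported specifier: {specifier!r}")


# Hypothetical list of published baz releases.
AVAILABLE_BAZ = ["1.2.0", "1.2.5", "1.3.0", "1.3.1", "2.0.0"]


def candidates(specifiers):
    """Return the available versions that satisfy every specifier."""
    return [v for v in AVAILABLE_BAZ if all(satisfies(v, s) for s in specifiers)]


print(candidates(["==1.2.0", "==1.3.0"]))  # []
print(candidates(["~=1.2", "~=1.3"]))      # ['1.3.0', '1.3.1']
```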

How does pip deal with version conflicts?

It doesn’t, really (it’s an open issue in the pip project). Pip will silently hide version conflicts by installing whichever version was required last. Top-level dependencies always take precedence over inner ones, but beyond the top level it’s hard to tell exactly which version will be installed in case of a conflict.

How do you find conflicts?

Pip provides a check command which lists all conflicts in your current installation/virtualenv. A third-party tool, pipdeptree, not only lists conflicts but also shows exactly which packages cause them (as well as full information on why each package has been installed). In the aforementioned version conflict example, the output of pipdeptree would be:

Warning!!! Possibly conflicting dependencies found:

* foo==1.0.0
    - baz [required: ==1.2.0, installed: 1.3.0]

------------------------------------------------------------------------

package==1.2.3
    - foo [required: ==1.0.0, installed: 1.0.0]
        - baz [required: ==1.2.0, installed: 1.3.0]
    - bar [required: ==1.0.0, installed: 1.0.0]
        - baz [required: ==1.3.0, installed: 1.3.0]
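If you want a build step to notice conflicts automatically, one option is to call pip’s check command from a script. The following is a minimal sketch that assumes pip is available to the current interpreter; the function name is hypothetical, and the report format is whatever your pip version prints.

```python
import subprocess
import sys


def find_conflicts():
    """Run `pip check` for the current interpreter.

    Returns (ok, report): ok is True when pip exits with status 0,
    and report is pip's combined output.
    """
    result = subprocess.run(
        [sys.executable, "-m", "pip", "check"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stdout + result.stderr


ok, report = find_conflicts()
print(report.strip())
```

A build script could then fail the build whenever ok is False.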

Telling the installer where to get packages

By default, when you write package~=1.2.3, pip will look in the public Python Package Index (PyPI). There are a few ways of telling it how to get your private packages.

dependency_links

In setup.py you can define dependency_links, which contains links to packages that cannot be found in PyPI.

dependency_links=['git+ssh://git@github.com/user/package@1.2.3#egg=package-1.2.3']

This works fine with setuptools (used when you run, for example, python setup.py install); however, getting it to work with pip is harder, and it may stop working in the future as it’s considered deprecated.

If you do want to make them work with pip, you need to pass an additional option to pip, --process-dependency-links. Then you’ll have to make sure that the egg name (#egg=package-1.2.3) matches the name and version in the requirements (setuptools is more lax about this). There is also the issue of having to specify an exact version in the dependency_links URL, negating all the benefits of the ~= operator’s flexibility.

PEP 508 Specifiers (the future)

PEP 508 introduces a URL directly in the dependency specifier to try and solve this problem.

Prior to PEP 508 you’d have

setup(
    install_requires=['package'],
    dependency_links=['git+ssh://git@github.com/user/package@1.2.3#egg=package-1.2.3']
)

With PEP 508 this would become

setup(
    install_requires=['package @ git+ssh://git@github.com/user/package@1.2.3#egg=package-1.2.3'],
)

However, PEP 508 is not fully implemented yet: pip will support this format only from version 10, which hasn’t been released yet.

--find-links

There is a pip-specific solution using the --find-links option, which takes a URL or file path to an HTML file. It’s a flat file, containing links in the format:

<a href="package-url">package-1.2.3</a>

The package URL can be a link to a wheel, egg, source tarball or git repo. So you could have a file containing multiple versions, thus regaining the flexibility to use the ~= operator:

⋮
<a href="git+ssh://git@github.com/user/package@1.2.1#egg=package-1.2.1">package-1.2.1</a>
<a href="git+ssh://git@github.com/user/package@1.2.2#egg=package-1.2.2">package-1.2.2</a>
<a href="git+ssh://git@github.com/user/package@1.2.3#egg=package-1.2.3">package-1.2.3</a>
<a href="git+ssh://git@github.com/user/package@1.2.4#egg=package-1.2.4">package-1.2.4</a>
<a href="git+ssh://git@github.com/user/package@1.2.5#egg=package-1.2.5">package-1.2.5</a>
⋮

In part two I’ll cover how to automatically generate files for use with --find-links.
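In the meantime, here is a rough illustration of the file format (not necessarily the approach covered in part two). It assumes that every release is a git tag named after its version; the function name and parameters are hypothetical.

```python
from html import escape


def find_links_page(package, versions, repo_url):
    """Render a flat find-links page with one link per released version."""
    lines = []
    for version in versions:
        # Assumption: each release is tagged in git with its version number.
        href = f"{repo_url}@{version}#egg={package}-{version}"
        lines.append(f'<a href="{escape(href)}">{package}-{version}</a>')
    return "\n".join(lines) + "\n"


page = find_links_page(
    "package",
    ["1.2.3", "1.2.4"],
    "git+ssh://git@github.com/user/package",
)
print(page, end="")
```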

Private simple index

You can also create a PEP 503 conformant simple repository, which is only slightly more structured than the find-links file mentioned above. The main difference is that instead of having one file with all links, you now have a two-level structure, the first level being the package name without a version, and the second level being all the versions of the package. Again, we’ll look into this more in part two.
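To give a feel for the two-level layout, the sketch below writes a tiny index to a temporary directory. The name normalization follows PEP 503; everything else (function names, the example package, the packages.example.com URL) is an assumption made for illustration, and a real index would usually be generated by a dedicated tool.

```python
import re
import tempfile
from pathlib import Path


def normalize(name):
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def render(links):
    """Wrap a list of anchor tags in a minimal HTML page."""
    return "<!DOCTYPE html>\n<html><body>\n" + "\n".join(links) + "\n</body></html>\n"


def write_simple_index(root, packages):
    """Write root/index.html plus one root/<name>/index.html per project.

    packages maps a project name to a list of (filename, url) pairs.
    """
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    project_links = []
    for name, files in sorted(packages.items()):
        slug = normalize(name)
        project_dir = root / slug
        project_dir.mkdir(exist_ok=True)
        # Second level: one link per released file of this project.
        file_links = [f'<a href="{url}">{filename}</a>' for filename, url in files]
        (project_dir / "index.html").write_text(render(file_links))
        # First level: one link per project, without any version.
        project_links.append(f'<a href="{slug}/">{name}</a>')
    (root / "index.html").write_text(render(project_links))


index_root = Path(tempfile.mkdtemp())
write_simple_index(index_root, {
    "My_Package": [
        ("my_package-1.2.3-py3-none-any.whl",
         "https://packages.example.com/my_package-1.2.3-py3-none-any.whl"),
    ],
})
print(sorted(p.relative_to(index_root).as_posix() for p in index_root.rglob("index.html")))
# ['index.html', 'my-package/index.html']
```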

Tracking versions & reproducible deploys

When releasing code to production you want to be able to tell exactly which versions are deployed, as well as be able to redeploy exactly the same versions if needed. The easiest way to achieve this is to use pip freeze in your installation/virtualenv builder, storing its output in a file. The output of freeze is a list of requirements with the exact version match operator ==, and includes all installed packages, regardless of whether they were installed directly or as a dependency of another package.

Store:

pip freeze > freeze.txt

Example freeze.txt

package==1.2.3
foo==1.0.1
something-forked==3.4.5+company

Recreate an identical environment:

pip install -r freeze.txt
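Stored freeze files also make it easy to see what changed between two deploys. The sketch below compares two of them; the helper names and the second freeze listing are invented for the example, and lines without == (such as editable installs) are simply skipped.

```python
def parse_freeze(text):
    """Map package name -> pinned version from `pip freeze` style lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments and entries that are not pinned with ==
        name, version = line.split("==", 1)
        pins[name.lower()] = version
    return pins


def diff_freezes(old_text, new_text):
    """Return {name: (old_version, new_version)} for every package that differs."""
    old, new = parse_freeze(old_text), parse_freeze(new_text)
    changes = {}
    for name in sorted(set(old) | set(new)):
        if old.get(name) != new.get(name):
            changes[name] = (old.get(name), new.get(name))
    return changes


previous = """\
package==1.2.3
foo==1.0.1
something-forked==3.4.5+company
"""

# Hypothetical freeze from a later deploy.
current = """\
package==1.2.3
foo==1.0.2
something-forked==3.4.5+company
"""

print(diff_freezes(previous, current))  # {'foo': ('1.0.1', '1.0.2')}
```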

In the future it will be possible to use Pipfile.lock files (part of Pipfile project) to get even better reproducibility, but that’s not yet available with the current version of pip.

4 thoughts on “Packaging and Releasing Private Python Code (Pt.1)”

  2. Hi Bartek,
    Thank you for publishing this useful information! Now that pip 10 is out, have you tried combining ‘install_requires’ and ‘dependency_links’ (as you describe above)? I am getting an error message when attempting to pip install a package with a URL/GitLab dependency that I built and uploaded to a local devpi instance:

    “Direct url requirement (like my_fake_lib@ git+https://gitlab/foo/my_fake_lib.git#egg=my-fake-lib-0.0.1) are not allowed for dependencies”

    I verified that everything works properly with the pre-PEP 508 method, with --process-dependency-links, etc.

  3. Hello! Thank you for this useful information. Have you had a chance to try pip 10? I was unable to verify the change you described in your section, “PEP 508 Specifiers (the future)”
