April 13, 2018

Streaming and saving subprocess output at the same time in Python

Sometimes, you want to run a subprocess with Python and stream/print its output live to the calling process' terminal, and at the same time save the output to a variable. Here's how:

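import subprocess

proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, universal_newlines=True)
output = ''
for line in proc.stdout:
    print(line, end='')  # stream the line to our own terminal as it arrives
    output += line  # and save it for later

November 6, 2017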

Better ways of managing pip dependencies

Of all the languages I've worked with, Python is one of the most annoying when it comes to managing dependencies - only Go annoys me more. The industry standard is to keep a strict list of your dependencies (and their dependencies) in a requirements.txt file. Handily, this can be auto-generated with pip freeze > requirements.txt.

What's the problem with requirements files? There isn't really one as long as you only have a single requirements file, but if you want to start splitting up dev vs test/staging vs production dependencies, you'll immediately run into problems.

The most common solution is to have a requirements directory with base.txt, dev.txt, prod.txt and so on for whatever environments/contexts you need. The problem with this approach starts showing up when you want to add or upgrade a package and its dependencies - because you no longer have a single requirements file, you can't simply pip freeze > requirements.txt, so you end up carefully updating the file(s) by hand.

There are some existing third-party tools out there written to help with this problem.

pipenv/pipfile uses a completely new file format for storing dependencies, inspired by other languages' more modern dependency managers. In the future this may be part of pip core, but it is not currently. Until then I'm staying far away from the project, as trying to use it in a real-world project revealed all sorts of bugs. The codebase itself looks super sketchy, as it bundles copies of upstream libraries like pip, but with its own patches applied on top of them.

pipwrap scans your virtualenv for packages, compares them to what's in your requirements files, and interactively asks you where it should place packages that are in your environment, but not in any requirements file.

pip-compile (part of pip-tools) lets you write more minimal requirements.in files, and auto-generates strict version requirements.txt files based on them. As a bonus you get to see where your nested dependencies are coming from.

However, there is an existing solution that works without introducing third-party tools. Since version 7.1, the pip install command has a --constraint flag (-c for short) which can be used to solve this problem.

A constraints file is an additional requirements file which won't be used to determine which packages to install, but will be used to lock down versions for any packages that do get installed. This means that you can put your base requirements (that is, you don't need to include dependencies of dependencies) in your requirements file, then store version locks for all environments in a separate constraints file.

First of all, we want to make sure we never forget to pass --constraint constraints.txt, so we add that line to the top of our requirements/base.txt file (and any other requirements file that does not include -r base.txt). Next, generate the constraints file with pip freeze > requirements/constraints.txt. You can now modify all your requirements files, removing or loosening version constraints, and removing nested dependencies.

With that out of the way, let's look at some example workflows. Upgrade an existing package:

pip install 'django >= 2'
# no need to edit requirements/base.txt, "django" is already there
pip freeze > requirements/constraints.txt

Install a new package in dev:

echo 'pytest-cov' >> requirements/dev.txt
pip install -r requirements/dev.txt
pip freeze > requirements/constraints.txt

Installing requirements in a fresh production or development environment works just like before:

pip install -r requirements/base.txt
pip install -r requirements/dev.txt

This isn't perfect. If you don't install every requirements file in development, your constraints file will be missing those files' requirements. A code review would catch accidentally removing a constraint, but how do you detect a package that is entirely missing from the constraints file? pip install doesn't even have a dry-run mode. Still, constraints files (or any of the third-party tools, really) are nice ways of improving and simplifying dependency management with pip.
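
One possible way to spot a package that is entirely missing from the constraints file is to compare the output of pip freeze with the names listed in the constraints file. The following is only a rough sketch of that idea, not an established tool: the function names are made up for illustration, the parsing only handles plain "name==version" style lines (options such as -r or -e are skipped), and the package versions in the demo are illustrative.

```python
import re


def package_name(line):
    """Return the normalized package name on a requirements-style line, or None."""
    line = line.split('#', 1)[0].strip()
    if not line or line.startswith('-'):
        # Blank lines, comments, and options such as "-r base.txt" or "-e ."
        return None
    match = re.match(r'[A-Za-z0-9][A-Za-z0-9._-]*', line)
    if match is None:
        return None
    # Normalize so that "Pytest_Cov" and "pytest-cov" compare as equal
    return re.sub(r'[-_.]+', '-', match.group(0)).lower()


def missing_from_constraints(frozen_lines, constraint_lines):
    """Names present in `pip freeze` output but absent from the constraints file."""
    frozen = {package_name(line) for line in frozen_lines}
    constrained = {package_name(line) for line in constraint_lines}
    return sorted((frozen - constrained) - {None})


# Illustrative data: what `pip freeze` printed vs. what the constraints file holds
frozen = ['Django==2.0.1', 'pytz==2017.3', 'pytest-cov==2.5.1']
constraints = ['# generated with pip freeze', 'django==2.0.1', 'pytz==2017.3']
print(missing_from_constraints(frozen, constraints))
```

In practice the first argument would be the lines printed by pip freeze in a fully installed environment, and the second the lines of requirements/constraints.txt; a non-empty result could then fail a CI step.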

September 12, 2017

Russell, revisited

Three years ago I wrote about Russell, a static site/blog generator I made. Since then, I've done a major rewrite of the project to make it easier to extend and configure.

My sentiments towards other static site generators and CMSes are still the same, though at least by now the most popular ones aren't all written in Ruby.

I quickly realized, though, that I wanted more control over how my site was generated. I didn't want to be limited to what could be expressed in a YAML file - it basically meant that I would have to anticipate anything a user of Russell might want to do, and add support for it in the code that reads the YAML config and acts upon it.

The solution to this was simple: Use Python to run and configure Russell instead. When you run russell setup to create a new Russell site, the main entrypoint will be run.py.

Furthermore, I now recommend that you install Russell into a virtualenv, where you can bring in other dependencies as well. For example, in the source code for the website you're reading now, I bring in libsass to compile Sass files into CSS.

blog.write_file('assets/style.css', sass.compile(
    filename=os.path.join(ROOT_DIR, 'sass', 'main.sass')
))

If you're looking for a static site generator, especially for a blog or similar, and you like Python, I recommend checking out Russell now more than ever!

September 2, 2017

Proper logging in Django

Setting up logging in a sane way in Django has been surprisingly difficult due to some confusing setting names and the annoying way Django's default logging setup works. Here I'll go through some simple steps you can take to gain full control of your logging setup, without too many changes to a standard Django setup.

First of all, set LOGGING_CONFIG = None to prevent Django from setting up logging for you at all. You want this because in addition to the LOGGING dict that you define, Django has some default settings it will use, which you may not want.

Because we've set this, we need to call logging.config.dictConfig(LOGGING) ourselves (after import logging.config). This can happen at the end of your settings file.

Make sure that LOGGING['disable_existing_loggers'] = False. If this is set to true, which is the default, any loggers created before logging.config.dictConfig is called will, unless your config names them, be disabled and silently discard all their messages. You definitely don't want that.

Finally, I like to define LOGGING['root'] to have one log instance that controls everything, but sometimes log messages don't get sent to it. I found that setting the "" (empty string) logger can fix this:

LOGGING['loggers'][''] = { 'propagate': True }
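
Put together, the end of a settings file following these steps might look roughly like the sketch below. It only uses the standard library, so it runs outside Django too. The formatter, the handler and the 'myapp.early' logger name are assumptions made for illustration; they are not from any particular project.

```python
import logging
import logging.config

# Hypothetical logger created before dictConfig runs, like one defined
# at the top of a module that gets imported early.
early_logger = logging.getLogger('myapp.early')

# Tell Django not to set up logging on its own
LOGGING_CONFIG = None

LOGGING = {
    'version': 1,
    # Keep loggers that were created before dictConfig is called
    'disable_existing_loggers': False,
    'formatters': {
        'simple': {'format': '%(levelname)s %(name)s: %(message)s'},
    },
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
            'formatter': 'simple',
        },
    },
    # Empty-string logger entry, as in the line above
    'loggers': {
        '': {'propagate': True},
    },
    # One place that controls everything
    'root': {
        'handlers': ['console'],
        'level': 'INFO',
    },
}

# Since LOGGING_CONFIG is None, apply the config ourselves
logging.config.dictConfig(LOGGING)

# The early logger still works and its message reaches the root handler
early_logger.info('still delivered after dictConfig')
```

Changing 'disable_existing_loggers' to True in this sketch should make the last message disappear, which is a quick way to see what that setting does.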