May 20, 2020

PSM - Python Script Manager

It's fairly common for generally useful CLI tools to be written in Python. Some examples off the top of my head are ranger, streamlink, youtube-dl and glances.

Some of these are available through your operating system's package manager, but may not be up to date. You can install them using pip, but installing things with pip is a beginner's trap.

If you install using sudo pip, you might overwrite packages on the system and break tools that are required for the operating system to work. This is avoidable by using pip install --user, which installs packages into $HOME/.local instead of globally, but does mean you have to modify your $PATH to include $HOME/.local/bin.
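If you go the --user route, the $PATH tweak is a one-liner in your shell profile (the exact file depends on your shell):

```shell
# make scripts installed by `pip install --user` available;
# add this to e.g. ~/.bashrc or ~/.zshrc
export PATH="$HOME/.local/bin:$PATH"
```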

However, if you try to install two CLI tools which require two different versions of a dependency, you're likely to break one of them. The solution to this is virtual environments, which are fairly straightforward to use when working on a specific project, but not so much for installing things like CLI programs.
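To illustrate what's involved, here's the manual version of this: one virtual environment per tool, with only the entry point symlinked onto your $PATH. The paths and the youtube-dl example here are just for illustration:

```shell
# one isolated virtualenv per tool, so dependencies can't conflict
python3 -m venv "$HOME/.venvs/youtube-dl"

# install the tool (and its dependencies) into that venv only
"$HOME/.venvs/youtube-dl/bin/pip" install youtube-dl

# expose only the executable on your PATH
mkdir -p "$HOME/.local/bin"
ln -sf "$HOME/.venvs/youtube-dl/bin/youtube-dl" "$HOME/.local/bin/youtube-dl"
```

Doing this by hand for every tool, and again for every upgrade, gets old fast - which is what the tools below automate.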

There are tools like pipsi and pipx which attempt to solve this: simply run pipx install ... and your thing will be installed. There's a chicken-and-egg problem, though: How do you install pipx? It's a Python application with dependencies, after all. You end up having to make an exception for this specific tool and use pip install --user.

All of this annoyed me, and encouraged me to write PSM - Python Script Manager. It's a stand-alone shell script, so it can just be downloaded and put in a directory in your $PATH.

To install it:

curl | bash

Some usage examples:

psm install glances
psm upgrade-all

More information can be found on Github:

May 10, 2020

Asking for help in public or private chat

Here's a pattern I've seen in multiple organizations: If someone is stuck with a problem, they guess who might know the solution to that problem and ask them privately, either in person or over chat (e.g. Slack). I always preferred to be asked over chat rather than in-person because it means I can postpone reading it for 5 minutes if I'm in the middle of something, and now with remote working being pretty much mandatory, the in-person option isn't even there any more.

I think this is a bad habit, and something that should be discouraged especially in engineering departments. First, let's outline the advantages of asking in private, usually the reasons why people default to doing it:

  • You don't feel like you're bothering everyone in the channel who might not care about your problem.
  • You're guaranteed to get someone's attention, your question is very unlikely to be ignored.
  • It feels similar to what you would do in real life: Walk up to a colleague and ask them if they have time to help.

While these points are technically correct, I also think they're based on flawed premises, which we will get back to. For now, let's list the drawbacks of asking for help in private:

  • You have to know who to ask.
  • If you ask the wrong person, you've potentially interrupted them without really achieving anything.
  • If you ask the wrong person and you are referred to someone else, or you just guess someone else who might know, you're asking your question N times which wastes your time. This gets especially annoying if your question isn't a simple copy-and-paste one, for example if there are follow-up Q&A or additional context added after the fact.
  • If the person you ask gives you an answer, that answer may be incorrect or inefficient, and another colleague might be able to correct them - but because your messages are private, they can't.
  • Other colleagues cannot read your question and answers given to your question, and hence cannot learn from it.
  • It gives the impression that no one is asking for help, which can discourage others (especially new joiners) from asking for help.

It's worth noting that this is slightly different when not working remotely - if I walk over to a colleague in the office and ask them for help and we discuss the problem in person, others can overhear our discussion if they're not too busy with other things, and we avoid a lot of the problems mentioned above.

So what should you do instead? Ask in a public chat channel. Even Slack itself claims it's the ideal:

Slack is designed to add transparency to an organization, so it’s best to default to communication in public channels whenever possible. Slack’s own team sends tens of thousands of messages each week—in a recent summary, 70% of those were posted in public channels, with 28% occurring in private channels and just 2% in direct messages. Posting messages in public channels means anyone in the organization can see what various teams are working on, see how much progress people are making on projects, and search the archive for context they need.

Here are some tips for successfully asking for help in public chat channels:

  • If you're not sure which channel is the best fit, just put it in the most generic one. Most engineering companies have channels like #dev-example-app or just #development.
  • If your question is long and you don't want to flood the channel with text, summarize your question in a short paragraph then elaborate on it in a thread. This has the added bonus of encouraging others to use threads as well.
  • Ask your question well: Include the necessary context, what you've already tried, predict what follow-up questions might come - but also don't make it too long or include too many unnecessary details, or people will get overwhelmed and you're less likely to get help.
  • If you're worried that putting your question in a public chat channel will create noise for others, actually verify if this is the case. Are you bothered any time a question is asked in a public channel? Ask some colleagues in one-on-one situations what they think as well. In my experience, no one enables sound or pop-up notifications for channel messages. But if this is actually a problem somehow, consider creating dedicated channel(s) (e.g. #dev-help) for asking for help that can more easily be muted.
  • If, after some time (let's say an hour), you haven't resolved your question, consider mentioning colleagues who you suspect might be able to help (the people you would consider asking for help in private). You can do this either by sending a link to your question in a private message, or mentioning them in the channel/thread itself.
  • If you haven't resolved your question after a long amount of time (let's say 24 hours), simply post the question again and specify that you didn't get an answer within 24 hours.
  • If the question remains unresolved for even longer, you'll either have to escalate to your manager, or accept that no one actually knows and you'll have to figure it out for yourself.

December 17, 2019

Building deb/rpm packages with FPM in Docker

Whether you're building on-premise software or just want to use packages as your atomic deployment mechanism of choice in a traditional bare-metal/VM infrastructure, deb/rpm packages are a nice thing to provide.

Unfortunately, building them is super tedious. Try googling for official documentation on how to build Debian packages and you'll find at least 5 official wiki pages all with slight variations. Red Hat packages are a bit better but still rather tedious. Luckily, there's a program which has our back: FPM - Effing Package Management.

Looking into how to run this on an engineer's laptop as easily as in a CI/CD pipeline, annoyances keep piling up.

  1. Running FPM with the correct arguments is annoying. Let's put the invocations in a shell script or Makefile.
  2. We don't want to have to install FPM on our main system - it requires ruby, gems and more. Let's run it in a Docker container. Luckily, there exists a Docker image which contains FPM and its dependencies, as well as optional dependencies for extra features: eclecticiq/package
  3. The application runtime (binaries, libraries...) needs to be available in the same container, otherwise FPM doesn't know what to package. Let's use multi-stage Docker builds to build the application and make the resulting files available to the FPM container.
  4. Running docker build, docker run, then copying files out of the docker container is annoying. Let's put the invocations in the Makefile or another shell script.
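The steps above can be sketched as a couple of Makefile targets along these lines. The image names, package name, version and paths are placeholders, and the fpm flags are a minimal sketch rather than a complete invocation; it assumes the Dockerfile's final stage is based on eclecticiq/package, so fpm is available in the resulting image:

```make
# build the application image; the multi-stage Dockerfile's final stage
# copies the built application into an image that also contains FPM
build:
	docker build --tag=myapp-build .

# run FPM inside that image, writing the resulting .deb to ./dist
# via a mounted volume, so no `docker cp` is needed
package: build
	mkdir -p dist
	docker run --rm --volume="$(PWD)/dist:/dist" myapp-build \
		fpm -s dir -t deb --name myapp --version 1.0.0 \
		    --package /dist/ /opt/myapp
```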

With all of the above fixed, you end up with a single Makefile target or shell script which builds the project's deb/rpm packages, either on your laptop or in the CI/CD system of your choice.

I put together a complete example of how to build, run, and package an application using this system; it's available on Github: fpm-docker-example

August 6, 2018

Multi-stage Docker builds for Python projects

Multi-stage builds can help reduce your Docker image sizes in production. This has several benefits: development dependencies may expose extra security holes in your system (I've yet to see this happen, but why not be cautious if it's easy to be so?), but mostly, a smaller image is faster for others to docker pull.

The concept of multi-stage builds is simple: Install development dependencies, build all the stuff you need, then copy over just the stuff you need to run in production in a brand new image without installing development dependencies not needed to run the application.

Here's an example Dockerfile using the official Python Docker images, which are based on Debian - but you can easily apply the same principle when building from Debian, Ubuntu, CentOS, or Alpine images: Have one stage where build/development dependencies are installed and the application is built, and another where runtime dependencies are installed and the application is run.

FROM python:3.7-stretch AS build

# which requirements file to install; override with --build-arg REQS=dev
# (the "base" default assumes a requirements/base.txt exists)
ARG REQS=base

RUN python3 -m venv /venv

# example of a development library package that needs to be installed
RUN apt-get -qy update && apt-get -qy install libldap2-dev && \
    rm -rf /var/cache/apt/* /var/lib/apt/lists/*

# install requirements separately to prevent pip from downloading and
# installing pypi dependencies every time a file in your project changes
ADD ./requirements /project/requirements
RUN /venv/bin/pip install -r /project/requirements/$REQS.txt

# install the project, basically copying its code, into the virtualenv.
# this assumes the project has a functional setup.py
ADD . /project
RUN /venv/bin/pip install /project

# this won't have any effect on our production image; it's only needed
# if we want to run commands like pytest in the build image
WORKDIR /project

# the second, production stage can be much more lightweight:
FROM python:3.7-slim-stretch AS production
COPY --from=build /venv /venv

# install runtime libraries (different from development libraries!)
RUN apt-get -qy update && apt-get -qy install libldap-2.4-2 && \
    rm -rf /var/cache/apt/* /var/lib/apt/lists/*

# remember to run python from the virtualenv
CMD ["/venv/bin/python3", "-m", "myproject"]

Copying the virtual environment is by far the easiest approach to this problem. Python purists will say that virtual environments shouldn't be copied, but when the underlying system is the same and the path is the same, it makes literally no difference (plus virtual environments are a dirty hack to begin with, one more dirty hack doesn't make a difference).

There are a few alternate approaches, the most relevant of which is to build a wheel cache of your dependencies and mount that in as a volume. The problem with this is that Docker doesn't let you mount volumes in the build stage, so you have to make complex shell scripts and multiple Dockerfiles to make it work, and the only major advantage is that you don't always have to re-compile wheels (which should be on pypi anyway, and my dependencies don't change that often).

Another thing of note: In our example, we install both project dependencies and the project itself into the virtualenv. This means we don't even need the project root directory in the production image, which is also nice (no risk of leaking example configuration files, git history etc.).

To build the image and run our project, assuming it's a webserver listening on port 5000, these commands should let you visit http://localhost:5000 in your browser:

$ docker build --tag=myproject .
$ docker run --rm -it -p5000:5000 myproject

Running tests

What if we want to build an image for running tests, which require some extra development dependencies? That's where our ARG REQS comes in. By setting this build argument when running docker build, we can control which requirements file is read. Combine that with the --target argument to docker build, and this is how you build a development/testing image:

$ docker build --target=build --build-arg REQS=dev --tag=myproject-dev .

And let's say you want to run some commands using that image:

$ docker run --rm -it myproject-dev /venv/bin/pytest
$ docker run --rm -it myproject-dev bash

Development in Docker

Note that you'll have to re-build the image any time code changes. I don't care too much about this since I do all my development locally anyway, and only use Docker for production and continuous integration, but if it's important to you, you'll have to:

  1. Change pip install /project to pip install -e /project
  2. Copy the entire /project directory into the production image as well
  3. Mount the project's root directory as /project with docker run --volume=$PWD:/project

Example project

If you want a functional example to play around with, I've got a git repository on Github with a sample Python project which has a docker-multistage branch: python-project-examples