Musings of an Eon...

Elixir + GitLab AutoDevOps

Published 2019 Oct 09 @ 13:13

For those unaware of GitLab’s AutoDevOps feature, I would recommend checking out the official documentation. TL;DR: GitLab is trying to make a CI/CD pipeline that’s as simple to use as deploying an application to Heroku. Essentially, you “flip a switch” and any git push to the repo runs the pipeline (assuming the branch/tag meets certain prerequisites). It’s actually a very wonderful feature… in theory. But the devil, as they say, is in the details.

Heroku On Kubernetes

Heroku’s fame is in their ability to do “push-button” deployments, i.e. when you push code changes to Heroku it then deploys an application. GitLab’s AutoDevOps feature wants to be that simple but with a full devops pipeline that can deploy your application to Kubernetes.

For a rather new feature, it’s impressive. But it is only truly push-button easy if you have an application that would, in its present state, be a push-button deployment on Heroku. And even if you can deploy your application to Heroku that doesn’t mean you want to, as that might not be the most efficient deployment strategy for you.

Indeed, one of the big draws of Kubernetes is to deploy container images. With Heroku, you deploy the entire repository and have functions that start up your application after that. If you’re using a compiled language, you essentially lose the benefits of being compiled because you have to deploy the entire repo every time you increase the number of dynos (i.e. pods) you are running.

Because I’m using Elixir, which is a compiled language, I wanted to get the benefit of compiling a package that I could then deploy to a small-footprint container. I wouldn’t need my tests in production, I wouldn’t need my developer utilities, I would only need the stuff that is directly relevant to production.

Unfortunately, that caused some issues. Before I get into them, and how I fixed them, let me first give a rundown on my tech stack:

  • Elixir for my programming language.
  • Phoenix Framework, a Rails-like web framework for Elixir.
  • Docker for building container images.
  • GitLab for repository hosting and their AutoDevOps CI/CD pipeline (obviously).

Engage Thrusters!

The first step is to enable AutoDevOps. I believe this is supposed to default to on for new projects but will be set to off if the first attempted build fails. That seems to be GitLab’s way of saying “we’re going to try out your application with this cool new automatic pipeline, but if it doesn’t work we’ll just assume you don’t want it or can’t use it yet and will disable it until you tell us otherwise.” Simple enough. For “supported” languages this would probably work with few, if any, changes to the application code or configuration. And what are “supported” languages? Anything that has an official Heroku buildpack. Wait, a buildpack?

Taking the Heroku Comparison Too Far

GitLab admitted they wanted the simplicity of Heroku for their AutoDevOps pipeline, but they might be taking that comparison too far. GitLab uses a tool called herokuish that can effectively build and run applications in a manner similar to Heroku, except obviously not on the Heroku platform. And how does Heroku build and run applications? Through something called buildpacks. A buildpack is just a fancy name for a plugin that, given some source code, exposes a set of functionality that can be used to build, test, and run the application.

Because of this, the AutoDevOps feature has out-of-the-box support for any of the languages that Heroku has an official buildpack for. You can see that list here. Admittedly, there are a lot of supported languages. However, some prominent ones are definitely missing, such as Rust, C++, and Elixir, just to name a few. If you fall into the category of unsupported language, you have two options:

  1. Find a custom buildpack and set a CI/CD Variable (in Settings) named BUILDPACK_URL to the buildpack’s repo.
  2. Disable Auto Test because it’s not going to work, period. You need a buildpack to use the Auto Test portion of the pipeline.

Yeah, so, if you want to use the built-in Auto Test job, you need a buildpack, no exceptions. You can disable this job through a CI/CD Variable, thankfully. Just set TEST_DISABLED to any value; as long as it is present in your settings then the job will be omitted.
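If you’d rather keep these settings in version control than in the Settings UI, the same variables can be declared in a `.gitlab-ci.yml` that includes the AutoDevOps template. A sketch (the buildpack shown is the community HashNuke Elixir buildpack; substitute whichever one you use):

```yaml
include:
  - template: Auto-DevOps.gitlab-ci.yml

variables:
  # Point AutoDevOps at a custom buildpack for an "unsupported" language.
  BUILDPACK_URL: "https://github.com/HashNuke/heroku-buildpack-elixir"
  # Any non-empty value causes the Auto Test job to be omitted.
  TEST_DISABLED: "true"
```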

Only For Testing

I want to be very explicit with what I just said:

Herokuish is only required for the Auto Test function.

The Auto Build and Auto Deploy functions can build and deploy container images if a Dockerfile is present in the root level of your repo. However, the resulting image is not used in the test phase. I kind of understand why this was done, because building a production image would, ideally, remove any code related to testing and debugging to optimize the final artifacts for production.

Final Word: Buildpacks

You almost definitely want to have your own <language>_buildpack.config file if you are using a custom buildpack. This is because any pre-test setup that your application requires will not be done without explicitly telling the buildpack to do so.

For my own testing, I was receiving errors because a static asset wasn’t available. The files were clearly added to the repository and present on the test VM, but I had forgotten that, because I was using a buildpack, my assets weren’t being built before the tests ran the way they were on my local machine. I added a “pre-hook” command to the config file to build the assets first, and then my tests started passing.
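As a sketch of what that config can look like, assuming the HashNuke Elixir buildpack (check your buildpack’s README for the exact hook names and versions it supports):

```shell
# elixir_buildpack.config
erlang_version=22.1
elixir_version=1.9.1

# Build static assets before compilation so they exist when tests run.
hook_pre_compile="cd assets && npm install && node node_modules/webpack/bin/webpack.js -p && cd .. && mix phx.digest"
```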

Docker by Default

If a Dockerfile is present in your repository (and you can set a custom file path to one in your CI/CD Variables if you need to), the Auto Build job will automatically use that to build a container image, put it into GitLab’s image repository for your project, and then use that image in the Auto Deploy job when setting up Kubernetes deployments.

I would personally recommend using the multi-stage build approach that is now baked into Docker. The “old” way of doing this involved writing multiple Dockerfiles, or using a specialized Docker-in-Docker image to build container images from within a container, in an attempt to incrementally build your final image. This was done because the build process typically required additional packages that weren’t necessary when running the final artifacts in production, so you’d use one or more build images bloated with build tools and dev packages, extracting the outputs from each into a final image that contained only the absolute necessities for your application to run.

I’m not going to say much about how to do multi-stage builds or why you should use them. You can read the documentation or other articles that do a great job of explaining why this approach is better. All I will say is that the multi-stage approach was significantly simpler than what I was attempting previously (using “the old way”) and still managed to reduce my image size by over 50%.
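As a rough sketch of what a multi-stage Dockerfile looks like for this stack (image tags, paths, and the app name `my_app` are placeholders for illustration):

```dockerfile
# ---- Build stage: full toolchain, node for assets, dev packages ----
FROM elixir:1.9-alpine AS build
RUN apk add --no-cache build-base git nodejs npm
WORKDIR /app
ENV MIX_ENV=prod
RUN mix local.hex --force && mix local.rebar --force
COPY mix.exs mix.lock ./
RUN mix deps.get --only prod && mix deps.compile
COPY assets assets
RUN cd assets && npm install && node node_modules/webpack/bin/webpack.js -p
COPY . .
RUN mix phx.digest && mix distillery.release --env=prod

# ---- Runtime stage: only the release, on a minimal base image ----
FROM alpine:3.10
RUN apk add --no-cache bash openssl ncurses-libs
WORKDIR /app
COPY --from=build /app/_build/prod/rel/my_app ./
CMD ["bin/my_app", "foreground"]
```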

However, I learned an interesting lesson on compilation “gotchas” as a result of this process.

Compilation - Static vs Dynamic

Elixir applications can be difficult to generate releases for. Thankfully, there is a tool called Distillery which simplifies the process greatly. The build flow, with Distillery, looks like:

  1. Get dependencies (apk add, mix deps.get, npm install, etc).
  2. Build assets (e.g. node node_modules/webpack/bin/webpack.js -p).
  3. Generate asset manifests (mix phx.digest).
  4. Generate a release (mix distillery.release --env=prod).

However, there is a big problem: Elixir is a compiled language and its configuration must be a static file. For all the amazing things I could say about Erlang, requiring the configuration to be a single, static file is not one of them. When the build occurs, anything in your configuration files that pulls a value from the system (environment variables) will capture the value from your build environment, and that value is then hard-coded into your final release configuration. This means you cannot build once and deploy to multiple environments, because the release never reads the environment variables of those separate environments. You would essentially need a different release config for each environment.

So, how do you get around this? There’s actually no easy answer. The official documentation for Distillery has a section on Runtime Configuration specifically for this issue. Ultimately the suggestion is to provide all such “dynamic” variables as parameters and not to rely on environment variables. However, you can see in the example code that System.get_env/1 is still being used, just inside the init/1 function of the application. So environment variables should be used, but only when initializing something, and then the information should be passed as a parameter after that.

Let me reiterate that point. You should use environment variables, but use them in code that performs initialization, not in a configuration file. One common pattern for this is to set a config value to {:system, "ENV_VAR"}. When that particular tuple is found, most libraries will dynamically grab the value from the system (i.e. the environment variable) during an initialization step. This allows us to define the name of the variable in a config but read from the environment at runtime and not build time.
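As a sketch of that pattern (the app and module names are placeholders; whether the tuple is honored depends on the individual library, and the Phoenix endpoint is one that does resolve it at runtime):

```elixir
# config/prod.exs -- evaluated at build time, but the tuple defers the lookup.
config :my_app, MyAppWeb.Endpoint,
  # Phoenix resolves {:system, "PORT"} via System.get_env("PORT") when the
  # endpoint initializes, so the release is not pinned to the build machine.
  http: [port: {:system, "PORT"}],
  url: [host: {:system, "APP_HOST"}, port: 443]
```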

The problem arises when you depend on systems that start automatically, without any intervention from within your own application, and that dependency reads its settings from environment variables or config files. There’s no good way to get around that.

Ecto Probalo

Thankfully, I only had one dependent application that I was having difficulty transitioning to dynamic environment variables: Ecto, the database adapter. As far as I can tell, you cannot use {:system, "ENV_VAR"} in the Ecto configs. I was extremely worried at first because although this was only one library I was having a problem with, it is also a bedrock for the application; there aren’t many alternatives available to Ecto, at least none that wouldn’t pose separate problems and learning curves.

After digging around, I found some advice in the Ecto documentation that suggests overriding init/1 in your application’s repo module (the one that calls use Ecto.Repo). Instead of giving configuration options, Ecto has gone the route of allowing developers to have essentially complete control over the initialization phase of the database adapter. Which turned out to be an even better solution for me. Why? Because I could program environment-specific fallbacks instead of having one universal fallback. For example, for my dev and test environments I use a database running on localhost, but the name of the database changes; it is suffixed with either _dev or _test. Instead of having two separate config entries for this difference, I can get the name of my environment in init/1 and dynamically create a fallback URL if no environment variable is provided.
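A sketch of what that override can look like (module names, credentials, and the `_dev`/`_test` fallback logic mirror my description above; details will vary per app):

```elixir
defmodule MyApp.Repo do
  use Ecto.Repo, otp_app: :my_app, adapter: Ecto.Adapters.Postgres

  # Captured at compile time; Mix itself is not available inside a release.
  @env Mix.env()

  @impl true
  def init(_type, config) do
    # Prefer DATABASE_URL from the pod's environment; otherwise fall back to
    # a localhost database suffixed with the build environment (_dev/_test).
    url =
      System.get_env("DATABASE_URL") ||
        "ecto://postgres@localhost/my_app_#{@env}"

    {:ok, Keyword.put(config, :url, url)}
  end
end
```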

CI vs ENV Variables

One of the least obvious issues with AutoDevOps is how “variables” are handled. In the CI/CD Settings there’s a section titled Variables, and it seems to allow you to set environment variables at first glance. But they aren’t. Well they are. Sort of.

When you set a “variable” in the CI/CD settings, you’re setting the environment variables for the CI/CD pipeline and not your application. If you set something here, it won’t be accessible to the final pod that your application runs in.

Although this was extremely annoying at first, GitLab supports an “easy” fix: add K8S_SECRET_ as a prefix to the name of any variable you need to be accessible to the application. These variables are effectively added to a generated Kubernetes secrets file that is connected to the deployment configuration so that they are available in the pod’s environment.

Just to recap that, if you set ENV_VAR then it’s accessible to AutoDevOps only and nothing else. If you set K8S_SECRET_ENV_VAR then it’s accessible to your application and not AutoDevOps.

EXCEPT…

I’ve found at least one situation where the documented behavior (as described above) is not the actual behavior: DATABASE_URL.

The AutoDevOps configs allow you to disable their automatic Postgres deployment, which is great if you want to, or need to, manage your own database rather than letting AutoDevOps do so. So you think, cool, no problem, I’ll disable Postgres from being deployed alongside my application and I’ll set K8S_SECRET_DATABASE_URL to my actual database URL.

WRONG. That variable you set will never make it to the pod, and you won’t know why. It’ll drive you mad because nothing in the documentation will tell you why this one variable doesn’t make it to the pod environment. But the fact this is likely the only variable that doesn’t gives you a clue about what the problem might be.

The Auto Deploy job dynamically builds DATABASE_URL for you even if you have database deployment disabled. So no matter what value you put into K8S_SECRET_DATABASE_URL, Auto Deploy will use the value it generates on your behalf. So how do you get around this? You set DATABASE_URL. That’s right, if DATABASE_URL is already present as a CI/CD Variable, then it won’t build a new URL and will just pass the existing value along to the final secrets file.

Speaking of databases

Migrations. Those things that let you incrementally change your database schema. You almost definitely use them and guess what? If you use Elixir, then you probably use mix ecto.migrate to execute new migrations. And guess what else? Releases don’t include Mix tasks such as ecto.migrate. Uh-oh.

Thankfully, someone else seems to have had this problem and figured out a good solution. Sophie DeBenedetto from Flatiron Labs has an article on how to run Ecto migrations in production. I won’t reiterate the entire content here, but the short version is:

  1. Although Mix tasks aren’t available, the migration task uses Ecto.Migrator to perform migrations which is available in a release.
  2. You can tell Distillery to set pre/post hooks for various stages of your application, such as “post-start” which executes some shell commands after the application is booted up.
  3. You can use bin/my_app rpc "Some.Elixir.Code()" to execute arbitrary code within your application environment.

When we put that all together, we create a shell script that uses the rpc command to execute a new function in our application that calls Ecto.Migrator, and finally we hook that script into the post-start process. Done!
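A sketch of the pieces involved (module and app names are placeholders; see the linked article for the full version):

```elixir
# lib/my_app/release_tasks.ex -- runs pending migrations without Mix.
defmodule MyApp.ReleaseTasks do
  def migrate do
    # Ecto.Migrator ships with Ecto itself, so it is present in the release
    # even though Mix tasks like `mix ecto.migrate` are not.
    path = Application.app_dir(:my_app, "priv/repo/migrations")
    Ecto.Migrator.run(MyApp.Repo, path, :up, all: true)
  end
end
```

The Distillery post-start hook is then a one-line shell script along the lines of `bin/my_app rpc "MyApp.ReleaseTasks.migrate()"`.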

I will admit, though, that I’m not a big fan of this approach, because the post-start hook is called every time a pod starts and a release boots up. Thankfully, migrations are versioned, so you shouldn’t run into an issue where the same migration is run multiple times. After the first pod with the new migration boots up, the migrations run, and every pod after that just checks the latest version and determines nothing needs to happen. However, if two pods come up around the same time, you might get errors from one of them. Migrations run in transactions, though, so the first transaction to finish should succeed and any competing transaction containing migrations should fail after that, which is acceptable if not ideal. In an ideal world, the migration would run exactly once, when a new version is deployed, not whenever a pod starts up.

Health Checks

One of the big deals with automated infrastructure is ensuring that services are “healthy”, booting up new instances to replace “unhealthy” ones. Kubernetes offers different methods for health checks but the Auto Deploy phase will automatically set one up for you: a simple HTTP request to the root path of your expected web application.

On the surface, this seems like a decent default. Make a GET / request to the web server and call it healthy if you get a 2xx response. However, the root page might take some time to load up. Or you might not want the health check going off every 5 seconds. Or any number of other possibilities as to why you might want to change at least one small part of the health check configuration.

I recently stumbled upon another issue where I couldn’t upload images to my app because the default maximum body size was too small to accept reasonably-sized JPG files. That’s a core function of my app and I needed to get that increased regardless of how. The answer, I found out, also contains the answer to the question “how do I customize my health checks?” Specifically, by supplying “extra arguments” to the Helm chart.

To view the Helm chart for the Auto Deploy job, see the source repo and specifically values.yaml to get an idea for what portions of the deployment you can configure with your own values.yaml file. Specifically for health checking you want to look at readinessProbe, which is the check that determines when your application (not the pod, but your application code) has completed boot-up and is “ready” to accept requests, and livenessProbe, which is the recurring health check that ensures your application instance stays healthy; if it fails, Kubernetes restarts the container.
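For example, a custom values.yaml that relaxes the defaults might look like the sketch below. The keys follow the chart’s values.yaml as I understand it, and `/health` is a hypothetical endpoint you would add to your application; verify both against the chart version your pipeline actually uses.

```yaml
readinessProbe:
  path: /health
  initialDelaySeconds: 15
  timeoutSeconds: 5

livenessProbe:
  path: /health
  initialDelaySeconds: 30
  timeoutSeconds: 5
```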

Conclusion

How do I like AutoDevOps so far? It is extremely frustrating to get it working using an “unsupported” language. But once you know the little details that make it all click together, it’s quite impressive. The only real downside at the moment is how long it takes. I typically wait around 15-20 minutes to see if a build makes it successfully to my staging environment. That’s because the build process seems to have little or no caching and the test phase takes a very long time. And by “test phase” I mean the Auto Test job, not the test suite itself; my own tests take less than 10 seconds to run when it’s all-green.

Still, AutoDevOps is working for me right now and I’ll continue to use it until such a time that I absolutely can’t do without more customization to the pipeline.