One of the key things we learnt while re-developing our load testing suite, Loadzen, was that being able to continuously deploy our code to live would be awesome.

It did come with quite a few challenges:

  1. Tests would need to be solid
  2. How do we reduce downtime without wasting money?
  3. How do we manage configurations effectively without adding complexity?

I’d like to talk about the last point here. It’s become popular either to deploy to PaaS providers like Heroku or to manage your own deployment infrastructure, and when you opt for managing your own infrastructure, you eventually end up setting up a server that runs a Puppet Master, a Chef server or a Salt Master. More than likely, you will also be using Fabric to run commands on your servers remotely.

While this tech is great for companies with deeper pockets to spend on infrastructure, needing to run management servers just for your configuration management is a real drawback!

With Loadzen, we opted for an independent, or stackless, deployment approach: each server should be able to bootstrap its own configuration with a minimum of external influence, and we should embrace virtualisation completely, treating servers as throw-away CPUs that in and of themselves did not matter.

The way we managed this is with a combination of features:

  1. We use AWS’ user-data feature to feed init information into base images (see the sketch below)
  2. We use a standalone salt-minion on the servers to bootstrap configurations
  3. Application configurations are stored in github, independently from the application
  4. Salt configurations are stored in github, independently from the code
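
To make the first point concrete, here is a minimal sketch of launching an instance with its minion ID passed through as user-data, using boto3 for illustration (the region, AMI ID and minion ID are placeholders, not our real values):

```python
import boto3

REGION = "eu-west-1"       # placeholder
BASE_AMI = "ami-00000000"  # placeholder

ec2 = boto3.client("ec2", region_name=REGION)

# The user-data payload is nothing more than the minion ID we want
# the instance to adopt when it bootstraps itself.
resp = ec2.run_instances(
    ImageId=BASE_AMI,
    InstanceType="t2.micro",
    MinCount=1,
    MaxCount=1,
    UserData="web-1",
)
print(resp["Instances"][0]["InstanceId"])
```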

Deployment is handled by a single script whose sole purpose is to coordinate the starting up and bringing down of instances, with zero interaction with the instances themselves. This script is stored alongside our codebase.

Our base instances are pre-seeded, either with the bare minimum init scripts (we can create a base box with a single command) or with a more advanced dependency set to speed up boot time.
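
That single command can be as small as baking an AMI from a configured instance; roughly something like this boto3 sketch (the region, instance ID and image name are hypothetical):

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")  # placeholder region

# Snapshot a configured instance into a reusable base AMI.
image = ec2.create_image(
    InstanceId="i-0123456789abcdef0",  # placeholder instance
    Name="loadzen-base",
)
print(image["ImageId"])
```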

When an instance boots, it goes through some very basic motions (a sketch follows the list):

  1. Get its user-data from AWS
  2. Set up its minion ID using the user-data (a simple echo command)
  3. Bootstrap salt (if not already seeded)
  4. Pull the salt files from github
  5. Call highstate
  6. Set up a series of file watchers to restart services when configurations change
  7. Set up a cron job to update system configurations periodically from github
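
A minimal sketch of that bootstrap, assuming a masterless salt-minion and Python on the box (the metadata endpoint is real; the salt checkout path is a placeholder, and the watcher and cron steps are left to the highstate itself):

```python
#!/usr/bin/env python
"""Boot-time bootstrap sketch: user-data -> minion ID -> masterless highstate."""
import subprocess
import urllib.request

# AWS exposes the user-data passed at launch on the instance metadata endpoint.
USER_DATA_URL = "http://169.254.169.254/latest/user-data"

def main():
    # Steps 1 and 2: fetch the user-data and install it as the salt minion ID.
    minion_id = urllib.request.urlopen(USER_DATA_URL, timeout=5).read().decode().strip()
    with open("/etc/salt/minion_id", "w") as fh:
        fh.write(minion_id + "\n")

    # Step 4: pull the latest salt states from github (placeholder checkout path).
    subprocess.check_call(["git", "-C", "/srv/salt", "pull"])

    # Step 5: apply the full state tree locally -- no master involved.
    subprocess.check_call(["salt-call", "--local", "state.highstate"])

if __name__ == "__main__":
    main()
```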

This bootstrap process gives us a lot of control during deployment. By feeding user-data to booted instances, we can essentially pre-assign minion IDs, and by assigning these we can create multiple architectures simply by defining them as deployment targets in the top.sls (sketched below). So if we want to create a fully decoupled multi-service cluster of our application, we simply define the salt states and dependencies in our salt configuration and start the corresponding servers; they will automatically bootstrap the correct services and hook into our RMQ core.
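
For illustration, a top.sls along these lines is all it takes to map the user-data-assigned minion IDs to roles (the ID globs and state names here are hypothetical):

```yaml
# Hypothetical top.sls: minion IDs set from user-data select their role.
base:
  'web*':
    - nginx
    - webapp
  'loadgen*':
    - loadgen
  'queue*':
    - rabbitmq
```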

This means that, if we need to, we can run single-instance versions of our application, load-balanced versions, or decoupled versions with more complex infrastructure, all without having to run or manage a configuration master server.

As our configuration files are stored separately from our app, we wanted a feature similar to puppet’s reload behaviour, where the daemon monitors file states and restarts services whenever those files change. It turns out you don’t need a full-on puppet install to pull this off, just the little Linux file-watching facility called inotify.

We wrote a few scripts that monitor specific configuration files and in turn run supervisord commands to restart the affected services when those configurations change. This means we can update swathes of the system (like launching a new load generator base AMI) without needing to re-deploy the whole site.
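
A sketch of such a watcher, using pyinotify (a Python binding for inotify); the watched path and supervisord program name are placeholders:

```python
import subprocess

import pyinotify

WATCHED_CONF = "/etc/webapp/app.conf"  # placeholder path

class RestartOnChange(pyinotify.ProcessEvent):
    """Restart the managed service whenever the config file is rewritten."""
    def process_IN_CLOSE_WRITE(self, event):
        subprocess.check_call(["supervisorctl", "restart", "webapp"])

wm = pyinotify.WatchManager()
wm.add_watch(WATCHED_CONF, pyinotify.IN_CLOSE_WRITE)

# Blocks forever, dispatching inotify events to RestartOnChange.
pyinotify.Notifier(wm, RestartOnChange()).loop()
```

The same effect is available from a plain shell script via the inotifywait command from the inotify-tools package.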

This post started off about continuous deployment, so I guess I’d better get to the point.

What this setup means is that we can use a single deploy script that doesn’t need to interact with the individual servers to handle setting up a new instance. In particular, we make heavy use of AWS’ ELB features and have designed our application to be as stateless as possible, so that it can handle being load balanced if necessary.

Our deployment script basically runs through a few simple operations (sketched below):

  1. Boot the necessary AWS AMI and pass through its minion ID
  2. Monitor if the instance has booted
  3. Monitor if the instance has bootstrapped (e.g. is the web server running and returning 200 responses)
  4. Associate the instance with our load balancer
  5. Monitor the load balancer to wait until the instance(s) have been registered
  6. Once the load balancer registers the new instance(s), disassociate the old instance(s) from the load balancer
  7. Terminate the old instance(s)
  8. If any of the steps fail, disassociate and terminate the new instance(s)
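
Stitched together with boto3 for illustration (the region, AMI, load balancer name and health check are all placeholders), the whole dance looks roughly like this:

```python
import contextlib
import time
import urllib.request

import boto3

REGION = "eu-west-1"       # placeholders throughout
BASE_AMI = "ami-00000000"
LB_NAME = "loadzen-lb"

ec2 = boto3.client("ec2", region_name=REGION)
elb = boto3.client("elb", region_name=REGION)

def deploy(minion_id, old_instance_ids):
    # 1. Boot the AMI, passing the minion ID through as user-data.
    resp = ec2.run_instances(ImageId=BASE_AMI, InstanceType="t2.micro",
                             MinCount=1, MaxCount=1, UserData=minion_id)
    new_id = resp["Instances"][0]["InstanceId"]
    try:
        # 2. Wait until the instance has booted.
        ec2.get_waiter("instance_running").wait(InstanceIds=[new_id])

        # 3. Wait until it has bootstrapped: poll for a 200 from the web server.
        host = ec2.describe_instances(InstanceIds=[new_id])[
            "Reservations"][0]["Instances"][0]["PublicDnsName"]
        deadline = time.time() + 600
        while True:
            with contextlib.suppress(Exception):  # not up yet
                if urllib.request.urlopen("http://%s/" % host, timeout=5).status == 200:
                    break
            if time.time() > deadline:
                raise RuntimeError("instance never became healthy")
            time.sleep(5)

        # 4-5. Register it with the load balancer and wait for InService.
        elb.register_instances_with_load_balancer(
            LoadBalancerName=LB_NAME, Instances=[{"InstanceId": new_id}])
        while elb.describe_instance_health(
                  LoadBalancerName=LB_NAME,
                  Instances=[{"InstanceId": new_id}],
              )["InstanceStates"][0]["State"] != "InService":
            time.sleep(5)

        # 6-7. Retire the old instance(s): deregister, then terminate.
        if old_instance_ids:
            elb.deregister_instances_from_load_balancer(
                LoadBalancerName=LB_NAME,
                Instances=[{"InstanceId": i} for i in old_instance_ids])
            ec2.terminate_instances(InstanceIds=old_instance_ids)
    except Exception:
        # 8. On any failure, roll the new instance back instead.
        with contextlib.suppress(Exception):  # may never have been registered
            elb.deregister_instances_from_load_balancer(
                LoadBalancerName=LB_NAME, Instances=[{"InstanceId": new_id}])
        ec2.terminate_instances(InstanceIds=[new_id])
        raise
```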

We get our CI server to run the deployment command whenever a master build clears, which means deploying new code is as simple as a merge and a push.

There are a lot of moving parts to this, all for the single goal of not having to run, maintain and configure a puppet or salt master. However, it has given us a lot of control: we can hot-load configurations by editing them directly on github if we need to (not recommended), and we can still integrate and test code that isn’t in master; in fact, we can push those branches to a staging environment for wider testing.

In the end though, we feel it is worthwhile to keep our deployment and configuration setup as simple and easy to understand as possible, without introducing new and unpredictable systems into the mix.