We’ve been very busy at Jively HQ recently, working up the next release of the Loadzen platform. “Loadzen Reloaded”, as we call it internally, is a complete ground-up rewrite of the original Loadzen platform. Where the first incarnation of our site was built to ship as fast as possible (it took a year to complete end to end), this new version addresses all the things we realised we wanted to do with the platform but couldn’t.
The old stack was:
- A Django core to manage the database, content and UI elements of the application
- A Python server built on Pyro (an RPC server framework) that managed all the Loadzen “Jobs”
- A Python application that served as our load generator
- A micro Tornado app to handle websockets
- A front end that was a horrific mish-mash of Django templates, jQuery, memcached hacks and SocketIO
Admittedly, having Django, its ORM and its powerful templating engine made getting the site ready to ship very quick. Getting to grips with writing an RPC-networked application back end (no API, just the RPC interface) was another matter: it caused quite a bit of concern and generated some monolithic classes that should never have been brought into this world.
To put it simply, the application had many flaws. In particular, it was very difficult to update and to add new features in a sane way: a feature change had to be traced from the UI, to the Django site, to the Job server, to the load generator, and tracing a Python object through all that was a nightmare.
The core of the problem was our choice of an RPC interface. It was wonderful to use Python objects interchangeably between the client and server, but it also meant that those objects had to be sane, and had to be serialised and deserialised for AJAX calls and anything front-end.
We took the step to rewrite from scratch mainly so we could evolve the platform faster, which, thankfully, we are now in a position to do.
So, what did we change?
The web front end is pure Tornado. Tornado as a framework is wonderful: being a bare-bones, event-driven framework, it integrates well with websockets and AMQP. Furthermore, writing a RESTful API in it is incredibly straightforward; writing functionality for specific verbs and properly handling unsupported verbs was a cinch.
Going with Tornado meant we needed to choose a new database. Having come from the wonderful ORM world of Django, we initially wanted something quite similar, until we realised that our data is not that relational and that we do not perform many joins across our data sets.
Given this, and our fear of having to work with raw SQL (in a way that still worked with Tornado’s async pass-through style), we opted for MongoDB instead. Its document-based store made things simple: data modelled as JSON could be stored as-is, with no translation or serialisation/deserialisation (with a few exceptions).
To ensure data integrity between objects passed through our RESTful API and the DB, we were early adopters of JSON Schema, which has since become more popular and more widely used. By writing JSON schemas to represent our input, we could not only validate data passed to us by an API client, but also ensure that whatever went into the database had passed some rigid tests before being committed.
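As a rough illustration of validating input against a schema before it reaches the database, here is a minimal sketch. It uses a small hand-rolled checker that only understands the `type`, `required` and `properties` keywords, as a stand-in for a real JSON Schema validator. The `JOB_SCHEMA` fields (`name`, `users`, `target`) and the `save_job` helper are hypothetical and not taken from the Loadzen API.

```python
# Minimal stand-in for a JSON Schema validator (tiny subset of keywords).
# The schema fields below are hypothetical, for illustration only.

JSON_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
}

JOB_SCHEMA = {
    "type": "object",
    "required": ["name", "users"],
    "properties": {
        "name": {"type": "string"},
        "users": {"type": "integer"},
        "target": {"type": "string"},
    },
}


def validate(instance, schema, path="$"):
    """Return a list of error strings; an empty list means the data is valid."""
    errors = []
    expected = schema.get("type")
    if expected is not None:
        py_type = JSON_TYPES[expected]
        # bool is a subclass of int in Python, so reject it for numeric types.
        bool_as_number = isinstance(instance, bool) and expected in ("integer", "number")
        if not isinstance(instance, py_type) or bool_as_number:
            errors.append(f"{path}: expected {expected}")
            return errors
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required property '{key}'")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], subschema, f"{path}.{key}"))
    return errors


def save_job(document, collection):
    """Validate first, then store; `collection` is a list standing in for the DB."""
    errors = validate(document, JOB_SCHEMA)
    if errors:
        raise ValueError("; ".join(errors))
    collection.append(document)
    return document
```

The same schema can guard both ends: the API handler rejects a bad request body, and the storage layer refuses to commit anything that fails the same checks.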
Putting RabbitMQ at the core of our application – loosely
We realised that we needed a robust event and messaging system for our application, and RabbitMQ was the natural choice. With great Python support and an excellent reputation, it provides the central communications channel for all of our systems.
When making this decision, we were very careful to ensure that the messaging and communications layer was completely separate from the deeper application functionality. That requirement also influenced our design decisions for the whole system quite heavily.
We took some inspiration from the original neckbeards of Un*x, and decided that small is beautiful. The new application actually consists of a series of core micro-services that each provide highly specialised functionality and can be brought up and down completely independently.
These micro-services are layered, with the core application completely separated from the messaging wrapper. This means we can end-to-end test the functionality of a module without worrying about the messaging infrastructure (which can be tested at the integration level).
So we have a tiered approach to the whole application, which is basically a network of micro-applications:
- Core application class
- Messaging class
- Messaging framework
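The layering above might look something like the sketch below. The class names (`JobPlanner`, `MessagingWrapper`), the message fields, the `job.planned` routing key and the `publish` callable are all hypothetical. The callable stands in for whatever the messaging framework provides, so the core class can be tested with no broker at all.

```python
import json


class JobPlanner:
    """Core application class: plain Python, no knowledge of messaging."""

    def plan(self, job):
        # Split the requested virtual users evenly across the generators;
        # any remainder goes to the first generators in the list.
        users, generators = job["users"], job["generators"]
        base, extra = divmod(users, len(generators))
        return {
            name: base + (1 if index < extra else 0)
            for index, name in enumerate(generators)
        }


class MessagingWrapper:
    """Messaging class: decodes a message, calls the core, publishes the result."""

    def __init__(self, core, publish):
        self.core = core
        self.publish = publish  # assumed signature: publish(routing_key, body)

    def on_message(self, body):
        job = json.loads(body)
        allocation = self.core.plan(job)
        reply = {"id": job["id"], "allocation": allocation}
        self.publish("job.planned", json.dumps(reply))
```

In a unit test the `publish` argument can simply append to a list, while in production it would be bound to the real messaging framework. The core class never needs to change between the two.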
Another source of inspiration came from the idea that messages should inform the application’s internal state. Each of the core modules simply accepts a data structure, operates on the section that is relevant to it and spits out another message that tells another component (or components) to do something. Again, JSON Schema comes in handy to ensure the data is properly formatted. The micro-applications should ideally be stateless (or should be able to re-create their state very quickly, purely from message input).
This decision was made, once again, for the benefit of testing. Because the classes are stateless, creating testable states for the application under various conditions is as simple as passing the correct mock data into the class, with no need to worry about statefulness.
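To make the testing benefit concrete, here is a small sketch of a handler written as a pure function of its input message, plus a helper that rebuilds state by replaying messages. The event names and message fields (`results.batch`, `results.summary`, `job_id`, `results`) are hypothetical.

```python
def handle(message):
    """Stateless handler: the output depends only on the incoming message."""
    if message["event"] == "results.batch":
        failed = sum(1 for r in message["results"] if r["status"] >= 500)
        return {
            "event": "results.summary",
            "job_id": message["job_id"],
            "total": len(message["results"]),
            "failed": failed,
        }
    # Unknown events are skipped rather than treated as errors.
    return None


def rebuild_state(messages):
    """Re-create per-job running totals purely from a sequence of messages."""
    state = {}
    for message in messages:
        if message["event"] != "results.summary":
            continue
        totals = state.setdefault(message["job_id"], {"total": 0, "failed": 0})
        totals["total"] += message["total"]
        totals["failed"] += message["failed"]
    return state
```

Testing a given condition then only requires building the right mock message, and a restarted service can recover its totals by replaying the summaries it has already seen.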
So, with our message based, micro-service, DRY and decoupled architecture – what else did we change?
We took full advantage of Python’s dynamic nature. Basically, wherever possible, anything that completes an action does so through a driver, and the drivers can be hot-loaded. This makes it easy to develop testable functionality that follows strict interface patterns and can be dropped into the application to extend its functionality.
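A minimal sketch of this kind of driver loading, using only the standard library’s `importlib`, is shown below. The interface (a module-level `NAME` and a `run` function) and the function names are assumptions made for illustration, not the actual Loadzen driver contract.

```python
import importlib.util
from pathlib import Path

REQUIRED_ATTRS = ("NAME", "run")  # hypothetical driver interface


def load_driver(path):
    """Import a driver from a .py file and check it follows the interface."""
    path = Path(path)
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's code in a fresh module
    missing = [attr for attr in REQUIRED_ATTRS if not hasattr(module, attr)]
    if missing:
        raise TypeError(f"{path.name} is missing {missing}")
    return module


def load_drivers(folder):
    """Load every driver in a folder, keyed by the name each driver declares."""
    drivers = {}
    for path in sorted(Path(folder).glob("*.py")):
        module = load_driver(path)
        drivers[module.NAME] = module
    return drivers
```

Calling `load_drivers` again after dropping a new file into the folder picks up the new driver without restarting the process, and the interface check rejects a module that does not expose the expected names.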
Modules and drivers have their own configuration and metadata sections in the messaging protocol, so adding functionality and acting on messages simply means extending a JSON object. All of a sudden, working on a single module or a single element of the application would at worst affect one whole micro-application (which could be hot-reloaded), and at best only meant adding a new Python module to a folder.
So now the application is basically a REST API and a bunch of micro-services that together make up the Loadzen platform. For load balancing this is a godsend, as we can easily distribute the entire application across new infrastructure without any major configuration changes.
For the UI we went for a full-on MVC JS application, opting for Google’s AngularJS. Its modularity, decoupled nature and re-usable features, as well as its really powerful data binding, made it an excellent way for us to build an API client (another great feature of building everything as a service is that you can rebuild clients and services without affecting anything else).
A key drawback of this approach to application design is that there is more boilerplate, particularly around service start/stop and messaging behaviour. We tried to minimise this by using our TykRMQ framework to handle the messaging between the application and Rabbit, but writing initialisers for the servers could still be repetitive.
Secondly, the transport you choose (we could have gone for 0MQ instead, for example) adds a layer of complexity. Creating input and output queues, ensuring they are bound, and ensuring that any callbacks and actions that depend on prerequisites are run, all while maintaining separation of concerns, can lead to uncertainty.
Testing: while modular testing is fine, at some point the entire chain needs to be tested, and integration testing is still a key area where we need to improve. One option is to run a series of Selenium tests as integration tests, but this could significantly increase our CI run time and is therefore an unpopular option.
We’re big fans of CircleCI, and our site is hosted on AWS. We make quite heavy use of the elastic load balancing features of AWS to ensure our site can be integrated and delivered continuously, without any downtime, when we push to master.
When it comes to building our servers we did quite a bit of research into what’s out there, and what caught our attention was this report from ThoughtWorks which talked about the difficulty of “maintaining the maintenance infrastructure”, i.e. if we were to use Chef or Puppet or Salt, did we want to run a master server to handle configurations? Did we want to add another system into the mix to configure our servers? Instead of adding more complexity to our deployment methodology, we’re using a stackless method where our configuration is managed via Git and our servers bootstrap completely independently without any management influence.
Our stackless deployment method is probably worthy of its own post, but at its core it means that we can minimise the number of servers we need to run, and cut out a key dependency in the deployment cycle.
Everything is better when it is independent, and we’ve taken that philosophy through from the core design decisions of our application classes, to our micro-service based infrastructure, through to how we manage our infrastructure. By forcing everything to act as an independent actor, we reduce complexity while retaining testability and reliability.