Over at Tyk Cloud we’re constantly trying to figure out how to reduce our server load, raise overall machine performance, and optimise the cost of our infrastructure. Part of that is estimating just what is possible with Amazon’s available instance types, especially the cheap ones.
The Tyk API Management stack makes good use of Go’s parallelisation features and lightweight goroutines. It also means that the process is very much CPU-bound, and performance increases significantly with every core you add to a machine.
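To illustrate why a CPU-bound Go process benefits from extra cores, here is a minimal sketch of fanning work out across goroutines, one per core. This is not Tyk’s code: `parallelSum` and the summing workload are hypothetical stand-ins for any CPU-heavy task.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// parallelSum splits nums into one chunk per worker and sums each chunk in
// its own goroutine. It is a hypothetical stand-in for CPU-bound work.
func parallelSum(nums []int, workers int) int {
	if workers < 1 {
		workers = 1
	}
	partial := make([]int, workers) // one slot per worker, so no locking is needed
	chunk := (len(nums) + workers - 1) / workers

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		start := w * chunk
		if start >= len(nums) {
			break
		}
		end := start + chunk
		if end > len(nums) {
			end = len(nums)
		}
		wg.Add(1)
		go func(slot int, part []int) {
			defer wg.Done()
			for _, n := range part {
				partial[slot] += n
			}
		}(w, nums[start:end])
	}
	wg.Wait()

	// Combine the per-worker results once every goroutine has finished.
	total := 0
	for _, p := range partial {
		total += p
	}
	return total
}

func main() {
	nums := make([]int, 1000)
	for i := range nums {
		nums[i] = i + 1
	}
	// runtime.NumCPU reports the logical cores available to this process.
	fmt.Println(parallelSum(nums, runtime.NumCPU()))
}
```

On a dual-core machine this starts two workers, so each core gets half of the work. That is roughly why adding cores helps a process shaped like this.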
Amazon AWS’ T2.Medium instances are the lowest-cost dual-core systems that you can spin up, but they are in the T2 class, which means that a) they need to be launched inside a VPC and b) they make use of CPU credits, or what Amazon terms “burstable” performance. This basically means that while your CPU is idling, you accrue CPU credits, and then when your CPU needs more resources (>40%) for a heavy spike, it cashes in those credits for extra horsepower.
This is actually really useful, and means that for a service like Tyk Cloud you can build a really cost-effective server ecosystem that is cheap to run and can scale up (not only in CPU power) but also auto-scale instance nodes. In short, it’s ideal for running an API gateway.
As part of an overall benchmarking exercise, we put the latest version of the Tyk Open Source API Gateway node through its paces on both Digital Ocean and on these T2 instances, firstly because Blitz seems to run out of Amazon’s Virginia data center, and secondly because we wanted to see how the two server types actually compared.
For reference, the Amazon servers have more RAM and are slightly more expensive, so we’re not going to try and directly compare the performance of the two. Instead, I wanted to show you what happens to your traffic performance when CPU bursting kicks in. We were really surprised when we saw these response time results for a 2000 rps test:
The test in the above scenario was a rush from 0 to 2000 concurrent users. This results in an average of about 950 requests per second throughout the test, peaking at 2000 rps (see the next graph). What’s really interesting here, though, is that response times across the test are actually super low, starting off below 10ms and then drifting up to just under 100ms near the 2k rps mark at the end. Not bad really, but what are those crazy spikes?
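To make the credit mechanism a bit more concrete, here is a toy model of it. It is a simplified sketch, not Amazon’s actual accounting: the `creditModel` type, the one-minute step and the credit cap are assumptions for illustration, and only the 40% baseline comes from the text above.

```go
package main

import "fmt"

// creditModel is a simplified, hypothetical model of a burstable instance.
// Credits are measured in "minutes of one fully used CPU".
type creditModel struct {
	baseline   float64 // utilisation (0..1) that costs no credits, e.g. 0.40
	maxCredits float64 // cap on the banked balance (illustrative value)
	credits    float64 // current balance
}

// step advances the model by one minute at the requested utilisation (0..1)
// and returns the utilisation the instance can actually deliver.
func (m *creditModel) step(requested float64) float64 {
	delta := m.baseline - requested // positive while idling, negative while bursting
	if m.credits+delta < 0 {
		// Not enough credits left: spend the remainder, then fall back
		// towards the baseline.
		delivered := m.baseline + m.credits
		m.credits = 0
		return delivered
	}
	m.credits += delta
	if m.credits > m.maxCredits {
		m.credits = m.maxCredits
	}
	return requested
}

func main() {
	m := &creditModel{baseline: 0.40, maxCredits: 10}

	// Five idle minutes bank credits, then a sustained spike spends them.
	for i := 0; i < 5; i++ {
		m.step(0)
	}
	for minute := 1; minute <= 5; minute++ {
		delivered := m.step(1.0)
		fmt.Printf("minute %d: delivered %.2f, credits left %.2f\n",
			minute, delivered, m.credits)
	}
}
```

In this model a spike runs at full speed while the balance lasts, and drops back to the baseline once it is spent. How long that takes depends entirely on how much idle time came before it.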
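As a quick sanity check on that average, a perfectly linear ramp can be worked out in a couple of lines. This sketch assumes the rush is linear and that each simulated user issues roughly one request per second. Neither is stated by the test tool, and `averageOfLinearRamp` is a name made up for this example.

```go
package main

import "fmt"

// averageOfLinearRamp returns the mean rate over a test that climbs linearly
// from start to end requests per second.
func averageOfLinearRamp(start, end float64) float64 {
	return (start + end) / 2
}

func main() {
	ideal := averageOfLinearRamp(0, 2000)
	measured := 950.0 // approximate average reported for this test

	fmt.Printf("ideal average: %.0f rps\n", ideal)
	fmt.Printf("measured is %.0f%% of ideal\n", measured/ideal*100)
}
```

Under those assumptions the ideal average is 1000 rps, so the roughly 950 rps we measured is about 95% of it, with the shortfall plausibly coming from the spikes.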
You guessed it: it’s “stop-the-world” CPU scaling in action:
You can see in the graph above how hits per second plummet, while errors and timeouts grow. This isn’t because the process had stopped responding: if you look at the same test against a Digital Ocean server, you see a smooth response scale:
Now that’s what you would expect to see.
What I find interesting about the T2 instances is that they only start bursting if you break the 40% CPU utilisation limit. As soon as you grow beyond that, the system will start enabling CPU bursting and cause this behaviour in your traffic. Now in a load-balanced situation, this wouldn’t matter, since the likelihood of both instances hitting burst volume at the same time is very low. That actually makes the T2 instances excellent value for money for a platform like ours, since they handle exceptionally high load well once you’ve got past the brick wall of CPU scaling.
Of course the counter-argument is that if we ran two DO servers in a load-balanced configuration, we would get even smoother performance at a cheaper price. But then we also wouldn’t get all the really nifty free features of AWS, such as the elastic load balancers and built-in auto-scaling. We may also run the risk of scaling too soon.
Questions, comments, blackmail? Feel free to join in the comments below.