June 5, 2022•1,125 words
Microservices tend to be associated with service mesh, and service mesh has an ugly secret at its heart: the sidecar proxy. When we try to simplify something complex, the most elegant solution tends to be the best, and sidecar proxies are definitely not elegant. In this article we’ll discuss some alternatives and what the future of service mesh looks like.
As most readers will already know, service mesh as a concept has been with us since around 2018. Service meshes emerged from the melange created by the proliferation of container orchestration systems, which in turn emerged because of the proliferation of containers, which took off with the emergence of the microservice architecture as the preferred software service delivery method across most digitally-enabled businesses.
In short, a solution (near-idempotent, transportable software packages) led to complexity, which led to a solution, which led to more complexity, which landed us with service meshes.
This is certainly not a criticism. When complexity can be simplified through tooling, the advantages of that complexity inevitably become available to non-specialists. That in turn means more people can benefit, which ultimately benefits the consumer.
This is the whole “standing on the shoulders of giants” trope, and my favourite example is “AI”. It’s a deeply complex topic to master, but it has been made highly accessible by black-box tooling such as TensorFlow, which puts a complex set of tools within reach of almost any software developer and spreads the benefits of the technology to everyone.
The thing is, service mesh was born to solve a complex problem, and the crux of the solution was essentially to duct-tape two containers together, shoddily weld their pipes together, and then add some really solid software on top.
It’s pretty terrifying when you look under the hood. For a sidecar to work, a whole bunch of firewall (iptables) rules have to be rewritten in the network namespace it shares with its mated pair, so that all of the application’s inbound and outbound traffic is captured by the proxy. Of course, all of this kludgery is in the interest of simplicity: it lets you turn any service into a mesh-enabled one without modifying the application.
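To make the kludge concrete, here is a simplified Python sketch that only generates (and never runs) the kind of nat-table iptables rules a sidecar injector installs. The proxy UID 1337 and the ports 15001/15006 follow Istio’s well-known defaults, but the MESH_* chain names are made up for this illustration, and real injectors add many more exclusions than shown here.

```python
# Simplified sketch: builds (does not apply) iptables commands that force all
# TCP traffic in a pod's shared network namespace through a sidecar proxy.
# UID/ports mirror Istio defaults; the MESH_* chain names are hypothetical.

PROXY_UID = 1337        # UID the proxy process runs as
OUTBOUND_PORT = 15001   # proxy listener for traffic leaving the app
INBOUND_PORT = 15006    # proxy listener for traffic arriving for the app


def sidecar_redirect_rules(proxy_uid: int = PROXY_UID,
                           outbound_port: int = OUTBOUND_PORT,
                           inbound_port: int = INBOUND_PORT) -> list[str]:
    """Return nat-table iptables commands that capture inbound and outbound TCP."""
    return [
        # Dedicated chains so the rules can be found and removed as a group.
        "iptables -t nat -N MESH_INBOUND",
        "iptables -t nat -N MESH_OUTPUT",
        # Inbound: every TCP packet arriving for the app is sent to the proxy.
        "iptables -t nat -A PREROUTING -p tcp -j MESH_INBOUND",
        f"iptables -t nat -A MESH_INBOUND -p tcp -j REDIRECT --to-ports {inbound_port}",
        # Outbound: everything the app sends is captured too...
        "iptables -t nat -A OUTPUT -p tcp -j MESH_OUTPUT",
        # ...except the proxy's own traffic, which would otherwise loop forever.
        f"iptables -t nat -A MESH_OUTPUT -m owner --uid-owner {proxy_uid} -j RETURN",
        # Leave loopback traffic alone.
        "iptables -t nat -A MESH_OUTPUT -d 127.0.0.1/32 -j RETURN",
        f"iptables -t nat -A MESH_OUTPUT -p tcp -j REDIRECT --to-ports {outbound_port}",
    ]


if __name__ == "__main__":
    for rule in sidecar_redirect_rules():
        print(rule)
```

Note how the ordering matters: if the owner-based RETURN rule came after the catch-all REDIRECT, the proxy would capture its own traffic. That kind of hidden, order-sensitive state is exactly what makes the sidecar approach fragile.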
It’s a kludge, and an ugly one at that.
What’s been amazing to watch, though, is that this deep-seated ugliness at the heart of something ultimately rather elegant is finally being addressed by folks who are actually trying to make developers’ lives easier.
First off, Envoy introduced the xDS API specification, a family of discovery-service APIs (the “x” stands in for listener, route, cluster, endpoint and so on). These discovery services enable control planes, such as Istio, and data planes, such as Envoy, to use a common set of APIs for discovery, routing and addressing.
This API set was introduced to commoditise the proxy (and, indirectly, the sidecar layer) of the service mesh. That is fantastic: more competition breeds innovation, and so long as it centres on some common standards, everyone benefits.
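As a rough illustration of the protocol’s shape, the sketch below models the state-of-the-world xDS handshake: the control plane pushes a DiscoveryResponse, and the data plane answers with a DiscoveryRequest that either ACKs the new version or NACKs it while keeping the last good one. The field names follow the xDS protocol, but the classes, the XdsClient helper and the plain-string error_detail (a google.rpc.Status in the real API) are simplifications, and there is no actual gRPC stream here.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Real xDS type URL for cluster resources (v3 API).
CLUSTER_TYPE = "type.googleapis.com/envoy.config.cluster.v3.Cluster"


@dataclass
class DiscoveryResponse:
    version_info: str
    resources: list
    type_url: str
    nonce: str


@dataclass
class DiscoveryRequest:
    type_url: str
    version_info: str = ""              # last version this client accepted
    response_nonce: str = ""            # nonce of the response being answered
    resource_names: list = field(default_factory=list)
    error_detail: Optional[str] = None  # set only when NACKing (simplified)


class XdsClient:
    """Toy state-of-the-world xDS client: applies config, then ACKs or NACKs."""

    def __init__(self, validate: Callable[[list], Optional[str]]):
        self.validate = validate        # returns an error string, or None if valid
        self.accepted_version = ""
        self.resources: list = []

    def on_response(self, resp: DiscoveryResponse) -> DiscoveryRequest:
        error = self.validate(resp.resources)
        if error is None:
            # ACK: adopt the new config and echo its version and nonce.
            self.accepted_version = resp.version_info
            self.resources = resp.resources
            return DiscoveryRequest(resp.type_url, self.accepted_version, resp.nonce)
        # NACK: keep the old config, report the last good version plus the error.
        return DiscoveryRequest(resp.type_url, self.accepted_version, resp.nonce,
                                error_detail=error)
```

Because any proxy or library that speaks this handshake can be driven by any compliant control plane, the proxy really does become a swappable commodity.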
A wonderful side effect of the xDS APIs is that you may not need a sidecar in your future service mesh at all.
We’re already seeing this in action: the extremely popular gRPC framework (a framework and SDK primarily aimed at building microservices) has adopted an xDS layer within the framework itself, an approach gRPC calls “proxyless” service mesh. This lets the application offload the sidecar’s functionality to the application framework, bundling it all into a single homogeneous family.
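For a sense of what the gRPC side looks like, the sketch below writes the bootstrap file that gRPC’s xDS support reads (located via the GRPC_XDS_BOOTSTRAP environment variable) to find its control plane. The control-plane address and node ID you pass in are up to you; the ones in any example here are hypothetical, and the insecure channel credentials are only appropriate for a local experiment.

```python
import json
import os
import tempfile


def write_xds_bootstrap(server_uri: str, node_id: str) -> str:
    """Write a minimal gRPC xDS bootstrap file and point gRPC at it.

    Returns the path of the file written.
    """
    bootstrap = {
        "xds_servers": [
            {
                "server_uri": server_uri,
                # Plaintext to the control plane: fine for a sketch, not production.
                "channel_creds": [{"type": "insecure"}],
                "server_features": ["xds_v3"],
            }
        ],
        "node": {"id": node_id},
    }
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w") as f:
        json.dump(bootstrap, f, indent=2)
    # gRPC libraries with xDS support use this variable to locate the bootstrap.
    os.environ["GRPC_XDS_BOOTSTRAP"] = path
    return path


# With the bootstrap in place, a gRPC client dials an "xds:///" target such as
# "xds:///payments" (hypothetical service) instead of host:port. In Go, the xds
# resolver is registered by importing google.golang.org/grpc/xds.
```

No proxy, no iptables: the client library itself talks to the control plane and receives its endpoints and routing rules directly.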
Given we are talking about gRPC here, it also means that versioning, compatibility issues and fallbacks are baked right into the framework. That essentially makes integrating with a service mesh a code-first problem that can be checked, tested and verified during the SDLC rather than via an integration test within an environment.
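To illustrate what checking mesh behaviour during the SDLC could look like, the sketch below is a pure function for a weighted traffic split (roughly the idea behind xDS weighted clusters) plus an ordinary unit test. The WeightedCluster type, pick_cluster and the service names are hypothetical, and gRPC’s own weighted-cluster handling is implemented differently; the point is only that a routing decision can be asserted in a test run rather than observed in a live environment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeightedCluster:
    name: str
    weight: int


def pick_cluster(clusters: list[WeightedCluster], hash_value: int) -> str:
    """Map a request hash onto clusters in proportion to their weights."""
    total = sum(c.weight for c in clusters)
    if total <= 0:
        raise ValueError("at least one cluster needs a positive weight")
    point = hash_value % total
    for c in clusters:
        if point < c.weight:
            return c.name
        point -= c.weight
    raise AssertionError("unreachable: point is always below total")


def test_ninety_ten_split():
    # A 90/10 canary split should send exactly 10 of every 100 hash values to v2.
    routes = [WeightedCluster("payments-v1", 90), WeightedCluster("payments-v2", 10)]
    picks = [pick_cluster(routes, h) for h in range(100)]
    assert picks.count("payments-v1") == 90
    assert picks.count("payments-v2") == 10
```

A test like this runs in CI alongside the rest of the application, long before the service is deployed into a mesh.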
This framework-level integration of the xDS APIs may not provide all the functionality that a sidecar does, but it provides something extremely important: a real reduction in complexity.
In the ever-increasing melange of the Kubernetes-microservices-service-mesh-industrial-complex, any reduction in systems complexity is a boon.
This is all well and good for greenfield applications. Unfortunately, third-party systems such as databases and message queues are bought in rather than built, so a sidecar is the only option for making them play nicely with your mesh. However, should the xDS APIs truly standardise service-mesh management and communication, it wouldn’t be surprising to see even more cloud-native applications baking some level of compatibility into their core offering.
Think of it this way: when we use a runtime like Java or .NET, we outsource the complexity of syscall interactions to the runtime, so why can we not move one layer up into the networking stack and make that simpler too? There’s a definite need for better-equipped server-side frameworks that embrace the realities of the modern application stack.
From the user-space level to the kernel level: the next way to ditch that ugly sidecar comes from a Linux kernel technology called the extended Berkeley Packet Filter (eBPF).
eBPF, as it’s known in “the biz”, lets a piece of user-space software (say, your application or a Linux service) load small, kernel-verified programs that hook into the networking stack and set traffic-handling rules at the kernel level, and it can modify those rules in real time through shared data structures called eBPF maps. Essentially, it means that packet-level decisions can be made much earlier.
Instead of traffic needing to pass through the chain of kernel layers that eventually deliver a decoded packet to user space, the packet can be analysed and a decision made before any of that happens, either protecting the upstream application or simply routing the traffic to the correct process or service elsewhere.
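The sketch below is a plain Python simulation of that pattern, not real eBPF. A per-packet function returns an XDP-style verdict by consulting a lookup table, and a separate “agent” function updates the table at runtime, mirroring how a user-space daemon updates an eBPF map while the kernel program keeps running. The verdict values match the xdp_action enum in linux/bpf.h; keying the policy by destination port, and the names policy_map, xdp_filter and agent_update, are assumptions made for this example. Real XDP programs are written in restricted C, checked by the kernel verifier and attached to a network interface.

```python
from enum import IntEnum


class XdpAction(IntEnum):
    """Verdicts an XDP program can return (values as in linux/bpf.h)."""
    ABORTED = 0
    DROP = 1
    PASS = 2
    TX = 3
    REDIRECT = 4


# Stand-in for an eBPF hash map: destination port -> verdict. In a real setup
# the kernel program reads the map and a user-space agent updates it live.
policy_map: dict[int, XdpAction] = {}


def xdp_filter(dst_port: int) -> XdpAction:
    """Per-packet decision, made before any socket or application sees the packet."""
    return policy_map.get(dst_port, XdpAction.PASS)


def agent_update(dst_port: int, action: XdpAction) -> None:
    """User-space side: change the policy without reloading the 'program'."""
    policy_map[dst_port] = action
```

The split is the important part: the hot path only does a table lookup, while policy changes happen out of band, so nothing has to be torn down or re-wired when the rules change.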
This is extremely powerful, and it has already been jumped on by software-defined networking stacks such as Cilium, where instead of a sidecar in every pod, a standardised agent running on each node loads the eBPF programs that handle much of the routing, load balancing and network-level work formerly done by the sidecar.
“But you’ve basically just moved the sidecar into a daemon! I could do that with Envoy too!” you may say, and to an extent that is true. Except that in the eBPF scenario the packet handling itself happens in the kernel, so the overhead of all this processing is far lower than what you would expect from a sidecar, where every request is redirected into a separate user-space proxy and back. It also removes the need for the complex network and firewall kludge, because the eBPF programs take over the job the redirect rules were doing, making configuration and management much clearer than the hidden complexity a sidecar is based on.
We’re not quite at the point of removing the sidecar yet, and as I said above, the sidecar will remain necessary for legacy applications that do not converge around standardised APIs such as xDS. Still, I can definitely see a future where our software runtimes and frameworks begin to embrace the modern server-side application environment and embed mesh capabilities into their core.