Vivian Voss

Service Mesh — The Sidecar Tax

kubernetes architecture performance

The Invoice ■ Episode 19

"mTLS, observability, traffic management, zero-code retries. You need a service mesh."

Splendid. Let us examine what one is actually paying for.

A service mesh moves cross-cutting concerns (mTLS, retries, timeouts, traffic shifting, observability) out of application code and into a proxy that sits beside each pod. Istio, the archetype, launched in 2017 as a joint project of Google, IBM, and Lyft. It graduated within the CNCF in July 2023. In the 2024 CNCF Annual Survey, service-mesh adoption across respondents fell to 42 per cent, down from 50 per cent the year before. That is not a catastrophe. It is, however, the first full-year decline the category has ever posted. The industry is quietly reconsidering the deal.

The Complexity Invoice

Istio ships over a dozen primary custom resource definitions across three categories (traffic management, security, telemetry) and dozens more through its operator, telemetry plugins, Wasm extensions, and Gateway APIs. A minimally useful installation comprises:

  • A control plane (istiod) responsible for configuration distribution, certificate issuance, and xDS API serving to every sidecar
  • A per-pod sidecar (Envoy) injected into every workload, running a second container alongside the application
  • An ingress gateway at the cluster edge, usually another Envoy in a standalone pod
  • mTLS certificates rotated by istiod, distributed via SDS to each sidecar
  • Policy resources: PeerAuthentication, RequestAuthentication, AuthorizationPolicy
  • Telemetry bindings to send traces and metrics to external collectors
  • A platform team that knows what each of those does, how they interact, and how to debug any given failure mode

The CNCF's own reports describe Istio as mature, powerful, and "operationally demanding". The last of those descriptors is the one to watch. Installing Istio in a fresh cluster takes a senior SRE about two days. Operating it for six months takes roughly 0.5 to 1.0 FTE, scaling upwards with cluster size. Debugging it at three in the morning is a skill one acquires by losing two nights of sleep and one customer.

The Latency Invoice

Every inter-service HTTP or gRPC call now traverses two Envoy proxies: the caller's sidecar, then the callee's sidecar. Adding two proxies to every request path means adding latency. How much is now, happily for the debate, well-measured.

The request path doubles. Without a mesh: Service A calls Service B directly, through zero proxies. With a sidecar mesh: Service A → Envoy sidecar → Envoy sidecar → Service B. Every internal call now traverses two proxies instead of none. Debugging doubles with it.

A 2025 peer-reviewed performance comparison from the DeepNess Lab (Performance Comparison of Service Mesh Frameworks: the mTLS Test Case) measured the overhead with mTLS enforced on otherwise identical workloads. The table below is, one regrets to say, unambiguous.

mTLS Latency Overhead vs Baseline — DeepNess Lab, 2025

  • Istio sidecar: +166%
  • Cilium: +99%
  • Linkerd: +33%
  • Istio ambient: +8%

The headline number (plus 166 per cent for Istio sidecar with mTLS) is surprising only to people who have never read the benchmark. Envoy is fast; two Envoys in the path plus TLS handshakes and certificate validation are not free. Linkerd's Rust-based linkerd2-proxy is measurably lighter because it was built for the job, not adapted to it. Ambient mode, introduced in Istio 1.23 in August 2024, replaces per-pod sidecars with a shared node-level ztunnel and produces dramatically less overhead. Ambient is, in polite summary, Istio's own public admission that the sidecar model had a problem it could not solve by optimisation alone.
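To put the percentages in concrete terms, here is a back-of-envelope sketch. The 5 ms baseline per hop and the three-hop request path are illustrative assumptions; only the overhead percentages come from the benchmark quoted above.

```python
# What the measured mTLS latency overheads mean for a user-facing request
# that fans out across several internal hops. The baseline hop latency and
# hop count are assumptions for illustration; the overhead percentages are
# the DeepNess Lab 2025 figures.

BASELINE_HOP_MS = 5.0   # assumed baseline latency per internal call
HOPS = 3                # assumed internal hops per user-facing request

overheads = {           # measured mTLS latency overhead vs baseline
    "istio sidecar": 1.66,
    "cilium": 0.99,
    "linkerd": 0.33,
    "istio ambient": 0.08,
}

for mesh, overhead in overheads.items():
    per_hop = BASELINE_HOP_MS * (1 + overhead)
    total = per_hop * HOPS
    print(f"{mesh:14s} {per_hop:5.2f} ms/hop, {total:6.2f} ms across {HOPS} hops")
```

Under those assumptions the sidecar path turns a 15 ms request into one of nearly 40 ms; ambient keeps it close to baseline.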

A sidecar also costs memory. The Istio 1.24 performance documentation reports approximately 60 MB of RAM and 0.20 vCPU per Envoy sidecar at 1,000 HTTP RPS with 1 KB payloads. A cluster with 1,000 pods is therefore paying roughly 60 GB of RAM and 200 vCPU for the mesh before a single byte of application code has executed. Ambient ztunnels are smaller (approximately 12 MB RAM, 0.06 vCPU each) but one now also pays for waypoint proxies where L7 features are enabled. Either way, the total is non-zero. "Free" is a marketing word.

The Mesh Bill Before Your Code Runs

  • Istio sidecar: 60 MB RAM and 0.20 vCPU per pod, at 1,000 HTTP RPS
  • Ambient ztunnel: 12 MB RAM and 0.06 vCPU per node, plus waypoint proxies for L7
  • 1,000-pod cluster: 60 GB RAM and 200 vCPU for the mesh, before application code

Source: Istio 1.24 performance & scalability documentation.
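The arithmetic is simple enough to sketch. The per-proxy figures are the ones Istio's 1.24 performance documentation reports; the node count is an illustrative assumption.

```python
# Cluster-wide mesh bill, using the per-proxy figures from Istio's 1.24
# performance documentation. The pod and node counts are assumptions.

SIDECAR_RAM_MB, SIDECAR_VCPU = 60, 0.20   # per pod, at 1,000 HTTP RPS
ZTUNNEL_RAM_MB, ZTUNNEL_VCPU = 12, 0.06   # per node (ambient mode)

pods, nodes = 1_000, 50                   # illustrative cluster size

sidecar_ram_gb = pods * SIDECAR_RAM_MB / 1_000
sidecar_vcpu = pods * SIDECAR_VCPU
ambient_ram_gb = nodes * ZTUNNEL_RAM_MB / 1_000
ambient_vcpu = nodes * ZTUNNEL_VCPU

print(f"sidecar mesh: {sidecar_ram_gb:.1f} GB RAM, {sidecar_vcpu:.0f} vCPU")
print(f"ambient mesh: {ambient_ram_gb:.2f} GB RAM, {ambient_vcpu:.1f} vCPU (plus waypoints for L7)")
```

Sixty gigabytes versus six hundred megabytes, on the same thousand pods. The waypoint proxies ambient adds for L7 features narrow that gap, but only where one opts in.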

The Debugging Invoice

When the mesh works, it is invisible. When it does not, the request path has doubled and so has the attack surface for bugs. A 500 that arrives at the client might originate in:

  • The application code itself
  • The caller's Envoy (wrong upstream cluster, circuit breaker tripped)
  • The destination's Envoy (connection limits, bad cert rotation)
  • A mis-parsed VirtualService or DestinationRule
  • The mTLS trust chain (expired intermediate, wrong trust domain)
  • istiod failing to push updated configuration within the retry window
  • A Wasm plugin throwing an exception
  • A Kubernetes NetworkPolicy quietly dropping the packet

The distributed tracing one installed to observe one's services is now required to debug the mesh itself. Troubleshooting skills become mesh-specific skills, which means they do not transfer and do not scale with engineer headcount in the obvious way.

The Honest Case For

In the interests of not selling a one-sided story: service meshes solve a real problem for a real set of operators. If one:

  • Runs more than roughly 100 microservices with cross-team ownership
  • Has strict compliance that mandates mTLS between every internal service
  • Operates across multiple clusters or multiple clouds with incompatible primitives
  • Needs uniform observability across polyglot services that cannot all ship an OpenTelemetry library

then the tax starts to pay for itself. Everyone else, which is most readers, is paying for Google-scale architecture to solve problems a single load balancer and a sensible VPC already solved.

The Alternative

Direct HTTP or gRPC calls between services, over a network one already trusts. This is how the internet worked for three decades before sidecars existed. It was, one should note, a perfectly functional three decades.

mTLS terminated at a single ingress gateway (HAProxy, nginx, Envoy on its own, or whatever load balancer is already in the stack), because the VPC was a trust boundary before sidecars were a marketing category. Internal traffic over plaintext inside the VPC is fine for the vast majority of workloads, and mTLS between services is a compliance requirement for a minority of them, not an architectural necessity for all of them.

Tracing and metrics via an OpenTelemetry library linked into each service. OTel is language-agnostic, vendor-neutral, and roughly five lines of initialisation in most runtimes. It sends traces and metrics via OTLP to any collector. No proxy required.

Retries and timeouts in the client library. Go's http.Client, Rust's reqwest, Java's RestTemplate or OkHttp, Python's httpx, Node's undici: all of them ship configurable retries, timeouts, connection pools, and circuit breakers. The retry logic that a service mesh claims to provide "without code changes" is three lines of configuration in any mature client, and has been so since approximately 1995.
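For readers who want that claim made concrete, here is the logic sketched against the standard library alone. Real clients such as httpx or OkHttp expose the same behaviour as configuration; the function name and defaults below are invented for illustration.

```python
import time
import urllib.error
import urllib.request

def get_with_retries(url, *, timeout=2.0, retries=3, backoff=0.2):
    """The retry logic a mesh sells as 'zero code changes', sketched with
    the stdlib. Mature HTTP clients ship this as configuration."""
    for attempt in range(retries + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()
        except (urllib.error.URLError, TimeoutError):
            if attempt == retries:
                raise                          # budget exhausted: surface the error
            time.sleep(backoff * 2 ** attempt)  # exponential backoff between tries
```

Timeout, retry budget, and backoff are the three knobs that matter; a proxy offers the same three, at the price of a proxy.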

Authorisation at the application layer, because only the application knows what "this user may read this document" means. Delegating authorisation to a proxy is delegating it to a component that does not, on any reasonable reading, understand the data.
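A minimal sketch of the point, with an invented document model: the decision depends on data that lives in the application, not in any proxy on the path.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    owner: str
    shared_with: set = field(default_factory=set)
    public: bool = False

def can_read(user: str, doc: Document) -> bool:
    # "This user may read this document" depends on ownership, sharing,
    # and visibility -- application data no sidecar can see.
    return doc.public or user == doc.owner or user in doc.shared_with

report = Document(owner="alice", shared_with={"bob"})
assert can_read("alice", report)
assert can_read("bob", report)
assert not can_read("mallory", report)
```

A proxy can verify that the request carries a valid identity. Whether that identity may read this particular document is a question only the code holding the data can answer.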

The Pattern

Service mesh is sold as "zero code changes". One gets that by paying:

  1. Two proxies of latency on every internal call, measurably more under mTLS
  2. A platform team of overhead to run istiod, gateways, policies, and upgrades
  3. A debugger's worth of new moving parts: VirtualService, DestinationRule, PeerAuthentication, Envoy configuration, trust chains, Wasm plugins

All to avoid writing retry logic that any mature HTTP client already provides in three lines of configuration.

The mesh was always, architecturally, a political solution to a technical problem. It existed because microservice teams did not trust each other's code, and a proxy in the middle was a way of enforcing cross-cutting concerns without convincing any one team to adopt them. The proxy became the architecture. The architecture became the operational cost centre. The cost centre produced ambient mode, which is the industry's second try at making sidecars not cost what sidecars cost.

Meanwhile, the original alternative (a library in each service, a trusted network below, and a single ingress gateway at the edge) has remained exactly what it has been since approximately 1995.

The direct call was always there. One simply decided it wasn't enterprise enough.

Istio graduated within the CNCF in July 2023. CNCF 2024 Annual Survey: mesh adoption 42 per cent, down from 50 per cent. 2025 peer-reviewed benchmark: Istio sidecar +166% mTLS latency overhead, Cilium +99%, Linkerd +33%, Istio ambient +8%. 60 MB RAM per sidecar = 60 GB across 1,000 pods before code runs. Ambient is Istio's own admission that sidecars were a problem. Mature HTTP clients have shipped retry configuration since approximately 1995.