In September, Istio released Ambient Mesh, a new mode for deploying a service mesh without sidecars. This mode continues to provide the observability, security, and traffic management that Istio users expect, but in a less invasive way and with lower resource overhead. The launch comes at the perfect time: with Istio’s recent acceptance into the CNCF, Ambient Mesh’s reduced friction will open up service mesh to a broader set of users, making Istio more accessible than ever before. This article answers a number of common questions about ambient mesh.
1) Why was ambient mesh created?
Service mesh is a powerful mechanism for providing observability, security, and traffic management for workloads in Kubernetes. Like in many networking systems, a service mesh typically consists of a data plane, which is responsible for forwarding traffic, and a control plane, which takes a user’s configuration and programs the data plane. Istio uses an Envoy proxy to implement the data plane that, traditionally, has been deployed as a sidecar – that is, it runs in a separate container within each workload’s pod.
Istio’s traditional model deploys Envoy proxies as sidecars within the workloads’ pods.
Sidecars have significant advantages over custom libraries, since they provide a mechanism to get the benefits of a service mesh without modifying the underlying application. However, they have a few issues that prevent them from being transparent:
- Invasive: Workloads need to be restarted to install or upgrade them, which can be disruptive to some workloads.
- Underutilized Resources: The Kubernetes model requires making resource reservations at pod creation time, so sidecars tend to be overprovisioned. This leads to underutilization of resources across the cluster.
- Traffic Breaking: Istio configures the proxies to do full HTTP processing, which is expensive and can break applications with non-conformant HTTP implementations.
Istio wanted to provide an alternative way to deploy service mesh that would allow users to feel confident that they could safely enable Istio and get its benefits without worrying about breaking their applications or underutilizing their resources. This required a new approach to providing the Istio data plane, which is called ambient mesh.
2) What does a deployment of ambient mesh look like?
Instead of deploying sidecars, ambient mesh installs an agent in each node called a ztunnel (zero trust tunnel). The ztunnel’s primary responsibility is to create a secure overlay that authenticates and encrypts all traffic between elements in the mesh. The node’s networking stack redirects all traffic in and out of a workload enrolled in the mesh through the ztunnel agent.
Ambient mesh uses a shared, per-node ztunnel to provide a zero-trust secure overlay.
When L7 features such as HTTP traffic management or authorization policies are configured in a namespace, an Envoy instance, called a waypoint proxy, is deployed as a regular pod in the cluster. All enmeshed workloads that connect to a namespace with L7 features enabled are required to transit the waypoint proxy, which does full L7 processing and policy enforcement before forwarding the traffic to the destination’s ztunnel.
When additional features are needed, ambient mesh deploys waypoint proxies, which ztunnels connect through for policy enforcement.
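The routing rule described above can be sketched as a small model. This is only an illustration of the hop order, not Istio code: the `Workload` type, the `ambient_path` function, and the `ztunnel@...` and `waypoint@...` labels are hypothetical names, and the sketch assumes one waypoint per namespace, as in the description above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical stand-in for an enmeshed pod."""
    name: str
    node: str
    namespace: str


def ambient_path(src: Workload, dst: Workload, l7_namespaces: set[str]) -> list[str]:
    """Return the ordered hops a request takes from src to dst.

    l7_namespaces holds the namespaces that have L7 features configured
    and therefore have a waypoint proxy deployed (an assumption of this model).
    """
    # Traffic leaving the source is redirected to the ztunnel on its node.
    hops = [src.name, f"ztunnel@{src.node}"]
    # A waypoint is in the path only when the destination namespace uses L7 features.
    if dst.namespace in l7_namespaces:
        hops.append(f"waypoint@{dst.namespace}")
    # The destination node's ztunnel delivers the traffic to the workload.
    hops += [f"ztunnel@{dst.node}", dst.name]
    return hops


frontend = Workload("frontend", "node-1", "web")
cart = Workload("cart", "node-2", "shop")

l4_only = ambient_path(frontend, cart, set())
with_l7 = ambient_path(frontend, cart, {"shop"})
```

With no L7 features configured, `l4_only` contains only the two workloads and their ztunnels. Once the `shop` namespace enables L7 features, `with_l7` gains a single waypoint hop between the two ztunnels.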
3) Ambient mesh makes a clear distinction between the secure overlay and waypoints. Why was this introduced?
By slicing the mesh into two layers, ambient mesh allows users to adopt Istio in an incremental fashion. The base functionality that most users want for all their workloads is a zero trust secure overlay, which ambient mesh can provide without disturbing running applications or risking traffic breakage. Only when applications require L7 features do the resource-intensive waypoint proxies need to be deployed to handle L7 processing.
4) Doesn’t this introduce extra hops that will slow down traffic?
Ambient mesh is expected to have similar or better latency than sidecars. Most of the latency introduced by using service mesh is from L7 processing and not from transiting the network. Since the ztunnels don’t parse the data they’re carrying, negligible overhead is expected from the secure overlay. When L7 features are enabled, ambient mesh tries to apply policies only on the producer side’s waypoint proxy, which removes a full L7 processing step versus sidecars.
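The argument above can be made concrete with a rough counting model. It is a sketch under stated assumptions, not a benchmark: the function names are hypothetical, `l7_cost` and `l4_cost` are placeholder per-hop costs supplied by the caller, and the model assumes that sidecars process L7 on both the client and server side while ambient mesh adds two ztunnel hops and at most one waypoint.

```python
def l7_processing_steps(mode: str, l7_features: bool = True) -> int:
    """Count how many proxies fully parse L7 traffic on one request path."""
    if mode == "sidecar":
        # Both the client-side and the server-side sidecar do full HTTP processing.
        return 2
    if mode == "ambient":
        # ztunnels never parse the payload; only a waypoint does, when one is needed.
        return 1 if l7_features else 0
    raise ValueError(f"unknown mode: {mode!r}")


def added_overhead(mode: str, l7_cost: int, l4_cost: int, l7_features: bool = True) -> int:
    """Estimate added per-request cost in arbitrary units (placeholder inputs)."""
    # Only ambient mode adds L4-only hops: the source and destination ztunnels.
    l4_hops = 2 if mode == "ambient" else 0
    return l7_processing_steps(mode, l7_features) * l7_cost + l4_hops * l4_cost


# Placeholder costs chosen only to exercise the arithmetic, not measured values.
sidecar_cost = added_overhead("sidecar", l7_cost=100, l4_cost=5)
ambient_l7_cost = added_overhead("ambient", l7_cost=100, l4_cost=5)
ambient_l4_cost = added_overhead("ambient", l7_cost=100, l4_cost=5, l7_features=False)
```

In this model, ambient mesh with L7 enabled comes out ahead whenever one L7 processing step costs more than the two ztunnel hops combined, which is the case the answer above expects.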
5) Why not just use a shared node proxy that does everything?
While ambient mesh uses a shared resource in the ztunnel, it is limited by design to only handle L4 features and provide a secure overlay. Ambient mesh doesn’t use a shared L7 proxy for a number of reasons:
- Envoy isn’t designed for multi-tenancy, which led to concerns around commingling complex L7 processing for all workloads on the node. By constraining the ztunnel to L4 processing, the vulnerability surface area is significantly reduced.
- The constrained scope of the ztunnel reduces the amount of CPU and memory it requires, which makes it more amenable to sharing. Moving complex L7 processing to the waypoint proxies allows that expense to be properly attributed to the service accounts using them, and provides the flexibility to scale them based on dynamic needs.
- The limited functionality of ztunnel allows it to be replaced by alternative implementations that meet a well-defined interoperability contract. One could imagine building alternative versions using Rust or running natively in the kernel or offloading encryption to the NIC.
6) What are the security implications of this architecture?
Ambient mesh’s security model is quite different from sidecars. In sidecars, the proxy only has keys for its paired application, but the proxy vulnerability surface area is larger due to the full L7 processing that takes place. Ambient mesh uses a shared node-level proxy that contains the identities for all the enmeshed workloads on the node, but with a smaller vulnerability surface area due to its more limited processing. Full L7 processing only occurs on the waypoint proxies, which are only shared by workloads with the same service account, making them no worse than sidecars today. This is a fairly complex topic, which is covered in greater detail in the Ambient Mesh Security Deep Dive blog.
7) Does this mean sidecars will go away? Why might someone choose to stick with sidecars?
Ambient is being offered as an additional way to bring workloads into the mesh. While the benefits of ambient will likely make it the best fit for most new users of Istio, sidecars will continue to be supported and, in fact, will be the preferred method in some environments. Examples include applications that have well-defined resource requirements and regulated environments that are designed around the sidecar security model.
8) What happens in an environment that uses ambient mesh, but wants to have some workloads use sidecars?
Ambient mesh is able to interoperate fully with sidecars. In fact, even within an ambient-enabled namespace, individual workloads can use sidecars, and the policies and communication will work seamlessly.
9) What is HBONE and why is it important?
To encrypt traffic, Istio traditionally upgraded the connections between workloads with mTLS. This approach was fairly fragile, broke some applications, and required complicated sniffing logic to distinguish unencrypted from encrypted data in the proxy. Istio is now introducing a new mechanism to connect workloads called HBONE (HTTP-Based Overlay Network Environment), which uses HTTP CONNECT over mTLS to implement secure tunnels.
In addition to representing an improvement for traditional Istio sidecar users, HBONE’s tunneling is key to allowing waypoint proxies to be injected in the network path. HBONE will allow interoperability with common load-balancer infrastructure, and Istio plans to publish a standard so that other types of endpoints can securely integrate into the mesh.
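To make the tunneling idea concrete, here is a minimal sketch of the HTTP CONNECT exchange that sets up a tunnel. It is not a description of Istio's wire format: the function names are hypothetical, the HTTP/1.1 text syntax is used only because it is easy to read, and the mTLS connection that would carry these bytes is assumed to already exist.

```python
def connect_request(dest_ip: str, dest_port: int) -> bytes:
    """Build a CONNECT request asking the peer to open a tunnel to dest_ip:dest_port.

    In an HBONE-style setup these bytes would be written to an already
    established mTLS connection (assumed here, not shown).
    """
    authority = f"{dest_ip}:{dest_port}"
    return (
        f"CONNECT {authority} HTTP/1.1\r\n"
        f"Host: {authority}\r\n"
        "\r\n"
    ).encode("ascii")


def tunnel_established(status_line: bytes) -> bool:
    """Return True if the peer answered the CONNECT with a 2xx status."""
    parts = status_line.decode("ascii", "replace").split(maxsplit=2)
    return len(parts) >= 2 and parts[1].isdigit() and 200 <= int(parts[1]) < 300


# Illustrative destination; the address and port are made up.
request = connect_request("10.0.0.7", 8080)
accepted = tunnel_established(b"HTTP/1.1 200 OK\r\n")
rejected = tunnel_established(b"HTTP/1.1 403 Forbidden\r\n")
```

After a successful response, the connection carries the application's bytes unmodified in both directions. Because the tunnel is opaque to whatever it carries, an intermediate hop such as a waypoint proxy can terminate one tunnel and open another without the application being aware of it.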
10) What are the next steps for ambient mesh?
Ambient mesh is available as an experimental feature, with plans to launch for production use in 2023. The current focus is on bringing ambient mesh to feature and scale parity with sidecars. In addition, Istio is looking to add support for non-TCP protocols through the use of new HTTP CONNECT methods, and to explore alternative ztunnel implementations that may improve performance and security.
–Justin Pettit, Software Engineer at Google