
Why eBPF Isn’t Enough for Container Monitoring


Author: Phil Gervasi, Director of Technical Evangelism, Kentik
Bio: Phil is a veteran engineer with over a decade of experience in the field troubleshooting, building, and designing networks. Phil is also an avid blogger and podcaster with an interest in emerging technology and making the complex easy to understand.

eBPF is undoubtedly an effective technology for observability and, specifically, for monitoring activity in the Linux kernel. It enables custom programmatic observation and manipulation of system behavior without changing kernel source code or loading kernel modules. With eBPF, we can safely and efficiently trace and profile both the kernel and user-space applications. It's a powerful component of a network observability strategy.

However, Linux hosts, whether they run containers or traditional workloads, don't exist in isolation. They, or more specifically the workloads they run, rely on a multitude of other services. When we consider how applications are built and delivered in microservices architectures, it becomes clear that eBPF alone isn't enough for deep, exhaustive container network monitoring.

Limitations of eBPF

Although eBPF programs can attach to both kernel probes (kprobes) and user-space probes (uprobes), eBPF primarily provides visibility into kernel-level events. While this is important, especially for networking, it doesn't offer complete visibility into application-specific activity. Seeing every syscall is crucial for understanding how containers are running, but syscalls alone don't tell us how containers behave across entire pods in the context of a specific application.

eBPF gives us the raw data, but we still need additional tooling to aggregate it, correlate it with other sources, and ultimately present it in a digestible form. This doesn't diminish eBPF's effectiveness at collecting that raw data. It does mean, however, that we need something else to truly understand why an application built on a container architecture is performing the way it is.
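To make the aggregation step concrete, here is a minimal sketch assuming an eBPF program has already exported per-connection events as simple records. The `ConnEvent` fields and the `summarize_by_destination` helper are hypothetical and do not correspond to any particular tool's output format.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class ConnEvent:
    # Hypothetical shape of a per-connection event exported by an eBPF probe
    pid: int
    dst_ip: str
    dst_port: int
    latency_ms: float


def summarize_by_destination(events):
    """Group raw events by (dst_ip, dst_port) and compute count and median latency."""
    buckets = defaultdict(list)
    for ev in events:
        buckets[(ev.dst_ip, ev.dst_port)].append(ev.latency_ms)
    return {
        dest: {"count": len(latencies), "median_ms": median(latencies)}
        for dest, latencies in buckets.items()
    }


events = [
    ConnEvent(101, "10.0.1.5", 5432, 2.0),
    ConnEvent(101, "10.0.1.5", 5432, 4.0),
    ConnEvent(202, "10.0.2.9", 53, 1.0),
]
print(summarize_by_destination(events))
```

The summary shows which destinations are slow, but nothing in it says which application or pod owns 10.0.1.5. That correlation is exactly the part eBPF leaves to other tooling.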

The applications running inside containers rely on DNS, make database calls, open local network connections, and likely connect to services over the public internet. To understand container performance, and ultimately application performance, we need to look beyond the host.

Container ecosystems typically span very different components, including orchestration tools (like Kubernetes), storage, public cloud, and other services. eBPF doesn't provide holistic visibility across all of these components, so we need additional tooling. eBPF is powerful specifically for host-level visibility, but again, it isn't enough on its own.

Also, keep in mind that eBPF is a Linux kernel technology whose available features depend on the kernel version and configuration. If your container workloads span multiple operating systems, kernel versions, and distributions, or need to be portable across environments, relying solely on eBPF can be limiting.
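One practical consequence is that an eBPF-based agent usually has to check what the running kernel supports before loading its programs. The sketch below parses a kernel release string and compares it to a minimum version. The 5.8 threshold is used only as an example requirement (it is the release that introduced the BPF ring buffer map), and the `/sys/kernel/btf/vmlinux` check reflects where kernels built with BTF expose the type information that CO-RE loaders rely on. The helper names are hypothetical.

```python
import os
import platform
import re


def parse_kernel_release(release):
    """Extract (major, minor) from a release string like '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def supports_feature(release, minimum=(5, 8)):
    """True if the kernel release is at least `minimum` (example threshold: 5.8)."""
    return parse_kernel_release(release) >= minimum


def has_btf(path="/sys/kernel/btf/vmlinux"):
    """Kernels built with BTF expose type info at this path."""
    return os.path.exists(path)


if __name__ == "__main__":
    release = platform.release()
    print(release, "meets 5.8:", supports_feature(release), "BTF:", has_btf())
```

In a fleet with mixed distributions, the same agent may have to fall back to older mechanisms or disable features on some nodes, which is the portability concern described above.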

Next, although eBPF is excellent for real-time monitoring because it observes kernel events as they happen, it doesn't inherently store historical data; eBPF maps hold in-kernel state, not a long-term history. That's not a drawback of eBPF per se, but it does mean additional tooling and storage are necessary for retaining and analyzing historical container networking data.
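A common pattern is for a user-space agent to poll counters from an eBPF map on an interval and keep its own history. The sketch below assumes the snapshots are handed to it as plain dictionaries (reading the actual map is out of scope here), and `CounterHistory` is a hypothetical name.

```python
import time
from collections import deque


class CounterHistory:
    """Timestamped snapshots of cumulative counters, e.g. bytes sent per destination IP.

    In a real agent these snapshots would come from periodically reading an eBPF map;
    here they are passed in directly, which is an assumption of this sketch.
    """

    def __init__(self, max_snapshots=1440):
        # Bounded buffer: once full, the oldest snapshot is discarded on each append.
        self.snapshots = deque(maxlen=max_snapshots)

    def record(self, counters, ts=None):
        ts = time.time() if ts is None else ts
        self.snapshots.append((ts, dict(counters)))

    def rates(self):
        """Per-key rate (units per second) between each pair of consecutive snapshots.

        Keys missing from the earlier snapshot are treated as starting at 0;
        counter resets (e.g. an agent restart) are not handled.
        """
        snaps = list(self.snapshots)
        result = []
        for (t0, c0), (t1, c1) in zip(snaps, snaps[1:]):
            elapsed = t1 - t0
            if elapsed <= 0:
                continue
            result.append((t1, {k: (v - c0.get(k, 0)) / elapsed for k, v in c1.items()}))
        return result
```

With one snapshot per minute, the default buffer holds a single day. Anything longer, or anything that has to survive an agent restart, belongs in a dedicated time-series store.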

Lastly, eBPF primarily operates at the kernel level and has no insight into application logs. The whole point of this work is building and delivering applications to end users, so without application-level awareness, we're missing the context that gives meaning to the granular events eBPF reports.
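One way to bring the two worlds together is to join application log entries with kernel-level events from the same process within a short time window. This is a minimal sketch; both record formats and the `correlate` helper are hypothetical, and matching on PID alone assumes the log entries carry the PID of the emitting process.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def correlate(log_entries, kernel_events, window_s=0.5):
    """Attach kernel events to each log entry from the same PID within +/- window_s.

    log_entries: dicts with 'ts', 'pid', 'message' (hypothetical format)
    kernel_events: dicts with 'ts', 'pid', 'event' (hypothetical format)
    """
    # Index kernel events by PID, sorted by timestamp for binary search.
    by_pid = defaultdict(list)
    for ev in kernel_events:
        by_pid[ev["pid"]].append(ev)
    for evs in by_pid.values():
        evs.sort(key=lambda e: e["ts"])
    times = {pid: [e["ts"] for e in evs] for pid, evs in by_pid.items()}

    results = []
    for entry in log_entries:
        evs = by_pid.get(entry["pid"], [])
        ts_list = times.get(entry["pid"], [])
        lo = bisect_left(ts_list, entry["ts"] - window_s)
        hi = bisect_right(ts_list, entry["ts"] + window_s)
        results.append({**entry, "kernel_events": [e["event"] for e in evs[lo:hi]]})
    return results
```

A log line such as "checkout failed: timeout" paired with a TCP retransmit from the same process moments earlier tells a far more complete story than either source alone.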

Augment eBPF for effective container monitoring

To augment eBPF monitoring and provide comprehensive observability of container networking, we need to combine eBPF’s capabilities with other telemetry, tools, and practices.

We can enrich the results of eBPF monitoring with data sources such as application logs, system metrics, application and process IDs, and service traces. This provides a holistic view of the microservices environment and puts eBPF metrics in the context of an application.
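As an example of the process-ID enrichment, an eBPF event usually carries a PID, and on Linux the container a process belongs to can often be recovered from `/proc/<pid>/cgroup`. The sketch below assumes the common convention that container runtimes embed a 64-character hex container ID in the cgroup path; it covers typical Docker and containerd layouts but is not exhaustive, and the function names are hypothetical.

```python
import re
from pathlib import Path

# Runtimes commonly embed a 64-hex-character container ID in the cgroup path,
# e.g. ".../cri-containerd-<id>.scope" or "/docker/<id>". Not exhaustive.
_CONTAINER_ID = re.compile(r"([0-9a-f]{64})")


def container_id_from_cgroup(text):
    """Return the first container ID found in /proc/<pid>/cgroup content, or None."""
    for line in text.splitlines():
        # Each line has the form "hierarchy-ID:controller-list:cgroup-path".
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        match = _CONTAINER_ID.search(parts[2])
        if match:
            return match.group(1)
    return None


def container_id_for_pid(pid, proc_root="/proc"):
    """Read /proc/<pid>/cgroup (Linux only) and extract the container ID, if any."""
    try:
        text = Path(proc_root, str(pid), "cgroup").read_text()
    except OSError:
        return None
    return container_id_from_cgroup(text)
```

The container ID is only a stepping stone: mapping it to a pod, and the pod to an application, still requires orchestrator metadata.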

Service meshes, in particular, can provide additional telemetry, security, and control over containerized applications. When combined with eBPF data, you get both low-level network insights and high-level application insights.

DNS is a critical component of application delivery, so ingesting information about resolution times, DNS load balancing, and nameserver response times can also augment what we learn from eBPF alone.
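A minimal way to capture resolution time from inside a workload is to time a lookup through the system resolver. This sketch uses Python's `socket.getaddrinfo`; the `resolver` parameter exists only so a stub can be injected for testing. It measures end-to-end resolution from one vantage point, including any local caching, not the response time of individual nameservers.

```python
import socket
import time


def time_resolution(hostname, resolver=socket.getaddrinfo):
    """Resolve hostname and return (elapsed_ms, sorted unique addresses).

    Goes through the system resolver, so answers from /etc/hosts or a local
    cache will look faster than a full nameserver round trip.
    """
    start = time.perf_counter()
    infos = resolver(hostname, None)
    elapsed_ms = (time.perf_counter() - start) * 1000
    # Each getaddrinfo result is (family, type, proto, canonname, sockaddr).
    addresses = sorted({info[4][0] for info in infos})
    return elapsed_ms, addresses
```

Recording the returned addresses alongside the timing also hints at DNS load balancing: the same name resolving to different addresses over time.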

We also want to gather topology information, because most container architectures live in hybrid environments whose topology isn't visible from the server level, which is where eBPF operates. The Kubernetes API lets us ingest Kubernetes metadata such as pod names, namespaces, labels, node assignments, and IP addresses.
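As a sketch of how that metadata puts eBPF flows in context, the code below builds an IP-to-pod index from a Kubernetes `PodList` object (the JSON shape returned by the Pods list endpoint, with `metadata.name`, `metadata.namespace`, `metadata.labels`, `spec.nodeName`, `status.podIP`, and `status.podIPs`). The flow record format and the helper names are hypothetical. Pods using host networking share their node's IP, so a real implementation would need to handle that case.

```python
def build_ip_index(pod_list):
    """Map each pod IP to namespace/name/node/labels from a Kubernetes PodList object."""
    index = {}
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        status = pod.get("status", {})
        info = {
            "namespace": meta.get("namespace"),
            "pod": meta.get("name"),
            "node": pod.get("spec", {}).get("nodeName"),
            "labels": meta.get("labels", {}),
        }
        # status.podIPs lists all addresses (e.g. dual-stack); podIP is the primary one.
        ips = [entry["ip"] for entry in status.get("podIPs", []) if "ip" in entry]
        if not ips and status.get("podIP"):
            ips = [status["podIP"]]
        for ip in ips:
            index[ip] = info
    return index


def label_flow(flow, index):
    """Attach pod identity to a flow with 'src_ip' and 'dst_ip'; unknown IPs map to None."""
    return {
        **flow,
        "src_pod": index.get(flow["src_ip"]),
        "dst_pod": index.get(flow["dst_ip"]),
    }
```

For a quick experiment, the output of `kubectl get pods --all-namespaces -o json` can be loaded with `json.load` and passed to `build_ip_index`. A flow whose destination stays `None` is leaving the cluster, which is a useful signal in itself.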

The need for comprehensive network observability

Ultimately, true comprehensive observability requires rich context from the entire environment as well as business information. Data from IPAM systems, CRM systems, public cloud logs, and similar sources is necessary to understand what's going on with container networking. Without it, a latency figure between pods is hard to interpret. Asymmetric routing between clusters may or may not matter, but with eBPF alone we can't see enough of the big picture to tell.
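To illustrate how IPAM context changes the meaning of a pod-to-pod latency number, here is a minimal sketch using Python's `ipaddress` module for longest-prefix matching. The IPAM record format (`prefix`, `site`) and the helper names are hypothetical; a real IPAM would be queried through its own API.

```python
import ipaddress


def build_prefix_table(ipam_records):
    """Parse hypothetical IPAM records like {'prefix': '10.244.0.0/16', 'site': 'us-east'}."""
    table = [(ipaddress.ip_network(r["prefix"]), r) for r in ipam_records]
    # Most-specific prefixes first, so the first match is the longest-prefix match.
    table.sort(key=lambda item: item[0].prefixlen, reverse=True)
    return table


def lookup(ip, table):
    addr = ipaddress.ip_address(ip)
    for network, record in table:
        if addr in network:  # an IPv4 address never matches an IPv6 network
            return record
    return None


def annotate_latency(src_ip, dst_ip, latency_ms, table):
    """Label a latency measurement with the IPAM site of each endpoint."""
    src, dst = lookup(src_ip, table), lookup(dst_ip, table)
    return {
        "latency_ms": latency_ms,
        "src_site": src["site"] if src else None,
        "dst_site": dst["site"] if dst else None,
        "cross_site": bool(src and dst and src["site"] != dst["site"]),
    }
```

The same latency value reads very differently when `cross_site` is true (traffic is crossing regions or clusters) than when both endpoints sit in the same prefix, and eBPF on a single host cannot supply that distinction.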

eBPF is absolutely a critical tool for container network monitoring, but it's one part of an overall strategy. On its own, eBPF isn't enough for substantive, comprehensive container network monitoring. Only by combining eBPF with other telemetry, tools, and processes can we properly monitor container network activity.

Join us at KubeCon + CloudNativeCon North America this November 6 – 9 in Chicago for more on Kubernetes and the cloud-native ecosystem.