An interview with Grant Schofield, Director of Infrastructure at Humio, a log aggregation platform similar to Splunk or Elasticsearch.
With more and more companies adopting Kubernetes, it’s no longer enough to build a logging solution; it has to be a good one that integrates well with Kubernetes.
Humio, Schofield said, is a “log everything, answer anything” platform. It is highly scalable, able to ingest several terabytes of data on a single server and to scale out to clusters of up to 25 nodes. In Kubernetes environments, logs are sent to Humio from the clusters where the applications run.
While Humio runs both on-premises and in the cloud, Schofield has been experimenting with pushing the limits of its scale. Running on Google Kubernetes Engine (GKE) within a quota of 2,000 CPUs and 50 SSDs, he was able to generate 100 terabytes of logs a day, ingest them, and query them with sub-second latency, sustaining that level over several days.
Observability is a Cultural Movement
Observability isn’t just one tool or an app, said Schofield. It’s a cultural movement. The three pillars of observability are logging, tracing, and metrics, and Humio covers the logging leg. “We need an overall view of our systems’ logs,” he said. Logs tell us one story, metrics tell us another, and tracing yet another.
Combining the three and bringing observability fully into your organization is not just about ingesting data at that scale, he said. It’s also about speed, making the data usable, and giving engineers the ability to interact with it, at scale. But in the end, it’s really about serving your customers.
“Observability to me is about how are my systems running and how is that affecting my customers,” said Schofield.
Because Humio doesn’t index its data, its alerts, queries, and dashboards are live, with zero latency. As a result, it can provide live observability on 100 terabytes a day, which averages out to roughly 1.2 gigabytes of data per second.
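For reference, the average rate implied by 100 terabytes a day can be worked out directly. This assumes decimal units (1 TB = 10^12 bytes) and ingest spread evenly over 24 hours; real log traffic is burstier, so peak rates would be higher than this average:

$$
\frac{100 \times 10^{12}\ \text{bytes}}{24 \times 3600\ \text{s}} = \frac{10^{14}\ \text{bytes}}{86\,400\ \text{s}} \approx 1.16 \times 10^{9}\ \text{bytes/s} \approx 1.16\ \text{GB/s}
$$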
The key, said Schofield, is to log everything because sampling data doesn’t always give you all the information you need. “If we don’t have all this data, we can’t really service our customers in particular,” he said.
At his former company, a video streaming service, they found that a specific Brazilian ISP had really, really slow DNS responses. Although fewer than 0.04% of their customers were affected, it was a crucial issue for those customers, and it only surfaced once full observability was in place.
Listen in to hear more about what Humio can do, why Schofield photographs pelicans, and the intersection of software engineering and music.
By TC Currie