
Importance Of Observability In Cloud-Native Environments | Scaling Infrastructure as Code


One of the key drivers for observability was that when something running in a container ended, its observability ended with it. The same phenomenon applied to automation, making it frustrating to troubleshoot and manage automation for a production system. That frustration has spurred the development of capabilities to track what is going on in systems on a job-by-job basis. This includes tracking the system, seeing what results it produces, watching it as it runs, and collecting information live from that system as it goes.

In the fifth episode of our six-episode series on scaling Infrastructure as Code (IaC), Swapnil Bhartiya sits down with Rob Hirschfeld, CEO and Co-Founder of RackN, to discuss the role observability plays in IaC and how it is helping operators better understand their systems. Hirschfeld explains how monitoring and logging fit in with observability and how they help improve the health of systems, and he discusses some of the challenges around observability and how they can be solved.

Key highlights from this video interview are:

  • One of the challenges with observability is that when a container ended, so did its observability, which can be frustrating. From RackN's perspective, observability in automation systems means being able to track what is happening on a job-by-job basis, seeing what results each job produces, and collecting information live from that system. He explains why this is a game changer for automation and for infrastructure.
  • Hirschfeld talks through how observability is helping operators with infrastructure as code, for example by letting them see into a cluster build process: watching SSH keys being created, watching Terraform being applied, and debugging parameters. He explains why it is so critical to see exactly what is being executed, both for building good code and for maintaining and troubleshooting automation.
  • Traceability with automation differs from traceability without it: when you are building development tools, you need to be able to instrument systems. Hirschfeld discusses how automation can take advantage of an underlying platform to collect the information used to monitor systems. He explains the benefits of building automation to be more observable.
  • Observability does not eliminate the need for monitoring or logging. Hirschfeld explains the difference between monitoring and logging and tells us that both play an important role in the overall observability of a system. He goes into detail about the collaborative aspect of observability and how it helps teams improve the system.
  • Cybersecurity continues to be a serious challenge, and collecting and retaining potentially sensitive operational data requires careful thought. While you need to ensure sensitive information is not exposed in logged operations, you may still need access to that information to troubleshoot. Hirschfeld describes the ways IaC helps people navigate this challenge.
  • Hirschfeld discusses the ways IaC helps with compliance, such as being able to tell whether your systems can be subverted or whether you are failing to collect critical information. He explains how infrastructure pipelines help with observability and compliance, and how IaC supports compliance in the process of collecting and observing data.
  • Observability is fundamentally a Day 2 statement: it helps operators run systems on an ongoing basis, and the more observability you have, the easier those systems are to maintain. Hirschfeld goes into detail about the impact observability has on Day 2 operations.


This summary of the show was written by Emily Nicholls.