
Code-to-Cloud Visibility: A Key Framework for Cloud-Native Success


Organizations understand that observability is not a cutting-edge differentiator, but a basic ingredient for success. We’ve all seen that a lot can go wrong in today’s increasingly complex, hybrid IT systems, and being able to understand and remediate problems is essential. As more businesses reinvent themselves through software, a lack of observability into applications and infrastructure is equivalent to not tracking inventory, finances, or other critical business functions. In the cloud, we have greater access to telemetry data across infrastructure, applications and automation. Organizations that take advantage of this data are not just accelerating their development practices; they are doing so in such an informed way that visibility has become their competitive advantage.

We’re witnessing the evolution of DevOps teams and observability practices, including the adoption of application lifecycle management (ALM) tools, continuous integration and deployment, cloud infrastructure, on-call and SRE practices, compute and container orchestration systems, and monitoring tools. These improvements have had an incredibly positive impact on the capabilities, reliability and development velocity of software and engineering groups, and teams must continue to innovate on this trajectory to keep up with the evolving needs of the business.

Visibility is a Feature

Whether we’re discussing web service performance, supply chains of poultry products, or anything else, successful enterprises have deep visibility into the fundamentals of their business. Successful software organizations can weave a thread from telemetry to features running in production, and everywhere in between. Their services are more reliable and easier to develop than those of their competitors.

OpenTelemetry allows service developers and operators to quickly and easily capture signals like distributed traces, metrics and logs (in development) from their applications and infrastructure. These tools are absolutely critical to making observability an effective practice, allowing teams to go beyond outages and focus on the end-to-end user experience.

There are countless examples of applications leveraging the power of code-to-cloud visibility to deliver a better product. However, while visualizing the data is where we tend to focus, a well-thought-out strategy for collecting data and improving its quality is just as important.

Garbage In, Nothing Out

Observability tools are only as good as the data that they capture. No amount of filtering, processing, machine learning, or other techniques can compensate for data that is inconsistently structured, lacking correlations, locked in to specific solutions, or simply missing. Regardless of how you choose to capture internal telemetry from your infrastructure and applications, there are three questions worth considering:

  • Does instrumentation lock you into a particular observability model or backend processing system? Organizations often lean on proprietary agents for collecting data. However, these agents are tethered to the monitoring tool and have a life cycle outside of the team’s control. Breaking this connection by using open standards like OpenTelemetry and W3C Trace Context allows organizations to evaluate and use multiple analytics solutions simultaneously. Individual teams can pick their own tools and continue to see the entire stack, and developers and operators can create their own custom analytics processors.
  • Is instrumentation consistent across all engineering? Depending on your needs and where you’re capturing telemetry from, you will typically end up using a combination of host agents (like the OpenTelemetry Collector), language instrumentation agents (like OpenTelemetry-Java’s agent), or SDKs (like OpenTelemetry-Java). Each has its own use cases. However, it is critical that no matter how your data is captured, it is structured consistently, using the same semantic conventions and data models across all data types, data sources and data sinks.
  • Can instrumentation provide business logic in-stream? Data collection shouldn’t just be limited to capturing data from hosts and applications. A robust telemetry pipeline also enables the processing, filtering and enrichment of the data that passes through it.
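
To make the last two questions concrete, here is a minimal sketch of an in-stream pipeline that normalizes attribute names to one shared convention, filters out noise, and enriches what remains. It uses only the Python standard library and is not the OpenTelemetry Collector's API. The Event type, the alias table, the "/healthz" path and the processor names are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List, Optional

# Hypothetical alias table: team-specific attribute names on the left,
# one shared, OpenTelemetry-style name on the right.
ATTRIBUTE_ALIASES = {
    "httpMethod": "http.method",
    "svc": "service.name",
}


@dataclass
class Event:
    """A simplified telemetry record (illustrative, not an OpenTelemetry type)."""
    name: str
    attributes: dict = field(default_factory=dict)


# A processor returns a (possibly modified) event, or None to drop it.
Processor = Callable[[Event], Optional[Event]]


def normalize(event: Event) -> Optional[Event]:
    """Rename known aliases so every source uses the same attribute keys."""
    renamed = {ATTRIBUTE_ALIASES.get(k, k): v for k, v in event.attributes.items()}
    return Event(event.name, renamed)


def drop_health_checks(event: Event) -> Optional[Event]:
    """Filter: discard events for an assumed health-check path."""
    if event.attributes.get("http.target") == "/healthz":
        return None
    return event


def enrich(static_attributes: dict) -> Processor:
    """Build a processor that adds fixed attributes; existing values win."""
    def processor(event: Event) -> Optional[Event]:
        return Event(event.name, {**static_attributes, **event.attributes})
    return processor


def run_pipeline(events: Iterable[Event], processors: List[Processor]) -> Iterator[Event]:
    """Pass each event through the processors in order, skipping dropped ones."""
    for event in events:
        current: Optional[Event] = event
        for process in processors:
            current = process(current)
            if current is None:
                break
        if current is not None:
            yield current


pipeline = [normalize, drop_health_checks, enrich({"deployment.environment": "prod"})]
raw_events = [
    Event("request", {"httpMethod": "GET", "svc": "checkout", "http.target": "/cart"}),
    Event("request", {"httpMethod": "GET", "svc": "checkout", "http.target": "/healthz"}),
]
processed = list(run_pipeline(raw_events, pipeline))
```

In this sketch only the "/cart" event survives, and it leaves the pipeline with shared attribute names plus the added environment tag. Order matters: normalization runs first so that later processors can rely on one set of keys.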

Without considering the way data is collected, visualizations can not only lead to poor decision making, but also to missed opportunities. If you can’t see the issue, you can’t address it. Visibility is the opportunity to catch the unknown unknowns, and data ingestion is the starting point.

The Community Understands Data Quality

The importance of data ingestion and visibility has been made clear by technical practitioners at large and by the CNCF community. Service developers and operators understand the limiting factors of poorly instrumented (and thus poorly observable) systems, as well as the tremendous benefits of well-instrumented systems, and they are working to address all of the above.

Splunk has followed suit. As a founding member and one of the top contributors to the OpenTelemetry project, we believe that OpenTelemetry is more important now than ever. It can accelerate the implementation of robust observability and deliver strong results for cloud-native applications at a time when digital experiences through mobile and web applications are critical to the business. Additionally, OpenTelemetry democratizes how data is collected from your infrastructure and applications, offering in-stream business logic and a level of flexibility that allows enterprises to embrace their cloud transformation faster.

To top it off, OpenTelemetry is only getting stronger going into 2022. In addition to ongoing work on instrumentation, the upcoming metrics GA, and eventual logging beta, there are two projects that I want to highlight:

  • Network telemetry and general eBPF support are coming natively to the OpenTelemetry Collector, starting with a donation from Splunk / Flowmill. The initial release of these enhancements will focus on capturing network data such as DNS requests and network delays, and associating it with endpoint metrics and distributed traces.
  • Google and various partners have donated SQLCommenter, which provides deep visibility into database operations, to OpenTelemetry. Information about database and query performance can be inserted into distributed traces or correlated with metrics and logs.
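
As a rough illustration of how a query can be tied back to a distributed trace, the sketch below builds a W3C Trace Context `traceparent` value and appends it to a SQL statement as a trailing comment, approximating the key='value' comment style that SQLCommenter uses. This is not SQLCommenter's own code or API. The function names `new_traceparent` and `add_sql_comment`, the `route` tag and the sample query are assumptions made for this example.

```python
import secrets
from urllib.parse import quote


def new_traceparent(sampled: bool = True) -> str:
    """Build a W3C Trace Context traceparent: version-traceid-parentid-flags."""
    trace_id = secrets.token_hex(16)   # 16 random bytes -> 32 hex characters
    parent_id = secrets.token_hex(8)   # 8 random bytes -> 16 hex characters
    flags = "01" if sampled else "00"  # 01 marks the trace as sampled
    return f"00-{trace_id}-{parent_id}-{flags}"


def add_sql_comment(sql: str, tags: dict) -> str:
    """Append tags to a query as a trailing comment (hypothetical helper).

    Keys are sorted and both keys and values are URL-encoded, so quotes and
    comment delimiters inside a value cannot break out of the comment.
    """
    if not tags:
        return sql
    pairs = ",".join(
        f"{quote(str(key), safe='')}='{quote(str(value), safe='')}'"
        for key, value in sorted(tags.items())
    )
    return f"{sql.rstrip()} /*{pairs}*/"


# The query text and the "route" tag are invented for illustration.
tagged_query = add_sql_comment(
    "SELECT * FROM orders WHERE id = 42",
    {"traceparent": new_traceparent(), "route": "/checkout"},
)
```

A database log or query analyzer that preserves comments can then be joined against trace data using the trace ID carried in the comment.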

A Reflection of Your Team

The applications you build and support reflect the engineering team that built them. However, features alone don’t define an organization and its applications. This is why organizations must prioritize building code-to-cloud visibility. By turning to OpenTelemetry and cloud-native observability tools, organizations improve the quality of the data they use as well as the products they deliver.

Author: Morgan McLean, Director of Product Management, Splunk, and Co-founder of OpenCensus & OpenTelemetry
Bio: Morgan McLean is a director of product management at Splunk focused on the Splunk Observability Cloud, along with Splunk’s contributions to OpenTelemetry and the agent and ingestion unification between Splunk Observability Cloud and Splunk Enterprise. Additionally, he is the co-founder of OpenCensus and OpenTelemetry, now the second largest CNCF project behind only Kubernetes. Prior to Splunk, Morgan spent five years as a product manager at Google Cloud Platform working on DevOps and observability initiatives, along with over three years at Microsoft as a program manager designing and implementing e-commerce services. Morgan has a BASc in Engineering Physics and a BA in Economics from the University of British Columbia.

To hear more about cloud native topics, join the Cloud Native Computing Foundation and cloud native community at KubeCon+CloudNativeCon North America 2021 – October 11-15, 2021