
Role Of SLOs In Technical Debt And Talent Retention | Kit Merker


Guest: Kit Merker
Company: Nobl9
Show: 2022 Prediction Series

Kit Merker, Chief Operating Officer at Nobl9, predicts that SLOs, or service level objectives, will cause a major rethink on how technical debt is approached. “SLOs and technical debt actually go hand in hand. We’ve built some capabilities in our product to help solve this problem because right now, tech debt is seen, I think, as a punishment for engineering teams or something that maybe is crying wolf. We need to fix it. We don’t spend time on it,” explains Merker.

He also believes that “it’s going to be a lot harder to retain and recruit engineers and developers if you don’t handle your software reliability well.” Check out the video above to learn more.


Swapnil Bhartiya: Hi. This is your host Swapnil Bhartiya and welcome to our 2022 prediction series. We have with us once again, Kit Merker, Chief Operating Officer at Nobl9. Kit, it’s good to have you on the show.

Kit Merker: Great to be here.

Swapnil Bhartiya: Before I ask you to grab a crystal ball and share predictions, please, could you tell us what Nobl9 is all about?

Kit Merker: Sure thing. Nobl9 is a software reliability platform focused on service level objectives, or SLOs. Really, what we’re here to do is help companies find the right level of reliability: good enough to make sure that customers keep coming back and using their banking services, streaming video, or retail sites, but with a gap to perfection. Software is not perfect, and so we help people manage that gap. That’s really where the innovation comes from and that’s where the profitability comes from. So that’s what Nobl9 is: a platform helping all kinds of different digital companies serve their customers and also keep their employees productive and engaged.

Swapnil Bhartiya: Excellent. Now it’s time for you to pick up your crystal ball and share with us what predictions you have.

Kit Merker: Well, I’ve been thinking a lot about the predictions, so I’m going to start with my first prediction, and it’s going to be all about SLOs. My first prediction is that SLOs will cause a major rethink of how tech debt is approached. When you think about tech debt, or technical debt, what we’re talking about is the issues, errors, or bugs that are deferred: the things that either drag down the team or maybe aren’t required to serve customers today, but eventually will bite you. SLOs and technical debt actually go hand in hand. We’ve built some capabilities in our product to help solve this problem because right now, tech debt is seen, I think, as a punishment for engineering teams or something that maybe is crying wolf. We need to fix it. We don’t spend time on it.

By using this service level objective approach, it’s much easier to tie technical debt to its business impact. This is really emerging as a new approach to how companies and software development teams think about their backlog: how they bring to the fore the technical debt issues that have a real customer impact, and defer the things that don’t. My first prediction is that this SLO approach is going to make tech debt management a lot easier.
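One common way to make that link concrete, sketched here as an illustration rather than as how Nobl9’s product works, is the error budget: an SLO target such as 99.9% of requests succeeding leaves a budget of 0.1% bad requests per window, and each tech-debt item can be ranked by how much of that budget the failures traced to it consumed. The `DebtItem` type, the backlog entries, and the counts below are hypothetical, and the sketch assumes failures can already be attributed to specific backlog items.

```python
from dataclasses import dataclass


@dataclass
class DebtItem:
    """A hypothetical backlog item with the failed requests attributed to it."""
    name: str
    bad_events: int  # failed requests traced to this item during the SLO window


def error_budget(total_events: int, target: float) -> float:
    """Number of bad events the SLO tolerates over the window."""
    return total_events * (1.0 - target)


def prioritize(items, total_events: int, target: float):
    """Split items into (fix_now, defer).

    fix_now holds (name, share_of_error_budget) for items that caused failures,
    largest first; defer holds names of items with no observed customer impact.
    """
    budget = error_budget(total_events, target)
    ranked = sorted(items, key=lambda i: i.bad_events, reverse=True)
    fix_now = [(i.name, i.bad_events / budget) for i in ranked if i.bad_events > 0]
    defer = [i.name for i in ranked if i.bad_events == 0]
    return fix_now, defer


# Illustrative backlog: 1,000,000 requests at a 99.9% target leaves ~1,000 bad requests.
backlog = [
    DebtItem("legacy auth timeout", 250),
    DebtItem("unused feature flag cleanup", 0),
    DebtItem("retry storm in payments client", 600),
]
fix_now, defer = prioritize(backlog, total_events=1_000_000, target=0.999)
for name, share in fix_now:
    print(f"{name}: {share:.0%} of error budget")
print("defer:", defer)
```

Ranking by budget share rather than by raw bug count is the design choice here: an item that burns most of the budget rises to the top, while items with no measured customer impact can be deferred with a stated reason.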

My second prediction is that it’s going to be a lot harder to retain and recruit engineers and developers if you don’t handle your software reliability well. What I mean by this is that there’s this Great Resignation happening. I think everybody’s heard about it, and in the tech market in particular, engineers have had to deal with scaling up to meet demand during the pandemic and as software has become more prevalent in our lives. Whether that’s food delivery, entertainment through streaming services, et cetera, all of these large-scale services have real engineers behind the scenes who are giving up their evenings and weekends to be on call.

When those engineers have a choice, they can join different companies. They’re not necessarily going into the office, so the perks and the office space aren’t really a differentiator. What we’re seeing, and what I’m predicting, is that engineers and developers are going to choose to work for companies that have an appropriate strategy for managing reliability, incidents, on-call, and technical debt, because that is a major factor in their lives. I think that’s going to continue to grow as companies keep needing to recruit engineers, and as engineers become ever more critical to delivering digital services.

My third prediction is that the metrics we use to create SLOs, or service level objectives, will expand. When we think about SLOs traditionally, they’re often thought of as part of the infrastructure. It’s the servers, it’s the cloud, it’s the CPUs. We’re trying to prevent infrastructure issues on the back end. We’re trying to find out where our service is going to break, where the database might fall over, or where the Kubernetes cluster might fail. What we’re seeing, and what I’m predicting, is that over time the SLO data is going to come from a variety of places. For example, think about end-to-end business processes. Maybe you’re a bank and you care about the mortgage process end to end. That might touch dozens of systems, both internal and external. It might touch manual processes. It might touch the customer who’s collecting documents and has milestones they need to meet.

That entire process can be measured with SLOs, using a variety of data sources. Think about SLOs for things like security issues. Maybe you have a fleet of servers that need to get patched because of Log4Shell, right? You want to know what percentage of them have been patched by a certain date. Each of these different use cases, whether it’s customer-facing, security-related, or about a cloud migration, is where the service level objective approach is going to expand, and we’re going to see more and more systems providing data that can be measured as SLOs.
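The patching example reduces to a simple ratio: servers patched on or before a deadline, divided by the fleet size, compared against an objective. The sketch below assumes a hypothetical inventory mapping server names to patch dates; the server names, dates, and the 95% target are illustrative, not data from the interview.

```python
from datetime import date
from typing import Dict, Optional

# Hypothetical fleet inventory: server name -> date it was patched (None = not yet patched).
fleet: Dict[str, Optional[date]] = {
    "web-01": date(2021, 12, 11),
    "web-02": date(2021, 12, 13),
    "db-01": date(2021, 12, 20),
    "batch-01": None,
}


def patched_fraction(inventory: Dict[str, Optional[date]], deadline: date) -> float:
    """Fraction of servers patched on or before the deadline."""
    if not inventory:
        return 1.0  # an empty fleet trivially meets any patch objective
    done = sum(1 for d in inventory.values() if d is not None and d <= deadline)
    return done / len(inventory)


def meets_objective(inventory: Dict[str, Optional[date]], deadline: date, target: float) -> bool:
    """True if at least `target` of the fleet was patched by `deadline`."""
    return patched_fraction(inventory, deadline) >= target


# Check the same fleet against two hypothetical checkpoints.
for deadline in (date(2021, 12, 15), date(2021, 12, 31)):
    frac = patched_fraction(fleet, deadline)
    print(deadline, f"{frac:.0%} patched", "OK" if meets_objective(fleet, deadline, 0.95) else "MISSED")
```

Measuring the same ratio at several checkpoints turns a one-off remediation push into something that can be tracked like any other SLO over time.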

Then my fourth and last prediction is that the roles, or people, using SLOs are going to expand and grow. Traditionally, we’ve thought of SLOs as a tool for site reliability engineering. SREs are the main people who are promoting SLOs. Google wrote a book about it. There are a lot of people with this site reliability engineering mindset who have adopted SLOs. Well, what we’re seeing in the market now, and what I think is going to continue, is that product managers, traditional app developers, and platform engineers are all caring about SLOs a lot more.

There’s a big movement right now in FinOps, where financially minded people are thinking about the cost of cloud, and they want to add that context. It’s not just about driving down the cost of cloud; it’s also about making sure that the service is efficient on a unit basis. We’re serving our customers, so what does it cost us to serve them? Managing that cost isn’t just about trying to spend less, but about spending an appropriate amount based on the service we’re providing, so that customers are getting the right level of latency, availability, et cetera.

As product managers think more and more about SaaS services and APIs as products, they need a way of measuring and communicating the reliability expectations of their service, and so service level objectives fit into these different roles. It’s going to go far beyond this. In fact, there’s even a role we talk about called the SLOgician, which was coined by John Wilkes in a white paper he wrote about SLOs. The idea of a SLOgician is somebody whose entire focus is on defining the math behind how a service should operate in different conditions. That’s my last prediction: roles using SLOs are going to expand. I think we’re already seeing it, but I think it’s going to go even further.

Swapnil Bhartiya: Excellent. Thanks for sharing these predictions. Now, based on these predictions, what is going to be the company’s focus in 2022?

Kit Merker: Nobl9 has been expanding and growing quite a bit in the last year. We more than doubled our headcount. We went to market with a couple of new products. We’ve been delivering new features, and now we’re seeing this incredible demand from organizations adopting SLOs. So our main focus this year is going to be serving those customers, making sure that they’re getting the features they need on the roadmap, continuing to innovate, and then expanding into new areas. We are trying to meet more use cases beyond the traditional SRE reliability use cases and thinking about things that impact businesses, like I was talking about before: the cost to serve, the speed of innovation, and managing productivity and work-life balance for engineers so they stay on board. Those use cases are really important.

We’re also going to be expanding the ease of use. We’ve added some new AI-based capabilities to predict what SLOs you need. That’ll be coming out soon; we have some customers beta testing it now. We have some easier integrations and more data sources, just kind of ever-expanding that workflow. Then there’s fitting more and more into CI/CD, continuous integration and continuous delivery, making it easy for people to do things like progressive releases based on SLOs, or autoscaling based on SLOs, or checking SLOs into source control and having them be part of that CI/CD process. These are all the feature capabilities we’re working on. Then I think the other big thing for us this year is going to be expanding into education. A year ago, people didn’t really know what SLOs were. Over the last year, we’ve seen more and more people becoming aware of this powerful technique, and we’re going to continue to expand that educational practice this year as well. We’re going to be busy. We’ll put it that way.
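A progressive release gated on SLOs can be sketched with burn rate, a standard SRE measure of how fast a service is consuming its error budget (observed error rate divided by the error rate the SLO allows). In this hypothetical gate, a canary advances through traffic steps only while its burn rate stays at or below a limit. The function names, the traffic steps, and the 1.0 threshold are assumptions for illustration, not a Nobl9 API.

```python
from dataclasses import dataclass


@dataclass
class CanaryWindow:
    """Request counts observed for the canary during one rollout stage."""
    total: int
    bad: int


def burn_rate(window: CanaryWindow, target: float) -> float:
    """Error-budget burn rate: 1.0 means failing exactly as fast as the SLO allows."""
    if window.total == 0:
        raise ValueError("no traffic observed for this stage")
    error_rate = window.bad / window.total
    return error_rate / (1.0 - target)


def next_action(window: CanaryWindow, target: float, max_burn: float = 1.0) -> str:
    """Promote if the canary burns budget no faster than max_burn, otherwise roll back."""
    return "promote" if burn_rate(window, target) <= max_burn else "rollback"


def run_rollout(stages, windows, target: float, max_burn: float = 1.0) -> str:
    """Walk the traffic percentages in order, stopping at the first stage that burns too fast."""
    for pct, window in zip(stages, windows):
        if next_action(window, target, max_burn) == "rollback":
            return f"rolled back at {pct}%"
    return "fully promoted"


# Hypothetical rollout: 1% -> 10% -> 50% of traffic against a 99.9% success SLO.
result = run_rollout(
    stages=[1, 10, 50],
    windows=[CanaryWindow(1_000, 0), CanaryWindow(10_000, 5), CanaryWindow(50_000, 200)],
    target=0.999,
)
print(result)
```

Gating on burn rate rather than a fixed error count keeps the rule tied to the SLO itself, so the same gate works unchanged if the target is tightened or loosened.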

Swapnil Bhartiya: Excellent, Kit. Thank you so much for sharing these predictions, these insights, and of course telling more about the company. I look forward to talking to you again soon, but hopefully next year. Thank you.

Kit Merker: Thank you so much.