
More organizations are implementing service level objectives (SLOs) as a way of tying resource allocation to the things that affect business outcomes. Traditionally, SLOs were used predominantly by SREs and production engineers, but that is changing as more people see how SLOs can help organizations focus their energy and resources where they are needed, enabling better efficiency and sustainability.

On this episode of TFiR Let’s Talk, Swapnil Bhartiya sits down with Kit Merker, Chief Operating Officer at Nobl9, to discuss the key trends in the SLO space and how it is evolving. He goes on to explain the role SLOs are playing in addressing talent gap problems and how it is helping to retain developers.

When discussing how SLOs can help organizations, Merker says, “It’s really more about the decision making of how you allocate resources.  We really do need to focus our resources on the most important things, as always true in business and probably more so right now.”

Key highlights from this video interview are:

  • The second SLOconf was held earlier this year, attracting 40% more attendees than the previous year.
  • Merker feels that awareness about SLOs is growing. To help the community with SLOs, Nobl9 set up OpenSLO, a declarative format for describing SLOs in code, and an open source project called SLODLC. He explains how these projects are helping people adopt SLOs and the key trends he is seeing.
  • People can incorrectly assume that the connection between SLOs and the business is a set of reports. Merker describes how SLOs are more about decisions on how to allocate resources, based on how unreliable services or performance issues affect the outcomes of the business. He gives real-life examples of how SLOs tie these together.
  • Merker explains how SLOs can help companies address some of the challenges associated with talent shortages. He feels that SLOs help organizations focus their resources by defining goals and prioritizing what matters to the business. He discusses how they can also help with developers’ work-life balance and working remotely.
  • Adopting SLOs can be daunting: organizations may feel that their architecture is not good enough, or struggle to get started. However, Merker believes it is best to set the goals first and incrementally improve them rather than wait to upgrade technologies. He explains how SLODLC can be used to help get started and the benefits of SLOs.

Connect with Kit Merker (LinkedIn)
Learn more about Nobl9 (Twitter)

The summary of the show is written by Emily Nicholls.

[expander_maker]

Here is the automated and unedited transcript of the recording. Please note that the transcript has not been edited or reviewed. 

Swapnil Bhartiya: Hi, this is your host, Swapnil Bhartiya, and welcome to another episode of TFiR Let’s Talk. And today we have with us once again, Kit Merker, Chief Operating Officer at Nobl9. Kit, it’s great to have you back on the show.

Kit Merker: Nice to see you. Thanks for having me.

Swapnil Bhartiya: First of all, tell us a bit about SLOconf. If I’m not wrong, this was the second one. Is that correct? And you folks, I think they’re like what? 3,000 or more registrations there? So talk about the event.

Kit Merker: Yeah, sure. Yeah, SLOconf, we had our second event this year and it was like a 40% increase over last year. I don’t know the exact number of attendees. But the really cool part about SLOconf is we bring together so many industry experts. We actually produced nine hours of content from the community that we published as part of SLOconf, all through our open call for proposals. And really the event is unique also in the structure because there’s no schedule. There’s no keynote. There’s no time to show up. You can really kind of take it at your own pace. And so it was a global event. We had people from every continent except Antarctica, I believe, joining in and talking in our Slack space in a kind of asynchronous way. Could attend while you work, et cetera. So that was a really fun event. And of course, all the content from both years is available on YouTube. So, if you check out SLOconf.com, you can find all the content and learn about SLOs. And it’s content for everyone from beginners to experts, deep math stuff and different controversial opinions and everything like that.

Swapnil Bhartiya: Right. And if you want some presentations from Antarctica, you need to get the Linux because penguins are still there. They will be submitting some [inaudible 00:01:43].

Kit Merker: That’s exactly right.

Swapnil Bhartiya: Yeah. Now can you talk about… As you said, the format was very flexible. And I was talking to a lot, couple of folks, and as you already said, the content was really great. What were some of the trends that you saw? What I did see was the evolution of SLO, and folks are saying that it’s kind of becoming a primary kind of observability instrument as well. So talk about the trends that you’ve seen, where folks are looking at SLOs differently than they used to look at it earlier.

Kit Merker: Yeah. Well, I’ve been doing this SLO stuff now as part of Nobl9 since 2019. And of course, it’s a popular methodology for SREs, site reliability engineers at Google and production engineers at Facebook. But outside of that world, it’s less known. And I think one of the big things that’s changed for me personally is people I talked to used to say, “Well, what is an SLO? I don’t understand it.” To now saying, “How do I get this going?” Right? “How do I get this up and running?” And so now the questions have been more about implementation.

So a couple things that we’ve done in the community around SLOs: 1) We created a project called OpenSLO, which we launched last year, and that’s really a declarative format for describing SLOs in code. And SLOs as code is a really cool part of this whole observability stack where you can check in the definitions. There’s no arguing about the definitions. So we launched that with Dynatrace and GitLab last year. And then this year, we had a lot of contributions from Red Hat and Sumo Logic, and it actually went 1.0. So it’s a stable version of OpenSLO.

And then we also created an open source project called SLODLC, the SLO development life cycle. And that is an open sourced framework or methodology, a set of templates and tools, to drive the project management around discovery and design and implementation of SLOs. And that’s also implementation agnostic. We created that project with partners like Contino and Accenture and customers like OutSystems and others for the community, Ford. And we had such a great kind of participation from different companies that are taking a different approach on that. And so this sort of trend of adoption that we’ve seen where companies are seeing the future, which is about defining clear goals that is described in code for how their services should work, and organizations are shifting to being sort of in this service centric mindset where they’re now not thinking of as, “Oh, I have IT and I have business.” They’re now thinking of it as a digital stack that is a set of services. And the people and the computers come together to really deliver that to customers. And SLOs are right, I think, at the center of that movement, and people are seeing them as a very clear tool for that. And now the next question is like, “How do I get into implementation?” Which I think is, for us, very exciting. We’ll put it that way for Nobl9.

Swapnil Bhartiya: Right. And there are so many things that I want to talk about. Number one is you talk about IT and business. Are folks seeing the business value of SLOs? Or how do they really relate it or directly connect it with the business success teams as well?

Kit Merker: Yeah. Well, I think when people first learn about the SLO connection to business, I think what they imagine is a set of reports. They imagine the dollars for the cost of downtime and say, “Oh, well, the service wasn’t working perfectly; therefore tell me how many dollars we lost and how many customers.” And that’s actually not really the business connection with SLOs. It’s really more about the decision making of how you allocate resources. And in the times we’re in right now, we really do need to focus our resources on the most important things, as always true in business and probably more so right now.

But the kinds of things that you can look at is how does the unreliability of my services or the performance issues of my services actually affect the outcomes from the business? And those could be things like how fast is our feature delivery, if you think of it from an engineering perspective? It could be how painful is our work/life balance for our engineers who are getting paged and woken up and managing the operations? It could be a question of downtime impacting customer reputation and revenue? And all of those things, it’s not like you tie it together directly and see a business report. It’s not like… You don’t see it on the balance sheet. But when you run a business, you’re thinking about KPIs. You’re thinking about goals. You’re thinking about how people are spending their time and energy toward the business outcomes that happen to be whatever your priority is at the time. And SLOs really do tie that together because they’re encoding a set of customer expectations, trade-offs, and risks, into the application that don’t exist in any other way.

And by doing that, setting these very clear goals and saying, okay, look, people come into your website; 99% of the time, it needs to work within X amount of seconds for the website to load. Or if I have a support queue of tasks that an IT team needs to complete, we got to complete 99% of those tickets need to get closed within three days. Those are both SLOs that could be inscribed in code, and then you can lead to automated action, which is I think the other key thing in business; is that we don’t want to sit there watching, right? Oh, we got to keep an eye on everything. We want the system to alert us, to tell us, “Hey, you might have a risk that needs your attention.”

And so that shift, when people really get into that mindset about, “We don’t want to violate the expectations of our customers.” Right? It’s like you think about a restaurant and how many times does somebody check on your table? Right? How long does it take before you walk in, before you get seated? How long is the line at the host stand before you get a table assigned to you? All those things are parts of that customer experience. That’s all moved online. And so describing that to an employee, you can explain it to them. You go to a restaurant, and this is what it should be like for our customers. It’s a customer experience. When we do it in software, unfortunately, there’s no one to explain it to you yet. You have to describe it in code. And that’s really what SLOs as code is letting you do.
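
Editor’s illustration (not part of the interview): the two examples Merker gives above, a page that must load within a set time 99% of the time and support tickets that must close within three days 99% of the time, can be written down as data and checked automatically. The Python sketch below is only one possible way to do that. It is not the OpenSLO format, and the class name, function names, the 2-second page-load threshold, and the sample measurements are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SLO:
    """Hypothetical SLO record: `target` share of events must be 'good'."""
    name: str
    target: float      # e.g. 0.99 means 99% of events must meet the threshold
    threshold: float   # an event is good when its measured value <= threshold
    unit: str


def compliance(slo: SLO, measurements: Sequence[float]) -> float:
    """Return the fraction of measurements that meet the SLO threshold."""
    if not measurements:
        return 1.0  # no events observed, so nothing was violated
    good = sum(1 for m in measurements if m <= slo.threshold)
    return good / len(measurements)


def error_budget_remaining(slo: SLO, measurements: Sequence[float]) -> float:
    """Return the unspent share of allowed bad events (negative if overspent)."""
    allowed_bad = (1.0 - slo.target) * len(measurements)
    bad = sum(1 for m in measurements if m > slo.threshold)
    if allowed_bad == 0:
        return 1.0 if bad == 0 else float("-inf")
    return 1.0 - bad / allowed_bad


# Assumed values: the interview says "X amount of seconds"; 2.0 is a placeholder.
page_load = SLO(name="page-load", target=0.99, threshold=2.0, unit="seconds")
ticket_close = SLO(name="ticket-close", target=0.99, threshold=3.0, unit="days")

# Made-up sample data, for illustration only.
load_times = [0.8] * 995 + [3.5] * 5   # 5 slow page loads out of 1000
ticket_ages = [1.0] * 97 + [5.0] * 3   # 3 late tickets out of 100

for slo, data in ((page_load, load_times), (ticket_close, ticket_ages)):
    observed = compliance(slo, data)
    status = "OK" if observed >= slo.target else "AT RISK"
    print(f"{slo.name}: {observed:.1%} within {slo.threshold} {slo.unit} -> {status}")
```

Because the definition is plain data, it can be checked into version control alongside the service, and the status check can drive an alert instead of someone watching a dashboard. The error budget function shows how much room for error is left before the goal is missed.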

Swapnil Bhartiya: Since you were talking about, you touched upon some of the cultural, internal company aspects, also [inaudible 00:07:42]. One more thing that is happening today is we cannot ignore it with the economy because of COVID and the war there. There are looming kind of tensions of potential recession. But yeah, before that also we are going through this phase. First of all, there is a shortage of talent already. Plus, because of the folks who are working from home, they don’t want to change the way they have been working when people are trying to bring them back. So a lot of mass resignation or those kind of things are happening. So a lot of movement is already happening, and this recession is also creating a lot of challenges. Do you think that SLOs can also help companies in kind of mitigate or address some of these challenges as well?

Kit Merker: Yeah. I think the short answer is yes. The way that, and I was kind of alluding to this before, the thing that’s really, I think, at a macro level, what people are doing in their organizations right now is they have to focus their energy. They have to focus their resources. And they can’t drop the ball on customer experience or say, “Okay, just because our stock got hammered, we’re going to stop burning the servers with the same efficiency or…” The expectation doesn’t necessarily change. And to your point, from a resignation perspective, tech talent is always going to be shorthanded and people are going to look for engineers are going to look for what they consider to be good places to work and safe places to work and stable places to work.

And so I think what SLOs bring to the table in this environment, one is helping you define what the goal is so people are on the same page. They can work on the same things. You can define, “It’s more important to have service A, maybe the one that does the payments processing or is the first look the customer has, or other things that are critical to the business. This is the shape of that, and here’s the risk of it. And these other services over here are either less important, or we have more room for error, and we can put less energy on that.” That is an important thing. It lets you run with less servers. It lets you run with less people. It lets you focus your energy on the things that truly matter. So I think that’s a really critical piece and being able to run the same level of perception of your services. And for enterprises, their services are vast, everything from supply chain to web, to mobile, databases and all the different microservices and everything. It’s vast services. They don’t just turn off overnight. So that’s a key thing.

And then the other part of it is maintaining that work/life balance, the remote work, et cetera. SLOs play a role in that too, because engineers understand that they don’t want to be paged over nonsense. They want to be paged… And they don’t mind carrying the pager for legitimate issues, but what they won’t stand is seeing an operation that’s not well organized and that sends alerts that aren’t actionable. I like to say that observability without action is just storage. And this is really the key thing, is we’re thinking about cutting our storage bill, thinking about cutting our cloud bill, and thinking about keeping our engineers focused. That’s really what we’re talking about here, and SLOs play a very, I think, very important role in that.

Swapnil Bhartiya: One last question before we wrap this up is also that we are talking about the adoption of SLOs are seeing value. But can you also talk about when they do embrace it, what are some of the challenges that you see are common or some mistakes that they make? And then how would you suggest them to approach it in the right way so that they get the value that they should get out of it?

Kit Merker: Yeah. The thing that I see more than anything else is people are hesitant to get started with SLOs because they think their infrastructure’s not good enough or they haven’t adopted the most modern technology. And so therefore they don’t get started, say, “Oh, we’re still running on VMs.” And I go, “Well, that’s great. Most of the world is running on VMs. That’s not…” Don’t believe the hype that everyone’s on containers and serverless, right? And the impact that this has, and to use a colloquial expression, it’s like you’re too busy chasing pigs to build a fence. And this is what it feels like when you have a lot of outages and a lot of alerts, and you’re trying to run your operations in a more manual way without clear goals defined.

And I talk to companies and organizations about taking a step back and saying, “Okay, let’s get started. You don’t need to make perfect SLOs. You don’t need to rearchitect everything. You need to get started with something and baseline it. And the SLOs come first in that digital journey.” It’s not, “We need to go and rearchitect everything, and then we’ll be able to start setting clear operational goals.” And I think that causality is backwards sometimes, if you know what I’m saying. It’s like if we want to get to higher reliability and then we’ll have goals for our service, as opposed to saying, “No, set the goals first and incrementally improve them over time and focus on short-term impacts,” I would say is number one.

And this is what led to SLODLC. Check out SLODLC.com. We have a set of materials there, and there’s a business case template. There’s an SLO discovery template. I use this probably, I would say, multiple times a week. I’m going and downloading the template for a customer or a prospect or a partner or a community member. And I’m just walking through, “Okay, who cares about your service? What happens when it doesn’t work? What are the dependencies? What are the expectations on this service, maybe that aren’t even written down, that customers have?” And you go through this sort of process with them, step by step, asking the questions. And you go, “Okay, you actually do have all the answers to these things. You just haven’t really thought about this in this sort of structured way.” And getting this input from companies like Ford and Etsy and Oracle that are contributing to SLODLC, it’s kind of battle tested already too. This is something that we’ve really used, and I’ve been able to use now with many companies as we were developing it. And now as it’s been published, people are starting to adopt and they’re really excited by it.

So I think this is, to me, the biggest blocker right now is just getting started. It’s sort of like, “Hey, I really want to get in shape and lose weight.” I mean, I’m speaking of myself here. So you get the gym membership. You get the equipment. And it’s like, “Well, what should I do?” Well, you should get started. You need to actually go do it. And that I think is the biggest challenge. I understand organizations have a lot going on, but to me, it’s sort of a counterintuitive effect. I actually am hopeful that as people become more focused on efficiency and sustainability and cost savings, that this will become an accelerator for SLO adoption. We’re already hearing that and seeing that. People are saying, “Okay, look, my observability stack is out of control. My cloud spend is crazy. My team is ragged, dealing with pages. We need to do something. We can’t just keep hoping that the environment will get better.” And the SLOs become a key part of the strategy, along with other things too, by the way, right? Improving their logging and architecture, improving their software development practices, adopting Agile and testing and things like that. It’s part of that whole mix of how these organizations become a lot better.

And we’re even seeing things like hotel chains that are saying, “Oh, we have to have an app for check-in because of COVID.” Well, in reality, it’s actually a better experience to have the app for check-in and to use it as your hotel room key. That’s actually a better experience anyway. So, to me, you can kind of solve two problems at once because you’re saying, “We have to become efficient. We have to become lean.” That actually leads to innovation. And that’s, I think, to me, the most exciting part of what people are now overcoming. They’re realizing this, and we’re seeing that with the SLO adoption.

Swapnil Bhartiya: Okay. Thank you so much for taking time out today and talk about not only the SLOconf, but also the whole evolution, how you folks are actually helping folks embrace. But I do see there are a lot of challenges. One is certainly around awareness and education and tell them that, “Yes, you are ready to embrace it.” So I would love to have you back on the show to discuss some of those topics as well. But I really appreciate your time today. Thank you.

Kit Merker: I’ll be back anytime you want, man. Good to see you. Good chatting with you.

[/expander_maker]
