Author: Mark Lavi (LinkedIn)
Bio: As a Cloud Native Product Manager at Kasten by Veeam, Mark drives Kubernetes data protection and management solutions, engineering, and go-to-market efforts across upstream and downstream open source community projects, Veeam products, and partners.


Kubernetes can be a challenge to learn, manage, and operate. Tools that ease the cloud native operational burdens of running Kubernetes clusters and workloads are critical for automation and governance in every organization. For example, Helm is widely used for deploying software in Kubernetes, with 70% adoption according to the CNCF.io 2019 Survey. The CNCF.io 2021 survey showed stateful workloads grew past 40% of deployed applications, highlighting the cloud native expansion into data management. A complementary tool for application-level data management in Kubernetes is Kanister, an open-source project that uses blueprints to standardize and consistently manage data operations across Kubernetes workloads.

Data Management Blueprint Background

Blueprints are templates for performing data management actions on existing resources, typically a database in a Kubernetes cluster. For example, a Kubernetes cluster may have a MongoDB database deployed using a StatefulSet. While the StatefulSet specification describes characteristics of the deployment, such as the number of replicas to keep available, it doesn’t capture the “day two” maintenance operations that IT, developer, and database administrator teams perform over the life cycle of the database and coordinate with the applications that depend on it. A blueprint specifies actions that are applied to a resource, such as a MongoDB, Oracle, or MS-SQL database. Actions implement operations such as creating a backup or restoring data from a backup.
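As a rough sketch, a minimal Kanister Blueprint declaring a backup action for a MongoDB StatefulSet might look like the following. The resource names, container image, and command here are illustrative assumptions, not taken from this article; consult the Kanister documentation for authoritative examples.

```yaml
apiVersion: cr.kanister.io/v1alpha1
kind: Blueprint
metadata:
  name: mongodb-blueprint       # illustrative name
  namespace: kanister
actions:
  backup:
    phases:
      # KubeTask runs a short-lived pod to execute the backup command
      - func: KubeTask
        name: takeBackup
        args:
          namespace: "{{ .StatefulSet.Namespace }}"
          image: mongodb-tools:latest   # hypothetical image with mongodump installed
          command:
            - bash
            - -c
            - |
              # Dump the database and push the archive to the configured Profile location
              mongodump --archive | kando location push --profile '{{ toJson .Profile }}' --path backup.archive -
```

The `{{ ... }}` expressions are Go templates that Kanister resolves at run time against the target object and Profile, which is what makes a single blueprint reusable across environments.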

Of course, cluster and database administrators can manually run backups or create shell scripts scheduled to perform backups. Manually executing data protection operations like backups is error-prone. Although scripts can help reduce the chance of errors, it’s challenging to write scripts that correctly specify what steps to execute across the wide array of conditions that may arise. Declarative specifications define what should happen, and the underlying system, such as Kubernetes, uses controllers to determine how to implement the desired state. Therefore, actions can automate manual operations, runbook procedures captured on paper or in a wiki, and other bespoke procedures for nearly any situation where monitoring alerts on a problem that threatens business service level agreements.

How Blueprints Provide Value

Blueprints allow system administrators to define cluster and application specifications in code. This extends the benefits of Infrastructure as Code to the application level. Many of the benefits administrators have come to expect from using code to manage infrastructure now extend to managing applications and the data operations on those applications. With this extension comes a reduced risk of human error, as well as improved scalability for DevOps staff, who can now more effectively and efficiently manage large, complicated Kubernetes environments.

Blueprints help system administrators overcome several challenges. First, they provide for more consistent deployments across environments, such as staging and production. This is especially important when agile software development methods are employed. In those environments, small and frequent changes to code are the norm. Blueprint Actions can be delegated to developers and testers for self-service.

To maintain quality control over software, teams often use multiple environments to develop and then test software before putting it into production. It’s important for testing and staging environments to be consistent with production environments. This is challenging because a Kubernetes environment can have many services running, each using a variety of libraries, applications, databases, and operating systems. Something as simple as running different minor versions of a database in staging and production can result in code operating correctly in staging and then failing in production. Consistency across environments is essential for ensuring reliable system operations. Blueprint Actions can be made part of any continuous integration and continuous delivery or deployment pipeline, advancing GitOps efficiencies toward completely declarative, automated full-stack application orchestration and operation.

Implementing Blueprints: What’s Involved

To get started using blueprints, we’ll work with the open-source framework Kanister available at https://Kanister.io. Kanister is an extensible framework for application-level data management on Kubernetes to help you execute your data operations and capture them in a blueprint, so that your automation can be reused and parameterized.

Kanister was designed with a few key goals in mind, including support for data management at the application level, APIs to support integration of Kanister with other tools and services, and extensibility to adapt to a wide range of use cases. Kanister can be easily installed via Helm Chart to any Kubernetes cluster.
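For example, installing Kanister via its Helm chart typically takes two commands. The repository URL and chart name below reflect the project’s published charts at the time of writing; check the Kanister documentation for current values.

```shell
# Add the Kanister chart repository and install the operator into its own namespace
helm repo add kanister https://charts.kanister.io/
helm install kanister kanister/kanister-operator \
  --namespace kanister --create-namespace
```

Once installed, the operator watches for Kanister custom resources in the cluster.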

Kanister Architecture

Kanister uses several concepts that implement blueprint management, including:

  • Controller
  • Blueprints
  • ActionSets
  • Profiles

The Kanister Controller watches for new or updated Kanister blueprints and executes them: it implements the operator pattern and consumes custom resources in Kubernetes. Custom resources are written in YAML, an easy way to specify automation configuration without programming knowledge.

Blueprints are custom resources that wrap together various ActionSets, Profiles, and more to orchestrate data management operations the Controller will execute from the Kubernetes cluster.

ActionSets are custom resources that perform a sequence of actions, such as backing up or restoring a database. ActionSets specify each operational task and data needed, including details on artifacts, leveraging Kubernetes structures such as ConfigMaps and Secrets, and potentially spinning up containers to trigger custom tools on local or external resources.
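A sketch of an ActionSet that invokes a blueprint’s backup action against a MongoDB StatefulSet might look like this; the blueprint, profile, and object names are illustrative assumptions rather than values from this article.

```yaml
apiVersion: cr.kanister.io/v1alpha1
kind: ActionSet
metadata:
  generateName: backup-     # the controller appends a unique suffix per run
  namespace: kanister
spec:
  actions:
    - name: backup              # which action in the blueprint to run
      blueprint: mongodb-blueprint   # illustrative blueprint name
      object:                   # the Kubernetes resource the action targets
        kind: StatefulSet
        name: mongodb
        namespace: mongodb
      profile:                  # where backup artifacts are stored
        name: s3-profile        # illustrative profile name
        namespace: kanister
```

Creating this resource (for example with `kubectl create -f`) is what triggers the controller to execute the blueprint’s phases.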

Profiles contain source and destination storage artifact locations with related credentials.
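A Profile sketch pointing at an S3-compatible bucket might look like the following; the bucket, region, and Secret names are hypothetical placeholders.

```yaml
apiVersion: cr.kanister.io/v1alpha1
kind: Profile
metadata:
  name: s3-profile          # illustrative name, referenced by ActionSets
  namespace: kanister
location:
  type: s3Compliant
  bucket: my-backup-bucket  # hypothetical bucket
  region: us-east-1
credential:
  type: keyPair
  keyPair:
    idField: aws_access_key_id
    secretField: aws_secret_access_key
    secret:                 # a standard Kubernetes Secret holding the keys
      apiVersion: v1
      kind: Secret
      name: s3-credentials  # hypothetical Secret name
      namespace: kanister
```

Keeping credentials in a referenced Kubernetes Secret, rather than inline, lets the same Profile definition move between environments safely.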

Getting Started with Kanister

There are two easy, guided resources to get you started. The public open source project documentation contains a Kanister tutorial for use on your own Kubernetes cluster, and the free, community-driven KubeCampus.io Kanister Blueprint lab provides a complete Kubernetes learning environment prepared with an application, storage, and Kanister installed.

For additional example blueprints, see the Kanister Git repository, which offers expert-designed blueprints demonstrating helpful design patterns for many popular databases. It is easy to adapt blueprints to your organization’s applications, environments, operations, and pipelines!

Kanister blueprints represent a cloud native, open source framework for orchestrating data management operations of persistent applications on Kubernetes and provide the essential automation for the lifecycle maintenance of workloads, their storage artifacts, and application orchestration.
