Rafay Systems, a Platform-as-a-Service (PaaS) provider specializing in modern infrastructure and accelerated computing, has released its inaugural survey, which finds that a staggering 93% of platform teams are grappling with significant challenges. The most pressing issues include managing the complexity of Kubernetes, controlling Kubernetes and cloud-related costs, and improving developer productivity.
Titled “The Pulse of Enterprise Platform Teams: Cloud, Kubernetes, and AI,” the report delves into the hurdles faced by platform engineering teams in enterprise environments. In response to these challenges, many organizations are prioritizing environment standardization, cost control, and improved developer experiences. There is a growing shift toward automation and self-service solutions, along with a notable trend toward adopting AI tools. Notably, the majority of respondents believe that pre-configured AI workspaces, complete with integrated machine learning operations (MLOps) and large language model operations (LLMOps) tools, could generate an estimated $1.4 million in productivity gains for a team of 100 developers.
The Cost and Complexity of Kubernetes Management
Despite the widespread adoption of platform teams, the survey highlights that these teams are often overwhelmed by the complexities of managing multi-cluster Kubernetes and cloud environments. Key challenges identified by respondents include:
- Cost management and visibility: 45% struggle with managing Kubernetes and cloud infrastructure costs.
- Cluster lifecycle management: 38% find it difficult to manage the Kubernetes cluster lifecycle using multiple disparate tools.
- Standardization efforts: 38% face challenges in establishing and maintaining enterprise-wide standardization.
As Kubernetes usage continues to expand, organizations are experiencing a significant increase in the costs and resources required to manage these environments. Nearly one-third (31%) report that the total cost of ownership for Kubernetes, including software licenses and personnel, exceeds their initial budgets. Looking ahead, 60% of respondents identify cost reduction and optimization as a top priority for managing Kubernetes infrastructure in the coming year.
AI and GenAI Adoption: A New Set of Challenges
The survey also highlights that organizations investing in AI and generative AI (GenAI) are encountering challenges similar to those faced during Kubernetes adoption. An overwhelming 96% emphasize the need for efficient development and deployment methods for AI applications, with 94% echoing the same for GenAI applications.
However, fewer than one in five organizations have fully implemented MLOps (17%) or LLMOps (16%). This early stage of adoption is reflected in the widespread difficulties faced by teams:
- MLOps challenges: 95% report difficulties in experimenting with and deploying AI applications.
- GenAI hurdles: 94% struggle with the experimentation and deployment of GenAI applications.
To overcome these obstacles, organizations are prioritizing key capabilities for their AI initiatives:
- Pre-configured environments for developing and testing AI applications
- Automatic allocation of AI workloads to appropriate GPU resources
- Pre-built MLOps pipelines
- GPU virtualization and sharing
- Dynamic GPU matchmaking
These features aim to streamline development, optimize resource use, and manage costs effectively.
Just as they have with cloud and Kubernetes technologies, platform teams are expected to play a crucial role in overcoming these challenges and advancing AI and GenAI adoption. The top responsibilities identified for platform teams include:
- Security for MLOps and LLMOps workflows: 50%
- Model deployment automation: 49%
- Data pipeline management: 45%
The Growing Demand for Self-Service and Automation
The survey underscores a growing emphasis on enhancing the developer experience through automation and self-service, particularly in AI initiatives and Kubernetes deployments. Respondents identified several priorities to boost developer productivity within the Kubernetes ecosystem:
- Automating cluster provisioning: 47%
- Standardizing and automating infrastructure: 44%
- Providing self-service experiences for developers: 44%
- Automating Kubernetes cluster lifecycle management: 44%
- Reducing cognitive load on developer teams: 37%