87% of Ansible Playbooks Are Broken (AI Just Proved It)
06/03/2026 | 14 mins.
**87% of production Ansible playbooks have critical flaws, but AI just revealed how to fix them.**
Today's Platform Engineering Playbook dives deep into how AI is revolutionizing infrastructure automation and Ansible development. We'll explore groundbreaking research showing most production playbooks lack proper error handling, and how collaborative AI approaches are changing the game for platform engineers.
**What You'll Learn:**
• Why most Ansible deployments are more fragile than you think
• How to leverage AI to identify and fix critical infrastructure code issues
• Real-world case studies of AI-assisted Ansible improvement
• Latest developments in route optimization algorithms (RADAR)
• Pulumi's up to 20x faster operations, now generally available
• AWS Lambda's new Kiro power for durable functions
**Timestamps:**
0:00 Cold Open - The Ansible Crisis
2:15 Today's Platform Engineering News
8:30 Deep Dive: AI + Ansible Collaboration
Whether you're managing infrastructure at scale or just starting your platform engineering journey, this episode delivers actionable insights you can implement immediately. Learn how top engineering teams are using AI not to replace their expertise, but to amplify it.
**Sources & References:**
• How to collaborate with AI to improve your Ansible skills: https://developers.redhat.com/articles/2026/03/04/how-collaborate-ai-improve-your-ansible-skills
• RADAR: Learning to Route with Asymmetry-aware DistAnce Representations: https://arxiv.org/abs/2603.03388
• Now GA: Up to 20x Faster Pulumi Operations for Everyone: https://www.pulumi.com/blog/journaling-ga/
• Accelerate Lambda durable functions development with new Kiro power: https://aws.amazon.com/about-aws/whats-new/2026/03/lambda-durable-kiro-power/
• How we would have managed a recent incident at Port with an incident agent: https://www.port.io/blog/how-we-would-have-managed-a-recent-incident-at-port-with-an-incident-agent
• Scaling AI opportunity across the globe: Learnings from GitHub and Andela: https://github.blog/developer-skills/career-growth/scaling-ai-opportunity-across-the-globe-learnings-from-github-and-andela/
GrafanaCON 2026: The Agenda That Signals the Future of Observability
05/03/2026 | 18 mins.
**GrafanaCON 2026 just dropped its agenda, and every attendee will build an AI agent from scratch on day one. What does this tell us about the future of platform engineering?**
In today's Platform Engineering Playbook, we dissect the GrafanaCON 2026 agenda to uncover what it reveals about emerging trends in observability and platform tooling. We analyze why hands-on AI workshops are becoming conference staples and what this means for platform teams in 2026.
**What You'll Learn:**
• How GrafanaCON's AI-first approach signals industry shifts
• Strategic insights for platform teams from the conference agenda
• Hidden cloud costs exposed by AWS's Well-Architected Framework
• Release platform migration strategies that actually work
• Why traditional ITOps fails with AI incident management
**Timestamps:**
00:00 Cold Open - GrafanaCON's AI Agent Challenge
02:15 Today's Platform Engineering News
08:30 Deep Dive: GrafanaCON 2026 Agenda Analysis
Whether you're planning conference attendance or building your 2026 platform strategy, this episode breaks down the signals that matter for platform engineering leaders.
What if your observability stack could debug and fix production issues while you sleep? That future might be closer than you think.
In today's Platform Engineering Playbook, we explore the cutting edge of agentic AI in observability systems and break down the biggest platform engineering news shaping March 2026.
**🎯 WHAT YOU'LL LEARN:**
• How self-healing observability stacks are revolutionizing platform operations
• Whether AI agents can truly handle your system's edge cases
• Practical evaluation criteria for agentic observability tools
• Critical security updates from Datadog's OCI protection expansion
• Confluent's game-changing Kafka platform updates with A2A support
**⏰ TIMESTAMPS:**
0:00 Cold Open - The Future of Self-Debugging Systems
1:30 Today's Platform Engineering Headlines
8:45 Deep Dive: Agentic Observability - The Setup
15:20 Can AI Handle Your Edge Cases? - The Analysis
**💡 WHY LISTEN:** Get actionable insights on emerging platform technologies and real-world implementation strategies, and stay ahead of the industry trends that will shape your infrastructure decisions.
Perfect for platform engineers, SREs, and DevOps professionals navigating the evolving landscape of autonomous systems.
What happens when a major AI platform goes dark while secretly pursuing billion-dollar government contracts? Claude's massive outage reveals critical lessons about platform engineering resilience that every infrastructure team needs to understand.
In today's Platform Engineering Playbook, we dissect Anthropic's Claude outage and uncover the hidden platform engineering challenges of serving classified government workloads. You'll discover why traditional cloud architectures fail when security requirements demand air-gapped infrastructure, and learn a practical framework for building "architectural resilience" into your own platforms.
**What You'll Learn:**
• How to architect platforms for multiple security classifications
• The real cost of government compliance on platform design
• Pulumi's game-changing self-hosted Insights for infrastructure visibility
• AWS Lambda runtime automation strategies that actually work
• Why Cloudflare's Markdown support signals a major shift in web architecture
**Timestamps:**
0:00 - Cold Open: Claude's Billion-Dollar Secret
2:15 - Today's Platform Engineering News
8:30 - Deep Dive: The Hidden Cost of Classified Computing
15:45 - Framework: Building Architectural Resilience
Whether you're scaling startup infrastructure or designing enterprise platforms, this episode delivers actionable insights you can implement immediately.
Backstage Is Becoming the Control Plane for Engineering
02/03/2026 | 18 mins.
**What if Spotify's secret weapon for managing 2,800 microservices could transform your entire platform engineering strategy?**
Today's Platform Engineering Playbook dives deep into the Backstage revolution that's quietly reshaping how engineering teams operate at scale. We break down what a production-grade Backstage implementation actually looks like in 2026, complete with real-world examples and concrete takeaways for your team.
**What You'll Learn:**
• How Spotify's internal developer portal handles massive microservice complexity
• Production-grade Backstage implementation strategies and best practices
• Critical MySQL 9.6 changes affecting foreign key constraints and cascade handling
• Bootc and OSTree's role in modernizing Linux system deployment
• The latest developments in AI company military partnerships
Whether you're considering Backstage adoption or optimizing your current platform engineering stack, this episode delivers the tactical insights you need to level up your developer experience.
**Sources & References:**
• KubeCon + CloudNativeCon Europe 2026 BackstageCon: https://www.cncf.io/blog/2026/02/27/kubecon-cloudnativecon-europe-2026-co-located-event-deep-dive-backstagecon/
• MySQL 9.6 Foreign Key Changes: https://www.infoq.com/news/2026/02/mysql-foreign-keys/
• Bootc and OSTree Guide: https://a-cup-of.coffee/blog/ostree-bootc/
• AI Military Partnerships Update: https://www.businessinsider.com/anthropic-deal-pentagon-openai-sam-altman-dario-amodei-pete-hegseth-2026-2
The Platform Engineering Playbook Podcast is where AI meets open-source infrastructure knowledge, and you're part of the editorial process. Every episode is researched, scripted, and produced with AI, then reviewed by the community and published on GitHub for anyone to improve. Facing tool sprawl across 130+ platforms? Justifying PaaS costs to your CFO? Navigating the Shadow AI crisis hitting 85% of organizations? We tackle the messy realities of platform engineering that most content avoids, delivering data-backed insights and decision frameworks you can use Monday morning. Built for senior engineers, SREs, and DevOps practitioners with 5+ years in production, we dissect cloud economics, AI governance, infrastructure trade-offs, and career strategy, with the receipts to back it up. Think we got something wrong? Have better data? Open a pull request at platformengineeringplaybook.com. This is infrastructure podcasting as a living document, where the community keeps us honest and the content gets better with every contribution.
Read the playbook at https://platformengineeringplaybook.com