PodcastsTechnologyGoogle SRE Prodcast

Google SRE Prodcast

Salim Virji
Google SRE Prodcast
Latest episode

44 episodes

  • Google SRE Prodcast

    The One With Steph Hippo and Observability

    16/12/2025 | 33 mins.

    In this episode, Steph Hippo, Platform Engineering Director at Honeycomb, joins The Prodcast to discuss AI and SRE.  Steph explains how observability helps us understand complex systems from their outputs, and provides a foundation for SRE to respond to system problems. This episode explains how AI and observability build a self-reinforcing loop.  We also discuss how AI can detect and respond to certain classes of incidents, leading to self-healing systems and allowing SREs to focus on novel and interesting problems. She advises small businesses adopting AI to learn from others' mistakes (post-mortems) and to commit time and budget to experimentation.  

  • Google SRE Prodcast

    The One with Ben Good and Our Kubernetes Friends

    30/7/2025 | 32 mins.

    In this special episode hosts Steve McGhee from the Google SRE Prodcast and Kaslin Fields from the Google Kubernetes Podcast, welcome Google Cloud Solutions Architect Ben Good to discuss platform engineering. Listeners can look forward to hearing about the role of Kubernetes as a tool for building platforms, how to create "golden paths" for developers, and the importance of observability and self-service in platform design. The conversation also touches on industry trends, the bespoke nature of platforms, and how DORA metrics can be applied to platform engineering practices.

  • Google SRE Prodcast

    The One With AI Agents, Ramón Llamas, and Swapnil Haria

    23/7/2025 | 42 mins.

    Google Staff SRE Ramón Llamas and Google Software Engineer Swapnil Haria join our hosts to explore how AI agents are revolutionizing production management, from summarizing alerts and finding hidden errors to proactively preventing outages. Learn about the challenges of evaluating non-deterministic systems and the fascinating interplay between human expertise and emerging AI capabilities in ensuring robust and reliable infrastructure.

  • Google SRE Prodcast

    The One with Technical Program Managers and Karanveer Anand

    16/7/2025 | 27 mins.

    This episode features Google Technical Program Manager (TPM) Karanveer Anand, who joins our hosts to discuss the unique role of TPMs in Site Reliability Engineering (SRE). The conversation highlights how SRE TPMs bridge the gap between technical details and business impact, managing complex projects with inter-team dependencies and ensuring system reliability, particularly in the rapidly evolving AI landscape.

  • Google SRE Prodcast

    The One with STPA, Jeffrey Snover, and Theo Klein

    02/7/2025 | 37 mins.

    This episode discusses Systems Theoretic Process Analysis (STPA), a method for analyzing complex systems. Theo Klein, a Google SRE, and Jeffrey Snover, a Distinguished Engineer at Google, explain that STPA focuses on identifying how system accidents and losses occur due to a loss of control, rather than component failures. STPA helps identify design flaws early, even before code is written! The discussion highlights that STPA is a human-driven process, prompting critical questions about system goals and potential losses, and that Google is adapting the pure STPA approach for commercial software development to make it more practical and efficient.

More Technology podcasts

About Google SRE Prodcast

SRE Prodcast brings Google's experience with Site Reliability Engineering together with special guests and exciting topics to discuss the present and future of reliable production engineering!
Podcast website

Listen to Google SRE Prodcast, Cheeky Pint and many other podcasts from around the world with the radio.net app

Get the free radio.net app

  • Stations and podcasts to bookmark
  • Stream via Wi-Fi or Bluetooth
  • Supports Carplay & Android Auto
  • Many other app features
Social
v8.2.0 | © 2007-2025 radio.de GmbH
Generated: 12/17/2025 - 4:29:43 PM