
The DevSecOps Talks Podcast

Mattias Hemmingsson, Julien Bisconti and Andrey Devyatkin

99 episodes

  • #99 - AI SRE and the End of 3 AM On-Call

    28/04/2026 | 48 mins.
    Could AI handle the worst parts of incident response before you even join the call? Mattias and Paulina talk with Birol Yildiz about AI-written status updates, fast root cause analysis, and the path from read-only help to autonomous fixes. They also explore why post-mortems and documentation may be some of the best places to start. 

    We are always happy to answer any questions, hear suggestions for new episodes, or hear from you, our listeners.

    DevSecOps Talks podcast LinkedIn page

    DevSecOps Talks podcast website

    DevSecOps Talks podcast YouTube channel
  • #98 - Beyond AI SRE

    21/04/2026 | 37 mins.
    Andrey shares the thinking behind Boris and the idea of going beyond AI SRE. The conversation covers the DevOps talent shortage, the coming squeeze on AI costs, why repeatable operational tasks are a strong fit for agents, and why customer data should stay in the customer's own AWS account. 

    We are always happy to answer any questions, hear suggestions for new episodes, or hear from you, our listeners.

    DevSecOps Talks podcast LinkedIn page

    DevSecOps Talks podcast website

    DevSecOps Talks podcast YouTube channel
  • #97 - Shift Left, Get Hacked: Supply Chain Attacks Hit Devs

    15/04/2026 | 35 mins.
    March 2026 made supply chain attacks feel a lot less theoretical, but what made these incidents different? The hosts discuss compromised publishing credentials, automatic execution hooks like post-install scripts and Python `.pth` files (a short `.pth` sketch follows this entry), and how both humans and security tools caught the malicious releases. They also talk through concrete ways to make developer environments harder to abuse.

    We are always happy to answer any questions, hear suggestions for new episodes, or hear from you, our listeners.

    DevSecOps Talks podcast LinkedIn page

    DevSecOps Talks podcast website

    DevSecOps Talks podcast YouTube channel
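
    Aside (not from the episode): a minimal illustration of the `.pth` hook mentioned above. Python's `site` module reads `*.pth` files in site-packages at interpreter startup and executes any line that begins with `import`, so a compromised package can run code without ever being imported.

    ```python
    # Demo of the .pth auto-execution hook discussed above; run it only in a
    # throwaway virtualenv. Python's site module reads *.pth files at startup
    # and executes any line that starts with "import" -- nothing has to be
    # imported explicitly for the code to run.
    import pathlib
    import site

    payload = 'import sys; sys.stderr.write("[.pth hook] ran at startup\\n")\n'
    hook = pathlib.Path(site.getsitepackages()[0]) / "demo_hook.pth"
    hook.write_text(payload)
    print(f"wrote {hook}; start a new `python` process and watch stderr")
    ```
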
  • #96 - Keeping Platforms Simple and Fast with Joachim Hill-Grannec

    01/04/2026 | 48 mins.
    This episode with Joachim Hill-Grannec asks: How do platforms bloat, and how do you keep them simple and fast with trunk-based dev and small batches? Which metrics prove it works—cycle time, uptime, or developer experience? Can security act as a partner that speeds delivery instead of a gate?


    We are always happy to answer any questions, hear suggestions for new episodes, or hear from you, our listeners.

    DevSecOps Talks podcast LinkedIn page

    DevSecOps Talks podcast website

    DevSecOps Talks podcast YouTube channel

    Summary
    In this episode of DevSecOps Talks, Mattias speaks with Joachim Hill-Grannec, co-founder of Peltek, a boutique consulting firm specializing in high-availability, cloud-native infrastructure. Following up on a previous episode where Steve discussed cleaning up bloated platforms, Mattias and Joachim dig into why platforms get bloated in the first place and how platform teams should think when building from scratch. Their conversation spans cloud provider preferences, the primacy of cycle time, the danger of adding process in response to failure, and a strong argument for treating security and quality as enablers rather than gatekeepers.

    Key Topics
    Platform Teams Should Serve Delivery Teams
    Joachim frames the core question of platform engineering around who the platform is actually for. His answer is clear: the delivery teams are the client. Platform engineers should focus on making it easier for developers to ship products, not on making their own work more convenient.

    He connects this directly to platform bloat. In his experience, many platforms grow uncontrollably because platform engineers keep adding tools that help the platform team itself: "Look, I spent this week to make my job this much faster." But Joachim pushes back on this instinct — the platform team is an amplifier for the organization, and every addition should be evaluated by whether it helps a product get to production faster and gives developers better visibility into what they are working on.

    Choosing a Cloud Provider: Preferences vs. Reality
    The conversation briefly explores cloud provider choices. Joachim says GCP is his personal favorite from a developer perspective because of cleaner APIs and faster response times, though he acknowledges Google's tendency to discontinue services unexpectedly. He describes AWS as the market workhorse — mature, solid, and widely adopted, comparing it to "the Java of the land." Azure gets the coldest reception; both acknowledge it has improved over time, but Joachim says he still struggles whenever he is forced to use it.

    They observe that cloud choices are frequently made outside engineering. Finance teams, investors, and existing enterprise agreements often drive the decision more than technical fit. Joachim notes a common pairing: organizations using Google Workspace for productivity but AWS for cloud infrastructure, partly because the Entra ID (formerly Azure AD) integration with AWS Identity Center works more smoothly via SCIM than the equivalent Google Workspace setup, which requires a Lambda function to sync groups.

    Measuring Platform Success: Cycle Time Above All
    When Mattias asks how a team can tell whether a platform is actually successful, Joachim separates subjective and objective measures.

    On the subjective side, he points to developer happiness and developer experience (DX). Feedback from delivery teams matters, even if surveys are imperfect.

    On the objective side, his favorite metric is cycle time — specifically, the time from when code is ready to when it reaches production. He also mentions uptime and availability, but keeps returning to cycle time as the clearest indicator that a platform is helping teams deliver faster. This aligns with DORA research, which has consistently shown that deployment frequency and lead time for changes are strong predictors of overall software delivery performance.
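
    For illustration (not from the episode): a minimal sketch of how cycle time might be computed from deployment records, assuming hypothetical `ready_at`/`deployed_at` fields collected from CI events and deploy webhooks.

    ```python
    # Minimal sketch: median cycle time from "code ready" to "in production".
    # The record fields (ready_at, deployed_at) are assumptions; in practice
    # they might come from CI events and deployment webhooks.
    from datetime import datetime
    from statistics import median

    deploys = [
        {"ready_at": "2026-03-02T09:15:00", "deployed_at": "2026-03-02T11:40:00"},
        {"ready_at": "2026-03-03T14:00:00", "deployed_at": "2026-03-04T10:05:00"},
        {"ready_at": "2026-03-05T08:30:00", "deployed_at": "2026-03-05T09:02:00"},
    ]

    def hours_between(start: str, end: str) -> float:
        delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
        return delta.total_seconds() / 3600

    cycle_times = [hours_between(d["ready_at"], d["deployed_at"]) for d in deploys]
    print(f"median cycle time: {median(cycle_times):.1f} h")  # watch the trend, not one value
    ```
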

    Start With a Highway to Production
    A major theme of the episode is that platforms should begin with the shortest possible route to production. Mattias calls this a "highway to production," and Joachim strongly agrees.

    For greenfield projects, Joachim favors extremely fast delivery at first — commit goes to production, commit goes to production — even with minimal process. As usage and risk increase, teams can gradually add automation, testing, and safeguards. The critical thing is to keep the flow and then ask "how do we make those steps faster?" as you add them, rather than letting each new step slow down the pipeline unchallenged.

    He also makes a strong case for tags and promotions over branch-based deployment, noting that his instinctive reaction when someone asks "which branch are we deploying from?" is: "No branches — tags and promotions."
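
    As a sketch of what tags-and-promotions can look like in practice (an illustration, not Joachim's actual setup), the snippet below promotes one immutable build through environments by moving per-environment git tags; the `deploy/<env>` naming is an assumption.

    ```python
    # Illustrative tag-and-promotion flow: build once, promote the same
    # immutable artifact by moving per-environment tags. The deploy/<env>
    # tag naming is an assumption for this sketch.
    import subprocess

    def run(*cmd: str) -> None:
        subprocess.run(cmd, check=True)

    def promote(commit_sha: str, env: str) -> None:
        """Point the environment's tag at the commit whose artifact we trust."""
        tag = f"deploy/{env}"
        run("git", "tag", "--force", tag, commit_sha)  # move the env tag locally
        run("git", "push", "--force", "origin", tag)   # GitOps tooling can watch this ref

    sha = "1a2b3c4"  # placeholder: the commit that produced the container image
    for env in ("staging", "production"):
        promote(sha, env)  # no rebuild, no branch -- the build is promoted as-is
    ```
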

    The Trap of Slowing Down After Failure
    Joachim warns about a common and dangerous pattern: when a bug reaches production, the natural organizational reaction is not to fix the pipeline, but to add gates. A QA team does a full pass, a security audit is inserted, a manual review step appears. Each gate slows delivery, which leads to larger batches, which increases risk, which triggers even more controls.

    He sees this as a vicious cycle. Organizations that respond to incidents by slowing delivery actually get worse security, worse quality, and worse throughput over time. He references a study — likely the research behind the book Accelerate by Nicole Forsgren, Jez Humble, and Gene Kim — showing that faster delivery correlates with better security and quality outcomes. The organizations adding Engineering Review Boards (ERBs) and Architecture Review Boards (ARBs) in the name of safety often do not measure the actual impact, so they never see that the controls are making things worse.

    Mattias connects this to AI-assisted development, where developers can now produce changes faster than ever. If the pipeline cannot keep up, the pile of unreleased changes grows, making each release riskier.

    Getting Buy-In: Start With Small Experiments
    Joachim does not recommend that a slow, process-heavy organization throw everything out overnight. Instead, he suggests starting with small experiments. Code promotions are a good entry point: teams can start producing artifacts more rapidly without changing how those artifacts are deployed. Once that works, the conversation shifts to delivering those artifacts faster.

    He finds starting on the artifact pipeline side produces quicker wins and more organizational buy-in than starting with the platform deployment side, which tends to be more intertwined and higher-risk to change.

    Guiding Principles Over a Rigid Golden Path
    Mattias questions the idea of a single "golden path," saying the term implies one rigid way of working. Joachim leans toward guiding principles instead.

    His strongest principle is simplicity — specifically, simplicity to understand, not necessarily simplicity to create. He references Rich Hickey's influential talk Simple Made Easy (from Strange Loop 2011), which distinguishes between things that are simple (not intertwined) and things that are easy (familiar or close at hand). Creating simple systems is hard work, but the payoff is systems that are easy to reason about, easy to change, and easy to secure.

    His second guiding principle is replaceability. When evaluating any tool in the platform, he asks: "How hard would it be to yank this out and replace it?" If swapping a component would be extremely difficult, that is a smell — it means the system has become too intertwined. Even with a tool as established as Argo CD, his team thinks about what it would look like to switch it out.

    Tooling Choices and Platform Foundations
    Joachim outlines the patterns his team typically uses when building platforms, organized into two paths:

    Delivery pipeline (artifact creation):
    - Trunk-based development over GitFlow
    - Release tags and promotions rather than branch-based deployment
    - Containerization early in the pipeline
    - Release Please for automated release management and changelogs
    - Renovate for dependency updates (also used for production environment promotions of Helm charts and container images)

    Platform side (environment management):
    - Kubernetes-heavy, typically EKS on AWS
    - Karpenter for node scaling
    - AWS Load Balancer Controller only as a backing service for a separate ingress controller (not using ALB Ingress directly, due to its rough edges)
    - Argo CD for GitOps synchronization and deployment
    - Argo CD Image Updater for lower environments to pull latest images automatically
    - Helm for packaging, despite its learning curve

    He notes that the community-maintained NGINX Ingress Controller for Kubernetes (ingress-nginx) has been retired, so teams need to evaluate alternatives for their ingress layer.

    Developers Should Not Be Fully Shielded From Operations
    One of the more nuanced parts of the conversation is how much operational responsibility developers should have. Joachim rejects both extremes. He does not think every developer needs to know everything about infrastructure, but he has seen too many cases where developers completely isolated from runtime concerns make poor decisions — missing simple code changes that would make a system dramatically easier to deploy and operate.

    He advocates for transparency and collaboration. Platform repos should be open for anyone on the dev team to submit pull requests. When the platform team makes a change, they should pull in developers to work alongside them. This way, the delivery team gradually builds a deeper understanding of how the whole system works.

    Joachim loves the open-source maintainer model applied inside organizations: platform teams are maintainers of their areas, but anyone in the organization should be able to introduce change. He warns against building custom CLIs or heavy abstractions that create dependencies — if a developer wants to do something the CLI does not support, the platform team becomes a bottleneck.

    Mattias adds that opening up the platform to contributions also exposes assumptions. What feels easy to the person who built it may not be easy at all; it is just familiar. Outside contributors reveal where the system is actually hard to understand.

    Designers, Not Artists: Detaching Ego From Code
    Joachim shares an analogy he prefers over the common "developers as artists" framing. He sees developers more like designers than artists, because an artist's work is tied to their identity — they want it to endure. A designer, by contrast, creates something to serve a purpose and expects it to be replaced when something better comes along.

    He applies this to platforms and infrastructure: "I want my thing to get wiped out. If I build something, I want it to get removed eventually and have something better replace it." Organizations where ego is tied to specific systems or tools tend to resist change, which leads to the kind of dysfunction that keeps platforms bloated and brittle.

    Complexity Is the Enemy of Security
    Mattias raises the difficulty of maintaining complex security setups over time, especially when the original experts leave. Joachim responds firmly: complexity is anti-security.

    If people cannot comprehend a system, they cannot secure it well. He acknowledges that some problems are genuinely hard, but argues that much of the complexity engineers create is unnecessary — driven by ego rather than need. "The really smart people are the ones that create simple things," he says, wishing the industry would redirect its narrative from admiring complicated systems to admiring simple ones.

    Security and QA as Internal Consulting, Not Gatekeeping
    Joachim draws a parallel between security and QA. He dislikes calling a team "the quality team," preferring "verification" — they are one component of quality, not the entirety of it. Similarly, security is not one team's responsibility; it spans product design, development practices, tooling, and operations.

    His ideal model is for security and QA teams to operate as internal consultants whose goal is to reduce risk and improve the overall system — not to catch every possible issue at any cost. The framing matters: if a security team's mandate is simply "block all security issues," the logical conclusion is to stop shipping or delete the product entirely. That may be technically secure, but it is useless.

    He frames security as risk management: "Security is a risk management process, not just security for the sake of security. You're managing the risk to the business." The goal should be to deliver faster and more securely — an "and," not an "or."

    Mattias recalls a PCI DSS consultant joking over drinks that a system being down is perfectly compliant — no one can steal card numbers if the system is unavailable. The joke lands because it exposes exactly the broken incentive Joachim describes.

    Business Value as the Unifying Frame
    The episode closes by tying everything back to business outcomes. Joachim argues that speed and security are not opposites; both contribute to business value. Fast delivery creates value directly, while security reduces business risk — and risk management is itself a business operation.

    He explains why focusing on the highest-impact business bottleneck first builds trust. When you hit the big items first, you earn credibility, and subsequent changes become easier to justify. For example, one of his clients has a security group that is the slowest part of their organization. Speeding up that security process would have a massive impact on business delivery — more than optimizing the artifact pipeline.

    Mattias reflects that he used to see platform work as separate from business concerns — "I don't care about the business, I'm here to build a platform for developers." Looking back, he would reframe that: using business impact as the measure of platform success does not mean abandoning the focus on developers, it means having a clearer way to prioritize and demonstrate value.

    Highlights

    Joachim on platform bloat: "Your job is not to make your job faster and easier — you're an amplifier to the organization."

    Joachim on his favorite metric: "Cycle time is my favorite metric. I love cycle time metrics."

    Joachim on deployment strategy: "No branches, no branches — tags and promotions."

    Mattias on platform design: He calls the ideal early setup a "highway to production."

    Joachim on simplicity vs. ease: He references Rich Hickey's Simple Made Easy talk — "It's very hard to create simple systems that are easy to reason about. And it's very easy to create systems that are very hard to reason about."

    Joachim on replaceability: "If swapping a tool out would be extremely hard, that's a pretty big smell."

    Joachim on complexity and security: "If it's complicated, you just can't keep all the context together. Simple systems are much easier to be secure."

    Joachim on engineering ego: "I don't particularly like the aspect of [developers as] artists... I want my thing to get wiped out. I want it to get removed eventually and have something better replace it." He prefers the analogy of designers over artists, because artists tie their identity to their creations.

    Joachim on security as a blocker: "If their goal is we are going to block every security issue, the best way to do that is delete your product."

    Spicy cloud takes: Joachim calls GCP his favorite cloud for developers, compares AWS to "the Java of the land," and says he still struggles every time he is forced to use Azure.

    PCI DSS dark humor: Mattias recalls a consultant joking that a downed system is perfectly compliant — you cannot steal card numbers from a system that is not running.

    Joachim on the slow-down trap: Organizations add ERBs, ARBs, and manual security gates after incidents, but "the faster you can deliver, you actually get better security, better quality, and better throughput — and the more you slow it down, you go the opposite."

    Resources

    Simple Made Easy by Rich Hickey (InfoQ) — The influential 2011 talk Joachim references on distinguishing simplicity from ease in system design.

    DORA Metrics: The Four Keys — The research framework behind cycle time, deployment frequency, and the finding that speed and stability are not tradeoffs.

    Trunk Based Development — A comprehensive guide to the branching strategy Joachim recommends over GitFlow.

    Argo CD — Declarative GitOps for Kubernetes — The GitOps tool Joachim's team uses for cluster synchronization and deployment.

    Release Please (GitHub) — Google's tool for automated release management based on conventional commits, used by Joachim's team for tag-based promotions.

    Karpenter — Kubernetes Node Autoscaler — The node autoscaler Joachim's team uses with EKS for fast, flexible scaling.

    Renovate — Automated Dependency Updates — The dependency management bot Joachim uses for both build dependencies and production environment promotions.
  • #95 - From Platform Theater to Golden Guardrails with Steve Wade

    23/03/2026 | 45 mins.
    Is your Kubernetes stack bloated, slow, and hard to explain? Steve Wade shares simple checks—the hiring treadmill, onboarding time, and the acronym test—to spot platform theater fast. What would a 30-day deletion sprint cut, save, and secure? 

    We are always happy to answer any questions, hear suggestions for new episodes, or hear from you, our listeners.

    DevSecOps Talks podcast LinkedIn page

    DevSecOps Talks podcast website

    DevSecOps Talks podcast YouTube channel

    Summary
    In this episode of DevSecOps Talks, Mattias and Paulina speak with Steve Wade, founder of Platform Fix, about why so many Kubernetes and platform initiatives become overcomplicated, expensive, and painful for developers. Steve has helped simplify over 50 cloud-native platforms and estimates he has removed around $100 million in complexity waste. The conversation covers how to spot a bloated platform, why "free" tools are never really free, how to systematically delete what you don't need, and why the best platform engineering is often about subtraction rather than addition.

    Key Topics
    Steve's Background: From Complexity Creator to Strategic Deleter
    Steve introduces himself as the founder of Platform Fix — the person companies call when their Kubernetes migration is 18 months in, millions over budget, and their best engineers are leaving. He has done this over 50 times, and he is candid about why it matters so much to him: he used to be this problem.

    Years ago, Steve led a migration that was supposed to take six months. Eighteen months later, the team had 70 microservices, three service meshes (they kept starting new ones without finishing the old), and monitoring tools that needed their own monitoring. Two senior engineers quit. The VP of Engineering gave Steve 90 days or the team would be replaced.

    Those 90 days changed everything. The team deleted roughly 50 of the 70 services, ripped out all the service meshes, and cut deployment time from three weeks of chaos to three days, consistently. Six months later, one of the engineers who had left came back. That experience became the foundation for Platform Fix.

    As Steve puts it: "While everyone's collecting cloud native tools like Pokemon cards, I'm trying to help teams figure out which ones to throw away and which ones to keep."

    Why Platform Complexity Happens
    Steve explains that organizations fall into a complexity trap by continuously adding tools without questioning whether they are actually needed. He describes walking into companies where the platform team spends 65–70% of their time explaining their own platform to the people using it. His verdict: "That's not a team, that's a help desk with infrastructure access."

    People inside the complexity normalize it. They cannot see the problem because they have been living in it for months or years. Steve identifies several drivers: conference-fueled recency bias (someone sees a shiny tool at KubeCon and adopts it without evaluating the need), resume-driven architecture (engineers choosing tools to pad their CVs), and a culture where everyone is trained to add but nobody asks "what if we remove something instead?"

    He illustrates the resume-driven pattern with a story from a 200-person fintech. A senior hire — "Mark" — proposed a full stack: Kubernetes, Istio, Argo, Crossplane, Backstage, Vault, Prometheus, Loki, Tempo, and more. The CTO approved it because "Spotify uses it, so it must be best practice." Eighteen months and $2.3 million later, six engineers were needed just to keep it running, developers waited weeks to deploy, and Mark left — with "led Kubernetes migration" on his CV. When Steve asked what Istio was actually solving, nobody could answer. It was costing around $250,000 to run, for a problem that could have been fixed with network policies.

    He also highlights a telling sign: he asked three people in the same company how many Kubernetes clusters they needed and got three completely different answers. "That's not a technical disagreement. That's a sign that nobody's aligned on what the platform is actually for."

    The AI Layer: Tool Fatigue Gets Worse
    Paulina observes that the same tool-sprawl pattern is now being repeated with AI tooling — an additional layer of fatigue on top of what already exists in the cloud-native space. Steve agrees and adds three dimensions to the AI complexity problem: choosing which LLM to use, learning how to write effective prompts, and figuring out who is accountable when AI-written code does not work as expected. Mattias notes that AI also enables anyone to build custom tools for their specific needs, which further expands the toolbox and potential for sprawl.

    How Leaders Can Spot a Bloated Platform
    One of the most practical segments is Steve's framework for helping leaders who are not hands-on with engineering identify platform bloat. He gives them three things to watch for:

    The hiring treadmill: headcount keeps growing but shipping speed stays flat, because all new capacity is absorbed by maintenance.

    The onboarding test: ask the newest developer how long it took from their first day to their first production deployment. If it is more than a week, "it's a swamp." Steve's benchmark: can a developer who has been there two weeks deploy without asking anyone? If yes, you have a platform. If no, "you have platform theater."

    The acronym test: ask the platform team to explain any tool of their choosing without using a single acronym. If they cannot, it is likely resume-driven architecture rather than genuine problem-solving.

    The Sagrada Familia Problem
    Steve uses a memorable analogy: many platforms are like the Sagrada Familia in Barcelona — they look incredibly impressive and intricate, but they are never actually finished. The question leaders should ask is: what does an MVP platform look like, what tools does it need, and how do we start delivering business value to the developers who use it? Because, as Steve says, "if we're not building any business value, we're just messing around."

    Who the Platform Is Really For
    Mattias asks the fundamental question: who is the platform actually for? Steve's answer is direct — the platform's customers are the developers deploying workloads to it. A platform without applications running on it is useless.

    He distinguishes three stages:
    - Vanilla Kubernetes: the out-of-the-box cluster
    - Platform Kubernetes: the foundational workloads the platform needs to function (secret management, observability, perhaps a service mesh)
    - The actual platform: only real once applications are being deployed and business value is delivered

    The hosts discuss how some teams build platforms for themselves rather than for application developers or the business, which is a fast track to unnecessary complexity.

    Kubernetes: Standard Tool or Premature Choice?
    The episode explores when Kubernetes is the right answer and when it is overkill. Steve emphasizes that he loves Kubernetes — he has contributed to the Flux project and other CNCF projects — but only when it is earned. He gives an example of a startup with three microservices, ten users, and five engineers that chose Kubernetes because "Google uses it" and the CTO went to KubeCon. Six months later, they had infrastructure that could handle ten million users while serving about 97.

    "Google needs Kubernetes, but your Series B startup needs to ship features."

    Steve also shares a recent on-site engagement where he ran the unit economics on day two: the proposed architecture needed four times the CPU and double the RAM for identical features. One spreadsheet saved the company from a migration that would have destroyed the business model. "That's the question nobody asks before a Kubernetes migration — does the maths actually work?"
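
    The "day two" spreadsheet reduces to simple arithmetic. A sketch with invented numbers (only the 4x CPU / 2x RAM ratios come from the episode):

    ```python
    # Back-of-the-envelope version of the day-two unit-economics check.
    # Only the 4x CPU / 2x RAM ratios come from the episode; every other
    # number here is an invented placeholder.
    current = {"vcpu": 64, "ram_gb": 256}
    proposed = {"vcpu": current["vcpu"] * 4, "ram_gb": current["ram_gb"] * 2}

    PRICE_PER_VCPU_MONTH = 30.0   # assumed unit prices, illustrative only
    PRICE_PER_GB_RAM_MONTH = 4.0

    def monthly_cost(shape: dict) -> float:
        return shape["vcpu"] * PRICE_PER_VCPU_MONTH + shape["ram_gb"] * PRICE_PER_GB_RAM_MONTH

    delta = monthly_cost(proposed) - monthly_cost(current)
    print(f"extra cost for identical features: ${delta:,.0f}/month")
    # If that delta eats the product's margin, the migration fails the maths
    # before it ships anything.
    ```
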

    Mattias pushes back slightly, noting that a small Kubernetes cluster can still provide real benefits if the team already has the knowledge and tooling. Paulina adds an important caveat: even if a consultant can deploy and maintain Kubernetes, the question is whether the customer's own team can realistically support it afterward. The entry skill set for Kubernetes is significantly higher than, say, managed Docker or ECS.

    Managed Services and "Boring Is Beautiful"
    Steve's recommendation for many teams is straightforward: managed platforms, managed databases, CI/CD that just works, deploy on push, and go home at 5 p.m. "Boring is beautiful, especially when you call me at 3 a.m."

    He illustrates this with a company that spent 18 months and roughly $850,000 in engineering time building a custom deployment system using well-known CNCF tools. The result was about 80–90% as good as GitHub Actions. The migration to GitHub Actions cost around $30,000, and the ongoing maintenance cost was zero.

    Paulina adds that managed services are not completely zero maintenance either, but the operational burden is orders of magnitude less than self-managed infrastructure, and the cloud provider takes on a share of the responsibility.

    The New Tool Tax: Why "Free" Tools Are Never Free
    A central theme is that open-source tools carry hidden costs far exceeding their license fee. Steve introduces the new tool tax framework with four components, using Vault (at a $40,000 license) as an example:

    Learning tax (~$45,000): three engineers, two weeks each for training, documentation, and mistakes

    Integration tax (~$20,000): CI/CD pipelines, Kubernetes operators, secret migration, monitoring of Vault itself

    Operational tax (~$50,000/year): on-call, upgrades, tickets, patching

    Opportunity tax (~$80,000): while engineers work on Vault, they are not building things that could save hundreds of hours per month

    Total year-one cost: roughly $243,000 — a 6x multiplier over the $40,000 budget. And as Steve points out, most teams never present this full picture to leadership.
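
    Tallied as code, with the caveat that the quoted line items sum to about $235,000, slightly under the ~$243,000 total mentioned, presumably due to rounding in conversation:

    ```python
    # The "new tool tax" line items from the episode, tallied. Figures are
    # the approximate ones quoted for the Vault example; note they sum to
    # ~$235k, a little under the ~$243k total mentioned on the show.
    tool_tax = {
        "license": 40_000,
        "learning": 45_000,     # 3 engineers x 2 weeks of training and mistakes
        "integration": 20_000,  # CI/CD, operators, secret migration, monitoring
        "operational": 50_000,  # first year of on-call, upgrades, patching
        "opportunity": 80_000,  # what those engineers did not build instead
    }
    total = sum(tool_tax.values())
    print(f"year-one cost: ${total:,} ({total / tool_tax['license']:.1f}x the license)")
    # -> year-one cost: $235,000 (5.9x the license)
    ```
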

    Mattias extends the point to tool documentation complexity, noting that anyone who has worked with Envoy's configuration knows how complicated it can be. Steve adds that Envoy is written in C++ — "How many C++ developers do you have in your organization? Probably zero." — yet teams adopt it because it offers 15 to 20 features that may or may not be useful.

    This is the same total cost of ownership concept the industry has used for on-premises hardware, but applied to the seemingly "free" cloud-native landscape. The tools are free to install, but they are not free to manage and maintain.

    Why Service Meshes Are Often the First to Go
    When Mattias asks which tool type Steve most often deletes, the answer is service meshes. Steve does not name a specific product but says six or seven times out of ten, service meshes exist because someone thought they were cool, not because the team genuinely needed mutual TLS, rate limiting, or canary deploys at the mesh level.

    Mattias agrees: in his experience, he has never seen an environment that truly required a service mesh. The demos at KubeCon are always compelling, but the implementation reality is different. Steve adds a self-deprecating note — this was him in the past, running three service meshes simultaneously because none of them worked perfectly and he kept starting new ones in test mode.

    A Framework for Deleting Tools
    Steve outlines three frameworks he uses to systematically simplify platforms.

    The Simplicity Test is a diagnostic that scores platform complexity across ten dimensions on a scale of 0 to 50: tool sprawl, deployment complexity, cognitive load, operational burden, documentation debt, knowledge silos, incident frequency, time to production, self-service capability, and team satisfaction. A score of 0–15 is sustainable, 16–25 is manageable, 26–35 is a warning, and 36–50 is crisis. Over 400 engineers have taken it; the average score is around 34. Companies that call Steve typically score 38 to 45.
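
    A sketch of how such a rubric might be tallied, assuming each dimension is scored 0 to 5 (an inference from the 0-50 total, not something stated in the episode):

    ```python
    # Hypothetical tally for the Simplicity Test: ten dimensions, assumed
    # 0-5 each (inferred from the 0-50 total), banded with the thresholds
    # Steve gives. The example scores are invented.
    scores = {
        "tool_sprawl": 4, "deployment_complexity": 3, "cognitive_load": 4,
        "operational_burden": 4, "documentation_debt": 3, "knowledge_silos": 4,
        "incident_frequency": 3, "time_to_production": 4,
        "self_service": 3, "team_satisfaction": 2,
    }
    total = sum(scores.values())  # 34 -- the average Steve reports

    def band(score: int) -> str:
        if score <= 15: return "sustainable"
        if score <= 25: return "manageable"
        if score <= 35: return "warning"
        return "crisis"

    print(f"score {total}/50: {band(total)}")  # -> score 34/50: warning
    ```
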

    The Four Buckets categorize every tool: Essential (keep it), Redundant (duplicates something else — delete immediately), Over-engineered (solves a real problem but is too complicated — simplify it), or Premature (future-scale you don't have yet — delete for now).

    From one engagement with 47 tools: 12 were essential, 19 redundant, 11 over-engineered, and 5 premature — meaning 35 were deletable.

    He then prioritizes by impact versus risk, tackling high-impact, low-risk items first. For example, a large customer had Datadog, Prometheus, and New Relic running simultaneously with no clear rationale. Deleting New Relic took three hours, saved $30,000, and nobody noticed. Seventeen abandoned databases with zero connections in 30 days were deprecated by email, then deleted — zero responses, zero impact.
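
    The buckets and the impact-versus-risk ordering might be encoded like this; the tool names and scores are invented for illustration:

    ```python
    # Four-bucket triage plus a "high-impact, low-risk first" ordering.
    # Tool names and scores are invented; the buckets and the ordering
    # rule are the only parts taken from the episode.
    from enum import Enum

    class Bucket(Enum):
        ESSENTIAL = "keep"
        REDUNDANT = "delete immediately"
        OVER_ENGINEERED = "simplify"
        PREMATURE = "delete for now"

    # (tool, bucket, impact 1-5, risk 1-5)
    inventory = [
        ("new_relic", Bucket.REDUNDANT, 4, 1),     # duplicates Datadog/Prometheus
        ("service_mesh", Bucket.PREMATURE, 5, 3),  # the mTLS need never materialized
        ("custom_cli", Bucket.OVER_ENGINEERED, 3, 2),
        ("argo_cd", Bucket.ESSENTIAL, 0, 0),
    ]

    deletable = [t for t in inventory if t[1] is not Bucket.ESSENTIAL]
    # Lowest risk first, then highest impact: one way to encode "quick wins".
    for name, bucket, impact, risk in sorted(deletable, key=lambda t: (t[3], -t[2])):
        print(f"{name}: {bucket.value} (impact {impact}, risk {risk})")
    ```
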

    The security angle matters here too: one of those abandoned databases was an unpatched attack surface sitting in production with no one monitoring it. Paulina adds a related example — her team once found a Flyway instance that had gone unpatched for seven or eight years because each team assumed the other was maintaining it. As she puts it, lack of ownership creates the same kind of hidden risk.

    The 30-Day Cleanup Sprint
    Steve structures platform simplification as a focused 30-day effort:

    Week 1: Audit. Discover what is actually running — not what the team thinks is running, because those are usually different.

    Week 2: Categorize. Apply the four buckets and prioritize quick wins. During this phase, Steve tells the team: "For 30 days, you're not building anything — you're only deleting things."

    Week 3: Delete. Remove redundant tools and simplify over-engineered ones.

    Week 4: Systemize. Document everything — how decisions were made, why tools were removed, and how new tool decisions should be evaluated going forward. Build governance dashboards to prevent complexity from creeping back.

    He illustrates this with a company whose VP of Engineering — "Sarah" — told him: "This isn't a technical problem anymore. This is a people problem." Two senior engineers had quit on the same day with the same exit interview: "I'm tired of fighting the platform." One said he had not had dinner with his kids on a weekend in six months. The team's morale score was 3.2 out of 10.

    The critical insight: the team already knew what was wrong. They had known for months. But nobody had been given permission to delete anything. "That's not a cultural problem and it's not a knowledge problem. It's a permissions problem. And I gave them the permission."

    Results: complexity score dropped from 42 to 26, monthly costs fell from $150,000 to $80,000 (roughly $840,000 in annual savings), and deployment time improved from two weeks to one day.

    But Steve emphasizes the human outcome. A developer told him afterward: "Steve, I went home at 5 p.m. yesterday. It's the first time in eight months. And my daughter said, 'Daddy, you're home.'" That, Steve says, is what this work is really about.

    Golden Paths, Guardrails, and Developer Experience
    Mattias says he wants the platform he builds to compete with the easiest external options — Vercel, Netlify, and the like. If developers would rather go elsewhere, the internal platform has failed.

    Steve agrees and describes a pattern he sees constantly: developers do not complain when the platform is painful — they route around it. He gives an example from a fintech where a lead developer ("James") needed a test environment for a Friday customer demo. The official process required a JIRA ticket, a two-day wait, YAML files, and a pipeline. Instead, James spun up a Render instance on his personal credit card: 12 minutes, deployed, did the demo, got the deal. Nobody knew for three months, until finance found the charges.

    Steve's view: that is not shadow IT or irresponsibility — it is a rational response to poor platform usability. "The fastest path to business value went around the platform, not through it."

    The solution is what Steve calls the golden path — or, as he reframes it using a bowling alley analogy, golden guardrails. Like the bumpers that keep the ball heading toward the pins regardless of how it is thrown, the guardrails keep developers on a safe path without dictating exactly how they get there. The goal is hitting the pins — delivering business value.

    Mattias extends the guardrails concept to security: the easiest path should also be the most secure and compliant one. If security is harder than the workaround, the workaround wins every time. He aims to make the platform so seamless that developers do not have to think separately about security — it is built into the default experience.

    Measuring Outcomes, Not Features
    Steve argues that platform teams should measure developer outcomes, not platform features: time to first deploy, time to fix a broken deployment, overall developer satisfaction, and how secure and compliant the default deployment paths are.

    He recommends monthly platform retrospectives where developers can openly share feedback. In these sessions, Steve goes around the room and insists that each person share their own experience rather than echoing the previous speaker. This builds a backlog of improvements directly tied to real developer pain.

    Paulina agrees that feedback is essential but notes a practical challenge: in many organizations, only a handful of more active developers provide feedback, while the majority say they do not have time and just want to write code. Collecting representative feedback requires deliberate effort.

    She also raises the business and management perspective. In her consulting experience, she has seen assessments include a third dimension beyond the platform team and developers: business leadership, who focus on compliance, security, and cost. Sometimes the platform enables fast development, but management processes still block frequent deployment to production — a mindset gap, not a technical one. Steve agrees and points to value stream mapping as a technique for surfacing these bottlenecks with data.

    Translating Engineering Work Into Business Value
    Steve makes a forceful case that engineering leaders must express technical work in business terms. "The uncomfortable truth is that engineering is a cost center. We exist to support profit centers. The moment we forget that, we optimize for architectural elegance instead of business outcomes — and we lose the room."

    He illustrates this with a story: a CFO asked seven engineering leaders one question — "How long to rebuild production if we lost everything tomorrow?" Five seconds of silence. Ninety-four years of combined experience, and nobody could answer. "That's where engineering careers die."

    The translation matters at every level. Saying "we deleted a Jenkins server" means nothing to a CFO. Saying "we removed $40,000 in annual costs and cut deployment failures by 60%" gets attention.

    Steve challenges listeners to take their last three technical achievements and rewrite each one with a currency figure, a percentage, and a timeframe. "If you can't, you're speaking engineering, not business."

    Closing Advice: Start Deleting This Week
    Steve's parting advice is concrete: pick one tool you suspect nobody is using, check the logs, and if nothing has happened in 30 days, deprecate it. In 60 days, delete it. He also offers the simplicity test for free — it takes eight minutes, produces a 0-to-50 score with specific recommendations, and is available by reaching out to him directly.
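
    One reading of that advice as code, with `last_activity` left as a placeholder for whatever log or metrics query fits the environment:

    ```python
    # Sketch of "check the logs, deprecate at 30 days idle, delete at 60".
    # This is one reading of the advice; last_activity() is a placeholder
    # for whatever query fits your stack (access logs, connections, metrics).
    from datetime import datetime, timedelta, timezone

    def last_activity(tool: str) -> datetime:
        """Placeholder: return the tool's most recent observed activity."""
        raise NotImplementedError

    def triage(tool: str) -> str:
        idle = datetime.now(timezone.utc) - last_activity(tool)
        if idle > timedelta(days=60):
            return "delete"
        if idle > timedelta(days=30):
            return "deprecate: announce it and set a deletion date"
        return "keep watching"
    ```
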

    "Your platform's biggest risk isn't technical — it's political. Platforms die when the CFO asks you a question you can't answer, when your best engineer leaves, or when the team builds for their CV instead of the business."

    Highlights

    Steve Wade: "While everyone's collecting cloud native tools like Pokemon cards, I'm trying to help teams figure out which ones to throw away and which ones to keep."

    Steve Wade: "That's not a team, that's a help desk with infrastructure access." — on platform teams spending most of their time explaining their own platform

    Steve Wade: "If the answer is yes, it means you have a platform. And if it's no, it means you have platform theater."

    Steve Wade: "So many platforms are like the Sagrada Familia — they look super impressive, but they're never finished yet."

    Steve Wade: "Boring is beautiful, especially when you call me at 3 a.m."

    Steve Wade: "Google needs Kubernetes, but your Series B startup needs to ship features."

    Steve Wade: "One spreadsheet saved them from a migration that would have just simply destroyed the business model."

    Steve Wade: "They knew what was wrong. They'd known for months. But nobody was given the permission to delete anything."

    Steve Wade: "They're not really free. They're just free to install." — on open-source CNCF tools

    Steve Wade: "Your platform's power doesn't come from what you add. It comes from what you have the courage to delete."

    Mattias: Argues that an internal platform should compete with the easiest external alternatives — if developers would rather use Vercel, the platform has failed. Also extends the guardrails concept to security: the easiest path should always be the most secure path.

    Paulina: Highlights the ownership gap — tools can go unpatched for years when each team assumes another team maintains them. Also raises the management dimension: sometimes it is not the platform that is slow, but organizational processes that block deployment.

    Resources

    The Pragmatic CNCF Manifesto — Steve Wade's guide to cloud-native sanity, with six principles, pragmatic stack recommendations, and anti-patterns to avoid, drawn from 50+ enterprise migrations.

    The Deletion Digest — Steve's weekly newsletter for platform leaders, delivering one actionable lesson on deleting platform complexity every Saturday morning.

    CNCF Cloud Native Landscape — the full interactive map of cloud-native tools that Steve references when talking about needing to zoom out on your browser just to see everything.

    How Unnecessary Complexity Gave the Service Mesh a Bad Name (InfoQ) — a detailed analysis of why service meshes became the poster child for over-engineering in cloud-native environments.

    What Are Golden Paths? A Guide to Streamlining Developer Workflows — a practical guide to designing the "path of least resistance" that makes the right way the easy way.

    Value Stream Mapping for Software Delivery (DORA) — the DORA team's guide to the value stream mapping technique Steve mentions for surfacing bottlenecks between engineering and business.

    Inside Platform Engineering with Steve Wade (Octopus Deploy) — a longer conversation with Steve about platform engineering philosophy and practical approaches.

    Steve Wade on LinkedIn — where Steve regularly posts about platform simplification and can be reached directly.


About The DevSecOps Talks Podcast

This is the show by and for DevSecOps practitioners who are trying to survive information overload, cut through marketing nonsense, make the right technology bets, help their organizations deliver value, and, last but not least, have some fun. Tune in for talks about technology, ways of working, and news from DevSecOps. This show is not sponsored by any technology vendor and tries to be as unbiased as possible. We talk like no one is listening! For good or bad :) For more info, show notes, and discussion of past and upcoming episodes, visit devsecops.fm