
Latent Space: The AI Engineer Podcast

swyx + Alessio
The podcast by and for AI Engineers! In 2023, over 1 million visitors came to Latent Space to hear about news, papers and interviews in Software 3.0. We cover ...

Available Episodes

  • Open Operator, Serverless Browsers and the Future of Computer-Using Agents
    Today's episode is with Paul Klein, founder of Browserbase. We talked about building browser infrastructure for AI agents, the future of agent authentication, and their open source framework Stagehand.

* [00:00:00] Introductions
* [00:04:46] AI-specific challenges in browser infrastructure
* [00:07:05] Multimodality in AI-Powered Browsing
* [00:12:26] Running headless browsers at scale
* [00:18:46] Geolocation when proxying
* [00:21:25] CAPTCHAs and Agent Auth
* [00:28:21] Building “User take over” functionality
* [00:33:43] Stagehand: AI web browsing framework
* [00:38:58] OpenAI's Operator and computer use agents
* [00:44:44] Surprising use cases of Browserbase
* [00:47:18] Future of browser automation and market competition
* [00:53:11] Being a solo founder

Transcript

Alessio [00:00:04]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol.ai.

swyx [00:00:12]: Hey, and today we are very blessed to have our friend, Paul Klein the Fourth, CEO of Browserbase. Welcome.

Paul [00:00:21]: Thanks guys. Yeah, I'm happy to be here. I've been lucky to know both of you for a couple of years now, I think. So it's just like we're hanging out, you know, with three ginormous microphones in front of our faces. It's a totally normal hangout.

swyx [00:00:34]: Yeah. We've actually mentioned you on the podcast, I think, more often than any other Solaris tenant. Just because you're one of the, you know, best performing LLM tool companies that have started up in the last couple of years.

Paul [00:00:50]: Yeah, I mean, it's been a whirlwind of a year. Browserbase is actually pretty close to our first birthday, so we are one year old. And going from, you know, starting a company as a solo founder to having a team of 20 people, you know, a Series A, but also being able to support hundreds of AI companies that are building AI applications that go out and automate the web. It's just been really cool. It's been happening a little too fast. I think collectively as an AI industry, let's just take a week off together. I took my first vacation actually two weeks ago, and Operator came out on the first day, and then a week later, DeepSeek came out. And I'm on vacation trying to chill, and I'm like, we got to build with this stuff, right? So it's been a breakneck year. But I'm super happy to be here and talk more about all the stuff we're seeing. And I'd love to hear what you guys are excited about too, and share with you, you know?

swyx [00:01:39]: Where to start? So you've done a bunch of podcasts. I strongly recommend Jack Bridger's Scaling DevTools, as well as Turner Novak's The Peel. And, you know, I'm sure there are others. So you covered your Twilio story in the past, talked about StreamClub, you got acquired by Mux, and then you left to start Browserbase. So maybe we just start with: what is Browserbase? Yeah.

Paul [00:02:02]: Browserbase is the web browser for your AI. We're building headless browser infrastructure, which is browsers that run in a server environment that's accessible to developers via APIs and SDKs. It's really hard to run a web browser in the cloud. You guys are probably running Chrome on your computers, and that's using a lot of resources, right? So if you want to run a web browser, or thousands of web browsers, you can't just spin up a bunch of lambdas. You actually need to use a secure containerized environment.
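To make "a browser running somewhere on a server" concrete: the usual pattern is to attach an automation library to a remote browser over the Chrome DevTools Protocol instead of launching Chrome locally. A minimal sketch in Playwright; the WebSocket endpoint is a placeholder for whatever session URL a hosted-browser provider hands you.

```typescript
import { chromium } from "playwright";

// Instead of chromium.launch() on your own machine, attach to a browser that is
// already running on remote infrastructure. The endpoint is a placeholder.
const browser = await chromium.connectOverCDP("wss://browser-provider.example/session/abc123");

const context = browser.contexts()[0] ?? (await browser.newContext());
const page = context.pages()[0] ?? (await context.newPage());

await page.goto("https://www.google.com");
// ... drive the page as usual ...

// Disconnecting leaves teardown to the remote infrastructure.
await browser.close();
```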
You have to scale it up and down. It's a stateful system. And that infrastructure is, like, super painful. And I know that firsthand, because at my last company, StreamClub, I was CTO, and I was building our own internal headless browser infrastructure. That's actually why we sold the company, is because Mux really wanted to buy our headless browser infrastructure that we'd built. And it's just a super hard problem. And I actually told my co-founders, I would never start another company unless it was a browser infrastructure company. And it turns out that's really necessary in the age of AI, when AI can actually go out and interact with websites, click on buttons, fill in forms. You need AI to do all of that work in an actual browser running somewhere on a server. And BrowserBase powers that.swyx [00:03:08]: While you're talking about it, it occurred to me, not that you're going to be acquired or anything, but it occurred to me that it would be really funny if you became the Nikita Beer of headless browser companies. You just have one trick, and you make browser companies that get acquired.Paul [00:03:23]: I truly do only have one trick. I'm screwed if it's not for headless browsers. I'm not a Go programmer. You know, I'm in AI grant. You know, browsers is an AI grant. But we were the only company in that AI grant batch that used zero dollars on AI spend. You know, we're purely an infrastructure company. So as much as people want to ask me about reinforcement learning, I might not be the best guy to talk about that. But if you want to ask about headless browser infrastructure at scale, I can talk your ear off. So that's really my area of expertise. And it's a pretty niche thing. Like, nobody has done what we're doing at scale before. So we're happy to be the experts.swyx [00:03:59]: You do have an AI thing, stagehand. We can talk about the sort of core of browser-based first, and then maybe stagehand. Yeah, stagehand is kind of the web browsing framework. Yeah.What is Browserbase? Headless Browser Infrastructure ExplainedAlessio [00:04:10]: Yeah. Yeah. And maybe how you got to browser-based and what problems you saw. So one of the first things I worked on as a software engineer was integration testing. Sauce Labs was kind of like the main thing at the time. And then we had Selenium, we had Playbrite, we had all these different browser things. But it's always been super hard to do. So obviously you've worked on this before. When you started browser-based, what were the challenges? What were the AI-specific challenges that you saw versus, there's kind of like all the usual running browser at scale in the cloud, which has been a problem for years. What are like the AI unique things that you saw that like traditional purchase just didn't cover? Yeah.AI-specific challenges in browser infrastructurePaul [00:04:46]: First and foremost, I think back to like the first thing I did as a developer, like as a kid when I was writing code, I wanted to write code that did stuff for me. You know, I wanted to write code to automate my life. And I do that probably by using curl or beautiful soup to fetch data from a web browser. And I think I still do that now that I'm in the cloud. And the other thing that I think is a huge challenge for me is that you can't just create a web site and parse that data. And we all know that now like, you know, taking HTML and plugging that into an LLM, you can extract insights, you can summarize. 
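A rough sketch of the "take HTML and plug it into an LLM" idea, using the OpenAI Node SDK; the model name, prompt, and naive truncation are placeholders, and this only works for pages that don't need JavaScript to render.

```typescript
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Plain HTTP fetch of the page HTML (no browser involved yet).
const html = await (await fetch("https://example.com/some-article")).text();

const completion = await openai.chat.completions.create({
  model: "gpt-4o-mini", // placeholder; any capable model works
  messages: [
    { role: "system", content: "Extract the article title and a two-sentence summary from this HTML." },
    { role: "user", content: html.slice(0, 100_000) }, // naive truncation to respect context limits
  ],
});

console.log(completion.choices[0].message.content);
```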
So it was very clear that now like dynamic web scraping became very possible with the rise of large language models or a lot easier. And that was like a clear reason why there's been more usage of headless browsers, which are necessary because a lot of modern websites don't expose all of their page content via a simple HTTP request. You know, they actually do require you to run this type of code for a specific time. JavaScript on the page to hydrate this. Airbnb is a great example. You go to airbnb.com. A lot of that content on the page isn't there until after they run the initial hydration. So you can't just scrape it with a curl. You need to have some JavaScript run. And a browser is that JavaScript engine that's going to actually run all those requests on the page. So web data retrieval was definitely one driver of starting BrowserBase and the rise of being able to summarize that within LLM. Also, I was familiar with if I wanted to automate a website, I could write one script and that would work for one website. It was very static and deterministic. But the web is non-deterministic. The web is always changing. And until we had LLMs, there was no way to write scripts that you could write once that would run on any website. That would change with the structure of the website. Click the login button. It could mean something different on many different websites. And LLMs allow us to generate code on the fly to actually control that. So I think that rise of writing the generic automation scripts that can work on many different websites, to me, made it clear that browsers are going to be a lot more useful because now you can automate a lot more things without writing. If you wanted to write a script to book a demo call on 100 websites, previously, you had to write 100 scripts. Now you write one script that uses LLMs to generate that script. That's why we built our web browsing framework, StageHand, which does a lot of that work for you. But those two things, web data collection and then enhanced automation of many different websites, it just felt like big drivers for more browser infrastructure that would be required to power these kinds of features.Alessio [00:07:05]: And was multimodality also a big thing?Paul [00:07:08]: Now you can use the LLMs to look, even though the text in the dome might not be as friendly. Maybe my hot take is I was always kind of like, I didn't think vision would be as big of a driver. For UI automation, I felt like, you know, HTML is structured text and large language models are good with structured text. But it's clear that these computer use models are often vision driven, and they've been really pushing things forward. So definitely being multimodal, like rendering the page is required to take a screenshot to give that to a computer use model to take actions on a website. And it's just another win for browser. But I'll be honest, that wasn't what I was thinking early on. I didn't even think that we'd get here so fast with multimodality. I think we're going to have to get back to multimodal and vision models.swyx [00:07:50]: This is one of those things where I forgot to mention in my intro that I'm an investor in Browserbase. And I remember that when you pitched to me, like a lot of the stuff that we have today, we like wasn't on the original conversation. 
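Roughly what the curl-versus-hydration distinction looks like in code: a plain HTTP request returns the server-rendered shell, while a headless browser runs the page's JavaScript before you read the DOM. A sketch with Playwright; the URL and the "networkidle" wait are illustrative, not a recommendation.

```typescript
import { chromium } from "playwright";

const url = "https://www.airbnb.com/";

// Plain HTTP request: whatever the server sends, before any client-side hydration.
const rawHtml = await (await fetch(url)).text();

// Headless browser: the page's JavaScript runs, so client-rendered content is present.
const browser = await chromium.launch({ headless: true });
const page = await browser.newPage();
await page.goto(url, { waitUntil: "networkidle" }); // crude proxy for "hydration finished"
const hydratedHtml = await page.content();
await browser.close();

console.log(`raw: ${rawHtml.length} chars, hydrated: ${hydratedHtml.length} chars`);
```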
But I did have my original thesis was something that we've talked about on the podcast before, which is take the GPT store, the custom GPT store, all the every single checkbox and plugin is effectively a startup. And this was the browser one. I think the main hesitation, I think I actually took a while to get back to you. The main hesitation was that there were others. Like you're not the first hit list browser startup. It's not even your first hit list browser startup. There's always a question of like, will you be the category winner in a place where there's a bunch of incumbents, to be honest, that are bigger than you? They're just not targeted at the AI space. They don't have the backing of Nat Friedman. And there's a bunch of like, you're here in Silicon Valley. They're not. I don't know.Paul [00:08:47]: I don't know if that's, that was it, but like, there was a, yeah, I mean, like, I think I tried all the other ones and I was like, really disappointed. Like my background is from working at great developer tools, companies, and nothing had like the Vercel like experience. Um, like our biggest competitor actually is partly owned by private equity and they just jacked up their prices quite a bit. And the dashboard hasn't changed in five years. And I actually used them at my last company and tried them and I was like, oh man, like there really just needs to be something that's like the experience of these great infrastructure companies, like Stripe, like clerk, like Vercel that I use in love, but oriented towards this kind of like more specific category, which is browser infrastructure, which is really technically complex. Like a lot of stuff can go wrong on the internet when you're running a browser. The internet is very vast. There's a lot of different configurations. Like there's still websites that only work with internet explorer out there. How do you handle that when you're running your own browser infrastructure? These are the problems that we have to think about and solve at BrowserBase. And it's, it's certainly a labor of love, but I built this for me, first and foremost, I know it's super cheesy and everyone says that for like their startups, but it really, truly was for me. If you look at like the talks I've done even before BrowserBase, and I'm just like really excited to try and build a category defining infrastructure company. And it's, it's rare to have a new category of infrastructure exists. We're here in the Chroma offices and like, you know, vector databases is a new category of infrastructure. Is it, is it, I mean, we can, we're in their office, so, you know, we can, we can debate that one later. That is one.Multimodality in AI-Powered Browsingswyx [00:10:16]: That's one of the industry debates.Paul [00:10:17]: I guess we go back to the LLMOS talk that Karpathy gave way long ago. And like the browser box was very clearly there and it seemed like the people who were building in this space also agreed that browsers are a core primitive of infrastructure for the LLMOS that's going to exist in the future. And nobody was building something there that I wanted to use. So I had to go build it myself.swyx [00:10:38]: Yeah. I mean, exactly that talk that, that honestly, that diagram, every box is a startup and there's the code box and then there's the. The browser box. I think at some point they will start clashing there. There's always the question of the, are you a point solution or are you the sort of all in one? 
And I think the point solutions tend to win quickly, but then the only ones have a very tight cohesive experience. Yeah. Let's talk about just the hard problems of browser base you have on your website, which is beautiful. Thank you. Was there an agency that you used for that? Yeah. Herb.paris.Paul [00:11:11]: They're amazing. Herb.paris. Yeah. It's H-E-R-V-E. I highly recommend for developers. Developer tools, founders to work with consumer agencies because they end up building beautiful things and the Parisians know how to build beautiful interfaces. So I got to give prep.swyx [00:11:24]: And chat apps, apparently are, they are very fast. Oh yeah. The Mistral chat. Yeah. Mistral. Yeah.Paul [00:11:31]: Late chat.swyx [00:11:31]: Late chat. And then your videos as well, it was professionally shot, right? The series A video. Yeah.Alessio [00:11:36]: Nico did the videos. He's amazing. Not the initial video that you shot at the new one. First one was Austin.Paul [00:11:41]: Another, another video pretty surprised. But yeah, I mean, like, I think when you think about how you talk about your company. You have to think about the way you present yourself. It's, you know, as a developer, you think you evaluate a company based on like the API reliability and the P 95, but a lot of developers say, is the website good? Is the message clear? Do I like trust this founder? I'm building my whole feature on. So I've tried to nail that as well as like the reliability of the infrastructure. You're right. It's very hard. And there's a lot of kind of foot guns that you run into when running headless browsers at scale. Right.Competing with Existing Headless Browser Solutionsswyx [00:12:10]: So let's pick one. You have eight features here. Seamless integration. Scalability. Fast or speed. Secure. Observable. Stealth. That's interesting. Extensible and developer first. What comes to your mind as like the top two, three hardest ones? Yeah.Running headless browsers at scalePaul [00:12:26]: I think just running headless browsers at scale is like the hardest one. And maybe can I nerd out for a second? Is that okay? I heard this is a technical audience, so I'll talk to the other nerds. Whoa. They were listening. Yeah. They're upset. They're ready. The AGI is angry. Okay. So. So how do you run a browser in the cloud? Let's start with that, right? So let's say you're using a popular browser automation framework like Puppeteer, Playwright, and Selenium. Maybe you've written a code, some code locally on your computer that opens up Google. It finds the search bar and then types in, you know, search for Latent Space and hits the search button. That script works great locally. You can see the little browser open up. You want to take that to production. You want to run the script in a cloud environment. So when your laptop is closed, your browser is doing something. The browser is doing something. Well, I, we use Amazon. You can see the little browser open up. You know, the first thing I'd reach for is probably like some sort of serverless infrastructure. I would probably try and deploy on a Lambda. But Chrome itself is too big to run on a Lambda. It's over 250 megabytes. So you can't easily start it on a Lambda. So you maybe have to use something like Lambda layers to squeeze it in there. Maybe use a different Chromium build that's lighter. And you get it on the Lambda. Great. It works. But it runs super slowly. It's because Lambdas are very like resource limited. They only run like with one vCPU. 
You can run one process at a time. Remember, Chromium is super beefy. It's barely running on my MacBook Air. I'm still downloading it from a pre-run. Yeah, from the test earlier, right? I'm joking. But it's big, you know? So like Lambda, it just won't work really well. Maybe it'll work, but you need something faster. Your users want something faster. Okay. Well, let's put it on a beefier instance. Let's get an EC2 server running. Let's throw Chromium on there. Great. Okay. I can, that works well with one user. But what if I want to run like 10 Chromium instances, one for each of my users? Okay. Well, I might need two EC2 instances. Maybe 10. All of a sudden, you have multiple EC2 instances. This sounds like a problem for Kubernetes and Docker, right? Now, all of a sudden, you're using ECS or EKS, the Kubernetes or container solutions by Amazon. You're spending up and down containers, and you're spending a whole engineer's time on kind of maintaining this stateful distributed system. Those are some of the worst systems to run because when it's a stateful distributed system, it means that you are bound by the connections to that thing. You have to keep the browser open while someone is working with it, right? That's just a painful architecture to run. And there's all this other little gotchas with Chromium, like Chromium, which is the open source version of Chrome, by the way. You have to install all these fonts. You want emojis working in your browsers because your vision model is looking for the emoji. You need to make sure you have the emoji fonts. You need to make sure you have all the right extensions configured, like, oh, do you want ad blocking? How do you configure that? How do you actually record all these browser sessions? Like it's a headless browser. You can't look at it. So you need to have some sort of observability. Maybe you're recording videos and storing those somewhere. It all kind of adds up to be this just giant monster piece of your project when all you wanted to do was run a lot of browsers in production for this little script to go to google.com and search. And when I see a complex distributed system, I see an opportunity to build a great infrastructure company. And we really abstract that away with Browserbase where our customers can use these existing frameworks, Playwright, Publisher, Selenium, or our own stagehand and connect to our browsers in a serverless-like way. And control them, and then just disconnect when they're done. And they don't have to think about the complex distributed system behind all of that. They just get a browser running anywhere, anytime. Really easy to connect to.swyx [00:15:55]: I'm sure you have questions. My standard question with anything, so essentially you're a serverless browser company, and there's been other serverless things that I'm familiar with in the past, serverless GPUs, serverless website hosting. That's where I come from with Netlify. One question is just like, you promised to spin up thousands of servers. You promised to spin up thousands of browsers in milliseconds. I feel like there's no real solution that does that yet. And I'm just kind of curious how. The only solution I know, which is to kind of keep a kind of warm pool of servers around, which is expensive, but maybe not so expensive because it's just CPUs. So I'm just like, you know. Yeah.Browsers as a Core Primitive in AI InfrastructurePaul [00:16:36]: You nailed it, right? 
I mean, how do you offer a serverless-like experience with something that is clearly not serverless, right? And the answer is, you need to be able to run... We run many browsers on single nodes. We use Kubernetes at browser base. So we have many pods that are being scheduled. We have to predictably schedule them up or down. Yes, thousands of browsers in milliseconds is the best case scenario. If you hit us with 10,000 requests, you may hit a slower cold start, right? So we've done a lot of work on predictive scaling and being able to kind of route stuff to different regions where we have multiple regions of browser base where we have different pools available. You can also pick the region you want to go to based on like lower latency, round trip, time latency. It's very important with these types of things. There's a lot of requests going over the wire. So for us, like having a VM like Firecracker powering everything under the hood allows us to be super nimble and spin things up or down really quickly with strong multi-tenancy. But in the end, this is like the complex infrastructural challenges that we have to kind of deal with at browser base. And we have a lot more stuff on our roadmap to allow customers to have more levers to pull to exchange, do you want really fast browser startup times or do you want really low costs? And if you're willing to be more flexible on that, we may be able to kind of like work better for your use cases.swyx [00:17:44]: Since you used Firecracker, shouldn't Fargate do that for you or did you have to go lower level than that? We had to go lower level than that.Paul [00:17:51]: I find this a lot with Fargate customers, which is alarming for Fargate. We used to be a giant Fargate customer. Actually, the first version of browser base was ECS and Fargate. And unfortunately, it's a great product. I think we were actually the largest Fargate customer in our region for a little while. No, what? Yeah, seriously. And unfortunately, it's a great product, but I think if you're an infrastructure company, you actually have to have a deeper level of control over these primitives. I think it's the same thing is true with databases. We've used other database providers and I think-swyx [00:18:21]: Yeah, serverless Postgres.Paul [00:18:23]: Shocker. When you're an infrastructure company, you're on the hook if any provider has an outage. And I can't tell my customers like, hey, we went down because so-and-so went down. That's not acceptable. So for us, we've really moved to bringing things internally. It's kind of opposite of what we preach. We tell our customers, don't build this in-house, but then we're like, we build a lot of stuff in-house. But I think it just really depends on what is in the critical path. We try and have deep ownership of that.Alessio [00:18:46]: On the distributed location side, how does that work for the web where you might get sort of different content in different locations, but the customer is expecting, you know, if you're in the US, I'm expecting the US version. But if you're spinning up my browser in France, I might get the French version. Yeah.Paul [00:19:02]: Yeah. That's a good question. Well, generally, like on the localization, there is a thing called locale in the browser. You can set like what your locale is. If you're like in the ENUS browser or not, but some things do IP, IP based routing. And in that case, you may want to have a proxy. 
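The locale and proxy knobs described here map onto ordinary browser options. A sketch in Playwright, assuming you have a proxy provider; the proxy address and credentials are placeholders.

```typescript
import { chromium } from "playwright";

// Route traffic through a US exit so IP-based geo routing agrees with the locale
// the page sees. The proxy server and credentials are placeholders.
const browser = await chromium.launch({
  proxy: { server: "http://us-proxy.example.com:8000", username: "user", password: "pass" },
});

// Present the browser as a US-English visitor.
const context = await browser.newContext({
  locale: "en-US",
  timezoneId: "America/New_York",
});

const page = await context.newPage();
await page.goto("https://www.example.com");
console.log(await page.title());
```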
Like let's say you're running something in the, in Europe, but you want to make sure you're showing up from the US. You may want to use one of our proxy features so you can turn on proxies to say like, make sure these connections always come from the United States, which is necessary too, because when you're browsing the web, you're coming from like a, you know, data center IP, and that can make things a lot harder to browse web. So we do have kind of like this proxy super network. Yeah. We have a proxy for you based on where you're going, so you can reliably automate the web. But if you get scheduled in Europe, that doesn't happen as much. We try and schedule you as close to, you know, your origin that you're trying to go to. But generally you have control over the regions you can put your browsers in. So you can specify West one or East one or Europe. We only have one region of Europe right now, actually. Yeah.Alessio [00:19:55]: What's harder, the browser or the proxy? I feel like to me, it feels like actually proxying reliably at scale. It's much harder than spending up browsers at scale. I'm curious. It's all hard.Paul [00:20:06]: It's layers of hard, right? Yeah. I think it's different levels of hard. I think the thing with the proxy infrastructure is that we work with many different web proxy providers and some are better than others. Some have good days, some have bad days. And our customers who've built browser infrastructure on their own, they have to go and deal with sketchy actors. Like first they figure out their own browser infrastructure and then they got to go buy a proxy. And it's like you can pay in Bitcoin and it just kind of feels a little sus, right? It's like you're buying drugs when you're trying to get a proxy online. We have like deep relationships with these counterparties. We're able to audit them and say, is this proxy being sourced ethically? Like it's not running on someone's TV somewhere. Is it free range? Yeah. Free range organic proxies, right? Right. We do a level of diligence. We're SOC 2. So we have to understand what is going on here. But then we're able to make sure that like we route around proxy providers not working. There's proxy providers who will just, the proxy will stop working all of a sudden. And then if you don't have redundant proxying on your own browsers, that's hard down for you or you may get some serious impacts there. With us, like we intelligently know, hey, this proxy is not working. Let's go to this one. And you can kind of build a network of multiple providers to really guarantee the best uptime for our customers. Yeah. So you don't own any proxies? We don't own any proxies. You're right. The team has been saying who wants to like take home a little proxy server, but not yet. We're not there yet. You know?swyx [00:21:25]: It's a very mature market. I don't think you should build that yourself. Like you should just be a super customer of them. Yeah. Scraping, I think, is the main use case for that. I guess. Well, that leads us into CAPTCHAs and also off, but let's talk about CAPTCHAs. You had a little spiel that you wanted to talk about CAPTCHA stuff.Challenges of Scaling Browser InfrastructurePaul [00:21:43]: Oh, yeah. I was just, I think a lot of people ask, if you're thinking about proxies, you're thinking about CAPTCHAs too. I think it's the same thing. You can go buy CAPTCHA solvers online, but it's the same buying experience. It's some sketchy website, you have to integrate it. 
It's not fun to buy these things and you can't really trust that the docs are bad. What Browserbase does is we integrate a bunch of different CAPTCHAs. We do some stuff in-house, but generally we just integrate with a bunch of known vendors and continually monitor and maintain these things and say, is this working or not? Can we route around it or not? These are CAPTCHA solvers. CAPTCHA solvers, yeah. Not CAPTCHA providers, CAPTCHA solvers. Yeah, sorry. CAPTCHA solvers. We really try and make sure all of that works for you. I think as a dev, if I'm buying infrastructure, I want it all to work all the time and it's important for us to provide that experience by making sure everything does work and monitoring it on our own. Yeah. Right now, the world of CAPTCHAs is tricky. I think AI agents in particular are very much ahead of the internet infrastructure. CAPTCHAs are designed to block all types of bots, but there are now good bots and bad bots. I think in the future, CAPTCHAs will be able to identify who a good bot is, hopefully via some sort of KYC. For us, we've been very lucky. We have very little to no known abuse of Browserbase because we really look into who we work with. And for certain types of CAPTCHA solving, we only allow them on certain types of plans because we want to make sure that we can know what people are doing, what their use cases are. And that's really allowed us to try and be an arbiter of good bots, which is our long term goal. I want to build great relationships with people like Cloudflare so we can agree, hey, here are these acceptable bots. We'll identify them for you and make sure we flag when they come to your website. This is a good bot, you know?Alessio [00:23:23]: I see. And Cloudflare said they want to do more of this. So they're going to set by default, if they think you're an AI bot, they're going to reject. I'm curious if you think this is something that is going to be at the browser level or I mean, the DNS level with Cloudflare seems more where it should belong. But I'm curious how you think about it.Paul [00:23:40]: I think the web's going to change. You know, I think that the Internet as we have it right now is going to change. And we all need to just accept that the cat is out of the bag. And instead of kind of like wishing the Internet was like it was in the 2000s, we can have free content line that wouldn't be scraped. It's just it's not going to happen. And instead, we should think about like, one, how can we change? How can we change the models of, you know, information being published online so people can adequately commercialize it? But two, how do we rebuild applications that expect that AI agents are going to log in on their behalf? Those are the things that are going to allow us to kind of like identify good and bad bots. And I think the team at Clerk has been doing a really good job with this on the authentication side. I actually think that auth is the biggest thing that will prevent agents from accessing stuff, not captchas. And I think there will be agent auth in the future. I don't know if it's going to happen from an individual company, but actually authentication providers that have a, you know, hidden login as agent feature, which will then you put in your email, you'll get a push notification, say like, hey, your browser-based agent wants to log into your Airbnb. You can approve that and then the agent can proceed. That really circumvents the need for captchas or logging in as you and sharing your password. 
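Nothing like this exists as a standard yet, so the shape below is purely hypothetical; it just makes the "OAuth-like, scope-limited agent login" idea concrete: a short-lived credential paired with a human account and restricted to the scopes that human approved.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical shape of a delegated "agent grant": paired with a human account,
// but a distinct identity, limited to approved scopes, and short-lived.
interface AgentGrant {
  humanUserId: string; // the account the agent acts on behalf of
  agentId: string;     // distinct identity; never presumed to be the human
  scopes: string[];    // e.g. ["airbnb:book_stay"] but not ["airbnb:send_message"]
  expiresAt: Date;
}

// Hypothetical approval flow: the agent asks, the human gets a push notification,
// and only after approval does the provider mint a scoped, expiring grant.
async function requestAgentGrant(humanUserId: string, scopes: string[]): Promise<AgentGrant> {
  // ...send push notification and wait for the human to approve (elided)...
  return {
    humanUserId,
    agentId: `agent_${randomUUID()}`,
    scopes,
    expiresAt: new Date(Date.now() + 15 * 60 * 1000), // 15 minutes
  };
}
```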
I think agent auth is going to be one way we identify good bots going forward. And I think a lot of this captcha solving stuff is really short-term problems as the internet kind of reorients itself around how it's going to work with agents browsing the web, just like people do. Yeah.Managing Distributed Browser Locations and Proxiesswyx [00:24:59]: Stitch recently was on Hacker News for talking about agent experience, AX, which is a thing that Netlify is also trying to clone and coin and talk about. And we've talked about this on our previous episodes before in a sense that I actually think that's like maybe the only part of the tech stack that needs to be kind of reinvented for agents. Everything else can stay the same, CLIs, APIs, whatever. But auth, yeah, we need agent auth. And it's mostly like short-lived, like it should not, it should be a distinct, identity from the human, but paired. I almost think like in the same way that every social network should have your main profile and then your alt accounts or your Finsta, it's almost like, you know, every, every human token should be paired with the agent token and the agent token can go and do stuff on behalf of the human token, but not be presumed to be the human. Yeah.Paul [00:25:48]: It's like, it's, it's actually very similar to OAuth is what I'm thinking. And, you know, Thread from Stitch is an investor, Colin from Clerk, Octaventures, all investors in browser-based because like, I hope they solve this because they'll make browser-based submission more possible. So we don't have to overcome all these hurdles, but I think it will be an OAuth-like flow where an agent will ask to log in as you, you'll approve the scopes. Like it can book an apartment on Airbnb, but it can't like message anybody. And then, you know, the agent will have some sort of like role-based access control within an application. Yeah. I'm excited for that.swyx [00:26:16]: The tricky part is just, there's one, one layer of delegation here, which is like, you're authoring my user's user or something like that. I don't know if that's tricky or not. Does that make sense? Yeah.Paul [00:26:25]: You know, actually at Twilio, I worked on the login identity and access. Management teams, right? So like I built Twilio's login page.swyx [00:26:31]: You were an intern on that team and then you became the lead in two years? Yeah.Paul [00:26:34]: Yeah. I started as an intern in 2016 and then I was the tech lead of that team. How? That's not normal. I didn't have a life. He's not normal. Look at this guy. I didn't have a girlfriend. I just loved my job. I don't know. I applied to 500 internships for my first job and I got rejected from every single one of them except for Twilio and then eventually Amazon. And they took a shot on me and like, I was getting paid money to write code, which was my dream. Yeah. Yeah. I'm very lucky that like this coding thing worked out because I was going to be doing it regardless. And yeah, I was able to kind of spend a lot of time on a team that was growing at a company that was growing. So it informed a lot of this stuff here. I think these are problems that have been solved with like the SAML protocol with SSO. I think it's a really interesting stuff with like WebAuthn, like these different types of authentication, like schemes that you can use to authenticate people. The tooling is all there. It just needs to be tweaked a little bit to work for agents. And I think the fact that there are companies that are already. 
Providing authentication as a service really sets it up. Well, the thing that's hard is like reinventing the internet for agents. We don't want to rebuild the internet. That's an impossible task. And I think people often say like, well, we'll have this second layer of APIs built for agents. I'm like, we will for the top use cases, but instead of we can just tweak the internet as is, which is on the authentication side, I think we're going to be the dumb ones going forward. Unfortunately, I think AI is going to be able to do a lot of the tasks that we do online, which means that it will be able to go to websites, click buttons on our behalf and log in on our behalf too. So with this kind of like web agent future happening, I think with some small structural changes, like you said, it feels like it could all slot in really nicely with the existing internet.Handling CAPTCHAs and Agent Authenticationswyx [00:28:08]: There's one more thing, which is the, your live view iframe, which lets you take, take control. Yeah. Obviously very key for operator now, but like, was, is there anything interesting technically there or that the people like, well, people always want this.Paul [00:28:21]: It was really hard to build, you know, like, so, okay. Headless browsers, you don't see them, right. They're running. They're running in a cloud somewhere. You can't like look at them. And I just want to really make, it's a weird name. I wish we came up with a better name for this thing, but you can't see them. Right. But customers don't trust AI agents, right. At least the first pass. So what we do with our live view is that, you know, when you use browser base, you can actually embed a live view of the browser running in the cloud for your customer to see it working. And that's what the first reason is the build trust, like, okay, so I have this script. That's going to go automate a website. I can embed it into my web application via an iframe and my customer can watch. I think. And then we added two way communication. So now not only can you watch the browser kind of being operated by AI, if you want to pause and actually click around type within this iframe that's controlling a browser, that's also possible. And this is all thanks to some of the lower level protocol, which is called the Chrome DevTools protocol. It has a API called start screencast, and you can also send mouse clicks and button clicks to a remote browser. And this is all embeddable within iframes. You have a browser within a browser, yo. And then you simulate the screen, the click on the other side. Exactly. And this is really nice often for, like, let's say, a capture that can't be solved. You saw this with Operator, you know, Operator actually uses a different approach. They use VNC. So, you know, you're able to see, like, you're seeing the whole window here. What we're doing is something a little lower level with the Chrome DevTools protocol. It's just PNGs being streamed over the wire. But the same thing is true, right? Like, hey, I'm running a window. Pause. Can you do something in this window? Human. Okay, great. Resume. Like sometimes 2FA tokens. Like if you get that text message, you might need a person to type that in. Web agents need human-in-the-loop type workflows still. You still need a person to interact with the browser. And building a UI to proxy that is kind of hard. You may as well just show them the whole browser and say, hey, can you finish this up for me? And then let the AI proceed on afterwards. 
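The Chrome DevTools Protocol pieces mentioned here are real: Page.startScreencast streams rendered frames out, and Input.dispatchMouseEvent plays clicks back in. A minimal sketch via a Playwright CDP session; a real live view would forward the frames and incoming clicks over a websocket to the embedding iframe.

```typescript
import { chromium } from "playwright";

const browser = await chromium.launch();
const page = await browser.newPage();
await page.goto("https://example.com");

// Raw CDP session for this page (Chromium only).
const cdp = await page.context().newCDPSession(page);

// "PNGs being streamed over the wire": receive frames, forward them, ack them.
await cdp.send("Page.startScreencast", { format: "png", quality: 60, everyNthFrame: 1 });
cdp.on("Page.screencastFrame", async (frame) => {
  // frame.data is a base64-encoded PNG; send it to the viewer here, then ack.
  await cdp.send("Page.screencastFrameAck", { sessionId: frame.sessionId });
});

// Two-way control: replay a click coming back from the viewer as real input.
await cdp.send("Input.dispatchMouseEvent", { type: "mousePressed", x: 100, y: 200, button: "left", clickCount: 1 });
await cdp.send("Input.dispatchMouseEvent", { type: "mouseReleased", x: 100, y: 200, button: "left", clickCount: 1 });
```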
Is there a future where I stream my current desktop to browser base? I don't think so. I think we're very much cloud infrastructure. Yeah. You know, but I think a lot of the stuff we're doing, we do want to, like, build tools. Like, you know, we'll talk about the stage and, you know, web agent framework in a second. But, like, there's a case where a lot of people are going desktop first for, you know, consumer use. And I think cloud is doing a lot of this, where I expect to see, you know, MCPs really oriented around the cloud desktop app for a reason, right? Like, I think a lot of these tools are going to run on your computer because it makes... I think it's breaking out. People are putting it on a server. Oh, really? Okay. Well, sweet. We'll see. We'll see that. I was surprised, though, wasn't I? I think that the browser company, too, with Dia Browser, it runs on your machine. You know, it's going to be...swyx [00:30:50]: What is it?Paul [00:30:51]: So, Dia Browser, as far as I understand... I used to use Arc. Yeah. I haven't used Arc. But I'm a big fan of the browser company. I think they're doing a lot of cool stuff in consumer. As far as I understand, it's a browser where you have a sidebar where you can, like, chat with it and it can control the local browser on your machine. So, if you imagine, like, what a consumer web agent is, which it lives alongside your browser, I think Google Chrome has Project Marina, I think. I almost call it Project Marinara for some reason. I don't know why. It's...swyx [00:31:17]: No, I think it's someone really likes the Waterworld. Oh, I see. The classic Kevin Costner. Yeah.Paul [00:31:22]: Okay. Project Marinara is a similar thing to the Dia Browser, in my mind, as far as I understand it. You have a browser that has an AI interface that will take over your mouse and keyboard and control the browser for you. Great for consumer use cases. But if you're building applications that rely on a browser and it's more part of a greater, like, AI app experience, you probably need something that's more like infrastructure, not a consumer app.swyx [00:31:44]: Just because I have explored a little bit in this area, do people want branching? So, I have the state. Of whatever my browser's in. And then I want, like, 100 clones of this state. Do people do that? Or...Paul [00:31:56]: People don't do it currently. Yeah. But it's definitely something we're thinking about. I think the idea of forking a browser is really cool. Technically, kind of hard. We're starting to see this in code execution, where people are, like, forking some, like, code execution, like, processes or forking some tool calls or branching tool calls. Haven't seen it at the browser level yet. But it makes sense. Like, if an AI agent is, like, using a website and it's not sure what path it wants to take to crawl this website. To find the information it's looking for. It would make sense for it to explore both paths in parallel. And that'd be a very, like... A road not taken. Yeah. And hopefully find the right answer. And then say, okay, this was actually the right one. And memorize that. And go there in the future. On the roadmap. For sure. Don't make my roadmap, please. You know?Alessio [00:32:37]: How do you actually do that? Yeah. How do you fork? I feel like the browser is so stateful for so many things.swyx [00:32:42]: Serialize the state. Restore the state. I don't know.Paul [00:32:44]: So, it's one of the reasons why we haven't done it yet. It's hard. You know? 
Like, to truly fork, it's actually quite difficult. The naive way is to open the same page in a new tab and then, like, hope that it's at the same thing. But if you have a form halfway filled, you may have to, like, take the whole, you know, container. Pause it. All the memory. Duplicate it. Restart it from there. It could be very slow. So, we haven't found a thing. Like, the easy thing to fork is just, like, copy the page object. You know? But I think there needs to be something a little bit more robust there. Yeah.swyx [00:33:12]: So, MorphLabs has this infinite branch thing. Like, wrote a custom fork of Linux or something that let them save the system state and clone it. MorphLabs, hit me up. I'll be a customer. Yeah. That's the only. I think that's the only way to do it. Yeah. Like, unless Chrome has some special API for you. Yeah.Paul [00:33:29]: There's probably something we'll reverse engineer one day. I don't know. Yeah.Alessio [00:33:32]: Let's talk about StageHand, the AI web browsing framework. You have three core components, Observe, Extract, and Act. Pretty clean landing page. What was the idea behind making a framework? Yeah.Stagehand: AI web browsing frameworkPaul [00:33:43]: So, there's three frameworks that are very popular or already exist, right? Puppeteer, Playwright, Selenium. Those are for building hard-coded scripts to control websites. And as soon as I started to play with LLMs plus browsing, I caught myself, you know, code-genning Playwright code to control a website. I would, like, take the DOM. I'd pass it to an LLM. I'd say, can you generate the Playwright code to click the appropriate button here? And it would do that. And I was like, this really should be part of the frameworks themselves. And I became really obsessed with SDKs that take natural language as part of, like, the API input. And that's what StageHand is. StageHand exposes three APIs, and it's a super set of Playwright. So, if you go to a page, you may want to take an action, click on the button, fill in the form, etc. That's what the act command is for. You may want to extract some data. This one takes a natural language, like, extract the winner of the Super Bowl from this page. You can give it a Zod schema, so it returns a structured output. And then maybe you're building an API. You can do an agent loop, and you want to kind of see what actions are possible on this page before taking one. You can do observe. So, you can observe the actions on the page, and it will generate a list of actions. You can guide it, like, give me actions on this page related to buying an item. And you can, like, buy it now, add to cart, view shipping options, and pass that to an LLM, an agent loop, to say, what's the appropriate action given this high-level goal? So, StageHand isn't a web agent. It's a framework for building web agents. And we think that agent loops are actually pretty close to the application layer because every application probably has different goals or different ways it wants to take steps. I don't think I've seen a generic. Maybe you guys are the experts here. I haven't seen, like, a really good AI agent framework here. Everyone kind of has their own special sauce, right? I see a lot of developers building their own agent loops, and they're using tools. And I view StageHand as the browser tool. So, we expose act, extract, observe. Your agent can call these tools. And from that, you don't have to worry about it. You don't have to worry about generating playwright code performantly. 
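A hedged sketch of the three calls described above, roughly following the shape of the open-source Stagehand package; exact method signatures vary between versions, so treat this as illustrative rather than canonical.

```typescript
import { Stagehand } from "@browserbasehq/stagehand";
import { z } from "zod";

const stagehand = new Stagehand({ env: "LOCAL" }); // or "BROWSERBASE" to run remotely
await stagehand.init();
const page = stagehand.page;

await page.goto("https://www.nfl.com/super-bowl/");

// act: a natural-language action instead of a hard-coded selector.
await page.act("click the link to the most recent Super Bowl recap");

// extract: natural language plus a Zod schema for structured output.
const result = await page.extract({
  instruction: "extract the winner of the Super Bowl from this page",
  schema: z.object({ winner: z.string() }),
});

// observe: list candidate actions for an agent loop to choose between.
const actions = await page.observe("actions on this page related to navigating to team pages");

console.log(result.winner, actions);
await stagehand.close();
```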
You don't have to worry about running it. You can kind of just integrate these three tool calls into your agent loop and reliably automate the web.swyx [00:35:48]: A special shout-out to Anirudh, who I met at your dinner, who I think listens to the pod. Yeah. Hey, Anirudh.Paul [00:35:54]: Anirudh's a man. He's a StageHand guy.swyx [00:35:56]: I mean, the interesting thing about each of these APIs is they're kind of each startup. Like, specifically extract, you know, Firecrawler is extract. There's, like, Expand AI. There's a whole bunch of, like, extract companies. They just focus on extract. I'm curious. Like, I feel like you guys are going to collide at some point. Like, right now, it's friendly. Everyone's in a blue ocean. At some point, it's going to be valuable enough that there's some turf battle here. I don't think you have a dog in a fight. I think you can mock extract to use an external service if they're better at it than you. But it's just an observation that, like, in the same way that I see each option, each checkbox in the side of custom GBTs becoming a startup or each box in the Karpathy chart being a startup. Like, this is also becoming a thing. Yeah.Paul [00:36:41]: I mean, like, so the way StageHand works is that it's MIT-licensed, completely open source. You bring your own API key to your LLM of choice. You could choose your LLM. We don't make any money off of the extract or really. We only really make money if you choose to run it with our browser. You don't have to. You can actually use your own browser, a local browser. You know, StageHand is completely open source for that reason. And, yeah, like, I think if you're building really complex web scraping workflows, I don't know if StageHand is the tool for you. I think it's really more if you're building an AI agent that needs a few general tools or if it's doing a lot of, like, web automation-intensive work. But if you're building a scraping company, StageHand is not your thing. You probably want something that's going to, like, get HTML content, you know, convert that to Markdown, query it. That's not what StageHand does. StageHand is more about reliability. I think we focus a lot on reliability and less so on cost optimization and speed at this point.swyx [00:37:33]: I actually feel like StageHand, so the way that StageHand works, it's like, you know, page.act, click on the quick start. Yeah. It's kind of the integration test for the code that you would have to write anyway, like the Puppeteer code that you have to write anyway. And when the page structure changes, because it always does, then this is still the test. This is still the test that I would have to write. Yeah. So it's kind of like a testing framework that doesn't need implementation detail.Paul [00:37:56]: Well, yeah. I mean, Puppeteer, Playwright, and Slenderman were all designed as testing frameworks, right? Yeah. And now people are, like, hacking them together to automate the web. I would say, and, like, maybe this is, like, me being too specific. But, like, when I write tests, if the page structure changes. Without me knowing, I want that test to fail. So I don't know if, like, AI, like, regenerating that. Like, people are using StageHand for testing. But it's more for, like, usability testing, not, like, testing of, like, does the front end, like, has it changed or not. Okay. But generally where we've seen people, like, really, like, take off is, like, if they're using, you know, something. 
If they want to build a feature in their application that's kind of like Operator or Deep Research, they're using StageHand to kind of power that tool calling in their own agent loop. Okay. Cool.swyx [00:38:37]: So let's go into Operator, the first big agent launch of the year from OpenAI. Seems like they have a whole bunch scheduled. You were on break and your phone blew up. What's your just general view of computer use agents is what they're calling it. The overall category before we go into Open Operator, just the overall promise of Operator. I will observe that I tried it once. It was okay. And I never tried it again.OpenAI's Operator and computer use agentsPaul [00:38:58]: That tracks with my experience, too. Like, I'm a huge fan of the OpenAI team. Like, I think that I do not view Operator as the company. I'm not a company killer for browser base at all. I think it actually shows people what's possible. I think, like, computer use models make a lot of sense. And I'm actually most excited about computer use models is, like, their ability to, like, really take screenshots and reasoning and output steps. I think that using mouse click or mouse coordinates, I've seen that proved to be less reliable than I would like. And I just wonder if that's the right form factor. What we've done with our framework is anchor it to the DOM itself, anchor it to the actual item. So, like, if it's clicking on something, it's clicking on that thing, you know? Like, it's more accurate. No matter where it is. Yeah, exactly. Because it really ties in nicely. And it can handle, like, the whole viewport in one go, whereas, like, Operator can only handle what it sees. Can you hover? Is hovering a thing that you can do? I don't know if we expose it as a tool directly, but I'm sure there's, like, an API for hovering. Like, move mouse to this position. Yeah, yeah, yeah. I think you can trigger hover, like, via, like, the JavaScript on the DOM itself. But, no, I think, like, when we saw computer use, everyone's eyes lit up because they realized, like, wow, like, AI is going to actually automate work for people. And I think seeing that kind of happen from both of the labs, and I'm sure we're going to see more labs launch computer use models, I'm excited to see all the stuff that people build with it. I think that I'd love to see computer use power, like, controlling a browser on browser base. And I think, like, Open Operator, which was, like, our open source version of OpenAI's Operator, was our first take on, like, how can we integrate these models into browser base? And we handle the infrastructure and let the labs do the models. I don't have a sense that Operator will be released as an API. I don't know. Maybe it will. I'm curious to see how well that works because I think it's going to be really hard for a company like OpenAI to do things like support CAPTCHA solving or, like, have proxies. Like, I think it's hard for them structurally. Imagine this New York Times headline, OpenAI CAPTCHA solving. Like, that would be a pretty bad headline, this New York Times headline. Browser base solves CAPTCHAs. No one cares. No one cares. And, like, our investors are bored. Like, we're all okay with this, you know? We're building this company knowing that the CAPTCHA solving is short-lived until we figure out how to authenticate good bots. I think it's really hard for a company like OpenAI, who has this brand that's so, so good, to balance with, like, the icky parts of web automation, which it can be kind of complex to solve. 
I'm sure OpenAI knows who to call whenever they need you. Yeah, right. I'm sure they'll have a great partnership.Alessio [00:41:23]: And is Open Operator just, like, a marketing thing for you? Like, how do you think about resource allocation? So, you can spin this up very quickly. And now there's all this, like, open deep research, just open all these things that people are building. We started it, you know. You're the original Open. We're the original Open operator, you know? Is it just, hey, look, this is a demo, but, like, we'll help you build out an actual product for yourself? Like, are you interested in going more of a product route? That's kind of the OpenAI way, right? They started as a model provider and then…Paul [00:41:53]: Yeah, we're not interested in going the product route yet. I view Open Operator as a model provider. It's a reference project, you know? Let's show people how to build these things using the infrastructure and models that are out there. And that's what it is. It's, like, Open Operator is very simple. It's an agent loop. It says, like, take a high-level goal, break it down into steps, use tool calling to accomplish those steps. It takes screenshots and feeds those screenshots into an LLM with the step to generate the right action. It uses stagehand under the hood to actually execute this action. It doesn't use a computer use model. And it, like, has a nice interface using the live view that we talked about, the iframe, to embed that into an application. So I felt like people on launch day wanted to figure out how to build their own version of this. And we turned that around really quickly to show them. And I hope we do that with other things like deep research. We don't have a deep research launch yet. I think David from AOMNI actually has an amazing open deep research that he launched. It has, like, 10K GitHub stars now. So he's crushing that. But I think if people want to build these features natively into their application, they need good reference projects. And I think Open Operator is a good example of that.swyx [00:42:52]: I don't know. Actually, I'm actually pretty bullish on API-driven operator. Because that's the only way that you can sort of, like, once it's reliable enough, obviously. And now we're nowhere near. But, like, give it five years. It'll happen, you know. And then you can sort of spin this up and browsers are working in the background and you don't necessarily have to know. And it just is booking restaurants for you, whatever. I can definitely see that future happening. I had this on the landing page here. This might be a slightly out of order. But, you know, you have, like, sort of three use cases for browser base. Open Operator. Or this is the operator sort of use case. It's kind of like the workflow automation use case. And it completes with UiPath in the sort of RPA category. Would you agree with that? Yeah, I would agree with that. And then there's Agents we talked about already. And web scraping, which I imagine would be the bulk of your workload right now, right?Paul [00:43:40]: No, not at all. I'd say actually, like, the majority is browser automation. We're kind of expensive for web scraping. Like, I think that if you're building a web scraping product, if you need to do occasional web scraping or you have to do web scraping that works every single time, you want to use browser automation. Yeah. You want to use browser-based. But if you're building web scraping workflows, what you should do is have a waterfall. 
You should have the first request is a curl to the website. See if you can get it without even using a browser. And then the second request may be, like, a scraping-specific API. There's, like, a thousand scraping APIs out there that you can use to try and get data. Scraping B. Scraping B is a great example, right? Yeah. And then, like, if those two don't work, bring out the heavy hitter. Like, browser-based will 100% work, right? It will load the page in a real browser, hydrate it. I see.swyx [00:44:21]: Because a lot of people don't render to JS.swyx [00:44:25]: Yeah, exactly.Paul [00:44:26]: So, I mean, the three big use cases, right? Like, you know, automation, web data collection, and then, you know, if you're building anything agentic that needs, like, a browser tool, you want to use browser-based.Alessio [00:44:35]: Is there any use case that, like, you were super surprised by that people might not even think about? Oh, yeah. Or is it, yeah, anything that you can share? The long tail is crazy. Yeah.Surprising use cases of BrowserbasePaul [00:44:44]: One of the case studies on our website that I think is the most interesting is this company called Benny. So, the way that it works is if you're on food stamps in the United States, you can actually get rebates if you buy certain things. Yeah. You buy some vegetables. You submit your receipt to the government. They'll give you a little rebate back. Say, hey, thanks for buying vegetables. It's good for you. That process of submitting that receipt is very painful. And the way Benny works is you use their app to take a photo of your receipt, and then Benny will go submit that receipt for you and then deposit the money into your account. That's actually using no AI at all. It's all, like, hard-coded scripts. They maintain the scripts. They've been doing a great job. And they build this amazing consumer app. But it's an example of, like, all these, like, tedious workflows that people have to do to kind of go about their business. And they're doing it for the sake of their day-to-day lives. And I had never known about, like, food stamp rebates or the complex forms you have to do to fill them. But the world is powered by millions and millions of tedious forms, visas. You know, Emirate Lighthouse is a customer, right? You know, they do the O1 visa. Millions and millions of forms are taking away humans' time. And I hope that Browserbase can help power software that automates away the web forms that we don't need anymore. Yeah.swyx [00:45:49]: I mean, I'm very supportive of that. I mean, forms. I do think, like, government itself is a big part of it. I think the government itself should embrace AI more to do more sort of human-friendly form filling. Mm-hmm. But I'm not optimistic. I'm not holding my breath. Yeah. We'll see. Okay. I think I'm about to zoom out. I have a little brief thing on computer use, and then we can talk about founder stuff, which is, I tend to think of developer tooling markets in impossible triangles, where everyone starts in a niche, and then they start to branch out. So I already hinted at a little bit of this, right? We mentioned more. We mentioned E2B. We mentioned Firecrawl. And then there's Browserbase. So there's, like, all this stuff of, like, have serverless virtual computer that you give to an agent and let them do stuff with it. And there's various ways of connecting it to the internet. You can just connect to a search API, like SERP API, whatever other, like, EXA is another one. That's what you're searching. 
You can also have a JSON markdown extractor, which is Firecrawl. Or you can have a virtual browser like Browserbase, or you can have a virtual machine like Morph. And then there's also maybe, like, a virtual sort of code environment, like Code Interpreter. So, like, there's just, like, a bunch of different ways to tackle the problem of give a computer to an agent. And I'm just kind of wondering if you see, like, everyone's just, like, happily coexisting in their respective niches. And as a developer, I just go and pick, like, a shopping basket of one of each. Or do you think that eventually, people will collide?Future of browser automation and market competitionPaul [00:47:18]: I think that currently it's not a zero-sum market. Like, I think we're talking about... I think we're talking about all of knowledge work that people do that can be automated online. All of these, like, trillions of hours that happen online where people are working. And I think that there's so much software to be built that, like, I tend not to think about how these companies will collide. I just try to solve the problem as best as I can and make this specific piece of infrastructure, which I think is an important primitive, the best I possibly can. And yeah. I think there's players that are actually going to like it. I think there's players that are going to launch, like, over-the-top, you know, platforms, like agent platforms that have all these tools built in, right? Like, who's building the Rippling for agent tools that has the search tool, the browser tool, the operating system tool, right? There are some. There are some. There are some, right? And I think in the end, what I have seen in my time as a developer, and I look at all the favorite tools that I have, is that, like, for tools and primitives with sufficient levels of complexity, you need to have a solution that's really bespoke to that primitive, you know? And I am sufficiently convinced that the browser is complex enough to deserve a primitive. Obviously, I have to. I'm the founder of Browserbase, right? I'm talking my book. But, like, I think maybe I can give you one spicy take against, like, maybe just whole OS running. I think that when I look at computer use when it first came out, I saw that the majority of use cases for computer use were controlling a browser. And do we really need to run an entire operating system just to control a browser? I don't think so. I don't think that's necessary. You know, Browserbase can run browsers for way cheaper than you can if you're running a full-fledged OS with a GUI, you know, operating system. And I think that's just an advantage of the browser. It is, like, browsers are little OSs, and you can run them very efficiently if you orchestrate it well. And I think that allows us to offer 90% of the, you know, functionality in the platform needed at 10% of the cost of running a full OS. Yeah.Open Operator: Browserbase's Open-Source Alternativeswyx [00:49:16]: I definitely see the logic in that. There's a Marc Andreessen quote. I don't know if you know this one. Where he basically observed that the browser is turning the operating system into a poorly debugged set of device drivers, because most of the apps are moved from the OS to the browser. So you can just run browsers.Paul [00:49:31]: There's a place for OSs, too. Like, I think that there are some applications that only run on Windows operating systems.
And Eric from pig.dev in this upcoming YC batch, or last YC batch, like, he's building infrastructure to run tons of Windows operating systems for you to control with your agent. And like, there's some legacy EHR systems that only run on Internet Explorer. Yeah.Paul [00:49:54]: I think that's it. I think, like, there are use cases for specific operating systems for specific legacy software. And like, I'm excited to see what he does with that. I just wanted to give a shout out to the pig.dev website.swyx [00:50:06]: The pigs jump when you click on them. Yeah. That's great.Paul [00:50:08]: Eric, he's the former co-founder of banana.dev, too.swyx [00:50:11]: Oh, that Eric. Yeah. That Eric. Okay. Well, he abandoned bananas for pigs. I hope he doesn't start going around with pigs now.Alessio [00:50:18]: Like he was going around with bananas. A little toy pig. Yeah. Yeah. I love that. What else are we missing? I think we covered a lot of, like, the Browserbase product history, but. What do you wish people asked you? Yeah.Paul [00:50:29]: I wish people asked me more about, like, what will the future of software look like? Because I think that's really where I've spent a lot of time about why I'm doing Browserbase. Like, for me, starting a company is like a means of last resort. Like, you shouldn't start a company unless you absolutely have to. And I remain convinced that the future of software is software that you're going to click a button and it's going to do stuff on your behalf. Right now, with software, you click a button and it maybe, like, calls back an API and, like, computes some numbers. It, like, modifies some text, whatever. But the future of software is software using software. So, I may log into my accounting website for my business, click a button, and it's going to go load up my Gmail, search my emails, find the thing, upload the receipt, and then comment it for me. Right? And it may do that using APIs, maybe a browser. I don't know. I think it's a little bit of both. But that's completely different from how we've built software so far. And that's. I think that future of software has different infrastructure requirements. It's going to require different UIs. It's going to require different pieces of infrastructure. I think the browser infrastructure is one piece that fits into that, along with all the other categories you mentioned. So, I think that it's going to require developers to think differently about how they've built software for, you know, the application level so far. And I am excited to kind of explore more what that means. And I think we've seen from, like, you know, the customers that use Browserbase so far, some really innovative ways to, like, take software and re-imagine it for AI, and build things that, like, have chat interfaces, build things that have human-in-the-loop flows, build things that are more asynchronous because AI is slower. And those are patterns that are still emerging. And I don't think we have all the best practices yet.Key Use Cases for Browserbase: Automation, Agents, and Scrapingswyx [00:52:03]: I don't have much feedback on that. Like, that's true. Paul's right. Paul's right. You heard it here first. Quoted by Swyx. Yeah. Amazing. I'm framing that. It is not specific enough to be wrong.Paul [00:52:12]: That means Paul's right to me still.swyx [00:52:14]: I don't know if I'm hearing that wrong. I always try to prompt people for falsifiable problems. I think I'm just trying to make sure that I'm not making false predictions.
Because, like, you can predict that things will be better generically, but how? And, like, those are the things where you, like, put a little skin in the game where…Paul [00:52:28]: Yeah. I mean, I can predict that Browserbase will be a billion dollar company one day. So let's check back in five years and, you know, if I'm a PM at Coinbase, then something went wrong. Oh, boy.swyx [00:52:40]: Yeah. Yeah. We picked out a couple of your tweets about Foundry. Yeah. I think you're a pretty build-in-public kind of guy. Yeah. I try to be. I think the main thing that I want to highlight as well is, you emphasized this at the start of your intro, which is you're a solo founder. I think that there's a movement towards more solo founders in the Valley more generally, but people who are hearing this for the first time have no idea. They're like, what do you mean? YC forces me to get a co-founder. Like, what is this? So I've heard you talk about this before, but maybe you want to recap your spiel for folks that haven't heard about it. Yeah. Yeah.Being a solo founderPaul [00:53:11]: I mean, I've had co-founders in my past company. I love my co-founders. They were at my wedding. I think if you want to move extremely fast as a company, one of the hard parts about having co-founders is that there's like, you have to do the co-founder alignment and then the company alignment. And then there's people on the team that probably tell things to one co-founder because they have a favorite. And then like that co-founder represents their interests. But Browserbase is a benevolent dictatorship. You know, like if I want to make a change, I work with the team and we all decide together. We move quickly. We don't have an extra layer of buy-in within the co-founder layer. Yeah. And frankly, like I think, especially with DevTools companies, if you're able to talk about your product and talk with customers and you can build product, you don't need to have a business guy or a business side. You know, I'm a developer first and foremost. I was raised by two salespeople, so I guess that's why I can talk to customers or something. But at my core- What kind of sales? I love it, they did semiconductor and pharmaceutical sales. My mom and dad. Oh, very different. Yeah. Very different.swyx [00:54:08]: But also very enterprise. Good. Yeah.Paul [00:54:10]: Yeah. Yeah. Yeah. Yeah. I mean, like, it rubbed off on me in some way. I was just trying to play WoW as a kid and they made me play sports. So I don't know how it worked out the way it did, but it does all come back to like, as a solo founder, you need to be willing to like go out there and, you know, talk about your product, go talk to customers, go convince people to work for you, but then also have core principles of like how you want to build this company and like what product you want to build. And thankfully, if you can do all of that, you can be a solo founder. You just have to hire fast and put the right team around you. Yeah. And that's kind of the team that we have that's surrounding me and kind of lifting the whole company up.Alessio [00:54:44]: So there's kind of like the decision making and then there's like the culture of a company. Obviously as a solo founder, you have huge influence on everybody. Apple is maybe the usual example of like, you know, you have the Jobs and Wozniak.
None of like, you can have two co-founders that are like each polarizing.Unexpected Use Cases of Browser Automationswyx [00:55:01]: There was a third co-founder, by the way.Alessio [00:55:02]: Who was the third co-founder?swyx [00:55:03]: I don't know. He sold his shares very early on. Nobody talks about him, but he's like, he always has a, has a bit of a regret.Alessio [00:55:10]: Okay. But anyway. Yeah. How have you thought about building the culture? You know, obviously startups are like super intense, but you're also going to just run yourself to the ground all the time. Any insight doing it solo? Yeah.Paul [00:55:21]: I mean, like I talked about like how it's easier for me to make decisions being a solo founder. The real cheat code is like having a great team that you give a lot of agency and ownership to. A lot of people make the little tiny decisions that go into everything that makes Browserbase great. Like the website, for example, I, I had some, like some involvement with that, but like a lot of that was the team. Right. And then the product. I think the team really has ownership of a lot of these day-to-day decisions that add up to make a cohesive product experience. Culturally, like, we're fully in person. Maybe that's one crazy take that we do, but we're also like not too in person. Like our first meeting's at 10 AM, people leave around five or six. We work Monday to Friday in person and those like, that's the, the expectation, right? I think people have gone too far with in-person where they're like seven days a week in the office, 9 AM to 9 PM.swyx [00:56:10]: That's too much. Just an anecdote. Yeah. I just visited an office. I'll keep them anonymous for now, but to my face, we are 9, 9, 6. Yeah. For those who don't know, 9, 9, 6 is 9 AM to 9 PM, six days a week.Paul [00:56:20]: I think we've taken it a little too far and for some teams, I know another anonymous company that does something like 9, 9, 6 and they're like crushing it right now. Right. So like, and like, it does get results, but like, I think for our culture, we gather in person, we put pants on every day and go to the office so that we can all work together. Or shorts, I guess. Right. And then like, we all know we're going to work outside of, out of the office. We're going to work at home sometimes. We might come in on a weekend. The weekends are for fun work and that's really where we get to let people work on stuff that's not on the roadmap. And that empowers them to build something and bring it back to the team on Monday and say, look what I built. This is cool. Culturally, we're a lot of like former YC CTOs and like ex-founders or future founders. And I've just found that those people tend to be just really great early hires for a company. They, they get it. And I think for them, especially kind of the ex-YC CTOs. I see people who maybe didn't find PMF coming in and being at a company with PMF, it's such a refreshing thing for them because they can just come in and execute. And there's just so many clear things we have to go build. And if you're a talented engineer, being able to go build and make an impact every single day is like super fulfilling.swyx [00:57:25]: My question on the other hand is you also talk a lot about recruiting, especially in the podcasts that you've been on. How come there's no Browserbase recruiting agent? That's a good question.Paul [00:57:34]: I think it's because I don't do that much outbound. I do message people. Yeah. But a lot of it's now through referral.
It's very, like, targeted. Like if I see somebody working on something really cool, I just message them. So I don't want like something trawling the web and like messaging every Kubernetes Firecracker expert. I try and like look for them in my passive web browsing. And when I find somebody, I just want to like take the time personally, like say, Hey, I love what you're doing. I think it's really cool. And let's have a conversation. Yeah.swyx [00:58:03]: Off of Hacker News and other stuff. Yeah.Paul [00:58:05]: I love to hire off of Hacker News. Yeah.swyx [00:58:07]: I'll let you plug at the end. My attempt at this failed, which is I really hate LinkedIn Sales Navigator. I think that it is just grifting on top of people doing data entry for LinkedIn. And I hope that Browserbase will someday help to kill LinkedIn Sales Navigator at this point.Paul [00:58:21]: I don't know if we will directly, but one of our customers definitely is trying to do that. So I think there's a couple that are on it. These AI SDR companies are crushing it. Yup.Alessio [00:58:30]: The 996 company was an AI SDR company.swyx [00:58:33]: There we go.Alessio [00:58:34]: Yeah. Very classic. This was great. Anything? Yeah. You got the run clubs too. What other things do you mix in, like both in the company culture and like the community culture? I know you bring people together. Yeah.Paul [00:58:45]: I think like we, like we try and build in public and like, like you can see a lot of the Browserbase people on Twitter. Every Monday we have a run club. People go running together. We don't run very fast, but it's like a good way to spend time together. I just look back fondly on my time being in person at my first company. And we have like a mix of people, like some are just early in career, some have been in the business for a long time. They've been in, you know, the workforce for 20, 30 years. So it's not just like a young people company, like it's a huge mix. But when you make people make a polarizing decision of like, I will come to an office five days a week, people then end up making more decisions that are aligned with a culture. So it's almost like if you can make your culture binary, or you're in or out, it becomes easier to assimilate and like keep a cohesive culture. And I think it starts with being in an office for us, but for other people it could be like moving or like using Discord versus Slack or like other, like... Yeah. The, the binary decisions that people may have to make.swyx [00:59:36]: One thing I like asking founders is, you know, you're famously not an AI company or, you know, you, you serve AI companies, but you're not yourself an LLM sort of consuming company. But if you were though, what company would you start? What's, what's like obviously a good idea.The Competitive Landscape of AI-Powered Browsing and AutomationPaul [00:59:50]: Yeah, I, I had this tweet like forever ago, which is like, there's so much money to be made in taking like proprietary research and then turning that into like an automation, which is obviously like a very Browserbase-inspired one. Like. Like listening to all the city halls or town hall meetings in like little towns and then knowing when they're going to like approve a new Walmart or something and then like buying up real estate around the Walmart because that will go up when they install this thing.
So it's like really interesting to think about like how can you find new channels for data that will allow you to make like high alpha decisions and benefit you financially. So I think it's like some interesting stuff there, like just a bunch of conversations that happen in real life that are recorded, that are online, that you can go find using, you know, a web browser, of course. And then like making some interesting like decisions off of that. So I don't know, like I like browser stuff, like it's on brand, right? Like I have to, I'm consistent at least.Alessio [01:00:45]: Do not look at it on your phone through a native app, only look at it through the browser.swyx [01:00:49]: My favorite part of one of his videos, they had these guys holding this bee behind them while they were doing the demo. So it was like a really Easter egg. Yeah, that was stagehand, right?Paul [01:00:58]: Yeah, the stagehand video. It's not, they're not holding it. They're actually wearing these bee boxes on their heads. And we shot it like five times and poor Sean and Samil are like bobbing their heads back and forth with these bee boxes on because we can't afford special effects, man. It's really serious.swyx [01:01:13]: Good detail. Good effort detail there. Yeah. Thank you so much. Congrats on all your success.Paul [01:01:17]: Thanks for having me, guys. It's been a really good time.swyx [01:01:20]: Yeah, I'm sure we'll have you back again.Paul [01:01:21]: Yeah, I'd love to come back. Get full access to Latent.Space at www.latent.space/subscribe
    --------  
    1:01:33
  • The Inventors of Deep Research
    While “LLM-powered Search” is as old as Perplexity and SearchGPT, and open source projects like GPTResearcher and clones like OpenDeepResearch exist, the difference with “Deep Research” products is they are both “agentic” (loosely meaning that an LLM decides the next step in a workflow, usually involving tools) and bundling custom-tuned frontier models (custom tuned o3 and Gemini 1.5 Flash).The reception to OpenAI’s Deep Research agent has been nothing short of breathless:"Deep Research is the best public-facing AI product Google has ever released. It's like having a college-educated researcher in your pocket." - Jason Calacanis“I have had [Deep Research] write a number of ten-page papers for me, each of them outstanding. I think of the quality as comparable to having a good PhD-level research assistant, and sending that person away with a task for a week or two, or maybe more. Except Deep Research does the work in five or six minutes.” - Tyler Cowen“Deep Research is one of the best bargains in technology.” - Ben Thompson“my very approximate vibe is that it can do a single-digit percentage of all economically valuable tasks in the world, which is a wild milestone.” - sama“Using Deep Research over the past few weeks has been my own personal AGI moment. It takes 10 mins to generate accurate and thorough competitive and market research (with sources) that previously used to take me at least 3 hours.” - OAI employee“It's like a bazooka for the curious mind” - Dan Shipper“Deep research can be seen as a new interface for the internet, in addition to being an incredible agent… This paradigm will be so powerful that in the future, navigating the internet manually via a browser will be "old-school", like performing arithmetic calculations by hand.” - Jason Wei“One notable characteristic of Deep Research is its extreme patience. I think this is rapidly approaching “superhuman patience”. One realization working on this project was that intelligence and patience go really well together.” - HyungWon“I asked it to write a reference Interaction Calculus evaluator in Haskell. A few exchanges later, it gave me a complete file, including a parser, an evaluator, O(1) interactions and everything. The file compiled, and worked on my test inputs. There are some minor issues, but it is mostly correct. So, in about 30 minutes, o3 performed a job that would take me a day or so.” - Victor Taelin“Can confirm OpenAI Deep Research is quite strong. In a few minutes it did what used to take a dozen hours. The implications to knowledge work is going to be quite profound when you just ask an AI Agent to perform full tasks for you and come back with a finished result.” - Aaron Levie“Deep Research is genuinely useful” - Gary MarcusWith the advent of “Deep Research” agents, we are now routinely asking models to go through 100+ websites and generate in-depth reports on any topic. The Deep Research revolution has hit the AI scene in the last 2 weeks:* Dec 11th: Gemini Deep Research (today’s guest!) rolls out with Gemini Advanced* Feb 2nd: OpenAI releases Deep Research* Feb 3rd: a dozen “Open Deep Research” clones launch* Feb 5th: Gemini 2.0 Flash GA* Feb 15th: Perplexity launches Deep Research* Feb 17th: xAI launches Deep SearchIn today’s episode, we welcome Aarush Selvan and Mukund Sridhar, the lead PM and tech lead for Gemini Deep Research, the originators of the entire category. 
We asked detailed questions from inspiration to implementation, why they had to finetune a special model for it instead of using the standard Gemini model, how to run evals for them, and how to think about the distribution of use cases. (We also have an upcoming Gemini 2 episode with our returning first guest Logan Kilpatrick so stay tuned 👀)Two Kinds of Inference Time ComputeIn just ~2 months since NeurIPS, we’ve moved from “scaling has hit a wall, LLMs might be over” to “is this AGI already?” thanks to the releases of o1, o3, and DeepSeek R1 (see our o3 post and R1 distillation lightning pod). This new jump in capabilities is now accelerating many other applications; you might remember how “needle in a haystack” was one of the benchmarks people often referenced when looking at model’s capabilities over long context (see our 1M Llama context window ep for more). It seems that we have broken through the “wall” by scaling “inference time” in two meaningful ways — one with more time spent in the model, and the other with more tool calls.Both help build better agents which are clearly more intelligent. But as we discuss on the podcast, we are currently in a “honeymoon” period of agent products where taking more time (or tool calls, or search results) is considered good, because 1) quality is hard to evaluate and 2) we don’t know the realistic upper bound to quality. We know that they’re correlated, but we don’t know to what extent and if the correlation breaks down over extended research periods (they may not).It doesn’t take a PhD to spot the perverse incentives here.Agent UX: From Sync to Async to HybridWe also discussed the technical challenges in moving from a synchronous “chat” paradigm to the “async” world where every agent builder needs to handroll their own orchestration framework in the background.For now, most simple, first-cut implementations including Gemini and OpenAI and Bolt tend to make “locking” async experiences — while the report is generating or the plan is being executed, you can’t continue chatting with the model or editing the plan. In this case we think the OG Agent here is Devin (now GA), which has gotten it right from the beginning.Full Episode on YouTubewith demo!Show Notes* Deep Research* Aarush Selvan* Mukund Sridhar* NotebookLM episode (Raiza / Usama)* Bolt* Bret TaylorChapters* [00:00:00] Introductions* [00:00:22] Overview + Demo of Deep Research* [00:04:31] Editable chain of thought* [00:08:18] Search ranking for sources* [00:09:31] Can you DIY Deep Research?* [00:15:52] UX and research plan editing* [00:16:21] Follow-up queries and context retention* [00:21:06] Evaluating Deep Research* [00:28:06] Ontology of use cases and research patterns* [00:32:56] User perceptions of latency in Deep Research* [00:40:59] Lessons from other AI products* [00:42:12] Multimodal capabilities* [00:45:02] Technical challenges in Deep Research* [00:51:56] Can Deep Research discover new insights?* [00:54:11] Open challenges in agents* [00:57:04] Wrap upTranscriptAlessio [00:00:04]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol AI.Swyx [00:00:13]: Hey, and today we're very honored to have in our studio Aarush and Mukund from the Deep Research team, the OG Deep Research team. Welcome.Aarush [00:00:20]: Thanks for having us.Swyx [00:00:22]: Yeah, thanks for making the trip up. I was fortunate enough to be one of the early beta testers of Deep Research when he came out. 
I would say I was very keen on, I think even at the end of last year, people were already saying it was one of the most exciting agents that was coming out of Google. You know that previously we had on Raiza and Usama from the NotebookLM team. And I think this is an increasing trend that Gemini and Google are shipping interesting user-facing products that use AI. So congrats on your success so far. Yeah, it's been great. Thanks so much for having us here. Yeah. Yeah, thanks for making the trip up. And I'm also excited for your talk that is happening next week. Obviously, we have to talk about what exactly it is, but I'll ask you towards the end. So basically, okay, you know, we have the screen up. Maybe we just start at a high level for people who don't yet know. Like, what is Deep Research? Sure.Aarush [00:01:10]: So Deep Research is a feature where Gemini can act as your personal research assistant to help you learn about any topic that you want more deeply. It's really helpful for those queries where you want to go from zero to 50 really fast on a new thing. And the way it works is it takes your query, browses the web for about five minutes, and then outputs a research report for you to review and ask follow-up questions. This is one of the first times, you know, something takes about five, six minutes trying to perform your research. So there's a few challenges that brings. Like, you want to make sure you're spending that time in the computer doing what the user wants. So there are some aspects of the UX design that we can talk about as we go through an example, and then there's also challenges in the browsing. The web is super fragmented, and being able to plan iteratively as you parse through this noisy information is a challenge by itself.Swyx [00:02:11]: Yeah. This is like the first time sort of Google automating yourselves searching, like you're, you know, you're supposed to be the experts at search, but now you're like meta-searching and like determining the search strategy.Aarush [00:02:22]: Yeah, I think, at least we see it as two different use cases. There are things where, you know, you know exactly what you're looking for, and there search is still probably, you know, a very, you know, probably one of the best places to go. I think when Deep Research really shines is when there are, like, multiple facets to your question and you spend like a weekend, you know, just opening like 50, 60 tabs, and many times I just give up, and we wanted to solve that problem and, and give a great starting point for those kinds of journeys.Alessio [00:02:53]: Do we want to start a query so that it runs in the meantime and then we can chat over it?Swyx [00:02:58]: Okay, here's one query that, that we like, we love to test like super niche, random things, like things where there's like no Wikipedia page already about this topic or something like that, right? Because that's where you'll see the most lift from, from a feature like this. So for this one, I've come up with this query. This is actually Mukund's query that he loves to test: help me understand how milk and meat regulations differ between the US and Europe. What's nice is the first step is actually where it puts together a research plan. That you can review. And so this is sort of its guide for how it's going to go about and carry out the research, right? And so this was like a pretty decently well-specified query, but like, let's say you came to Gemini and were like, tell me about batteries, right?
That query, you could mean so many different things. You might want to know about the like latest innovations in battery tech. You might want to know about like a specific type of battery chemistry. And if we're going to spend like five to even 10 minutes researching something, we want to, one, understand what exactly you're trying to accomplish here, and two, give you an opportunity, like, to steer where the research goes, right? Because like, if you had an intern and you asked them this question, the first thing they'd do is ask you like a bunch of follow-up questions and be like, okay, so like, help me figure out exactly what you want me to do. And so the way we approached it is, we thought like, why don't we just have the model produce its first stab at the, at the research query, at how it would break this down. And then invite the user to come and kind of engage with how they would want to steer this. Yeah.Editable chain of thoughtAarush [00:04:31]: And many times when you try to use a product like this, you often don't know what questions to look for or the things to look for. So we kind of made this decision very deliberately that instead of asking the users just follow-up questions directly, we kind of lay out, hey, this is what I would do. Like, these are the different facets. For example, here it could be like what additives are allowed and how that differs, or labeling restrictions and so on in products. The aim of this is to kind of tell the user about the topic a little bit more and also get steer. At the same time, we elicit for like, uh, you know, a follow-up question and so on. So we kind of did that in a joint fashion.Swyx [00:05:09]: It's kind of like editable chain of thought. Right. Exactly. Exactly. Yeah. I think that, you know, we were talking to you about like your top tips for using deep research. Yeah. Your number one tip is to edit the page. Just edit it. Right. So like we actually, you can actually edit conversationally. We put in a button here just to like draw users' attention to the fact that you can edit. Oh, actually you don't need to click the button. You don't even need to click the button. Yeah. Actually, like in early rounds of testing, we saw no one was editing. And so we were just like, if we just put a button here, maybe people will like. I confess I just hit start a lot. I think like we see that too. Like most people hit start. Um, like it's like the, I'm feeling lucky. Yeah. Yeah. All right. So like I, I can just add a, add a step here and what you'll see is it should like refine the plan and show you a new thing to propose. Here we go. So it's added step seven, find information on milk and meat labeling requirements in the US and the EU, or you can just go ahead and hit start. I think it's still like a nice transparency mechanism. Even if users don't want to engage, like you still kind of know, okay, here's at least an understanding of why I'm getting the report I'm going to get, um, which is kind of nice. And then while it browses the web, and Mukund, you should maybe explain kind of how it, how it browses, we show kind of the, the websites it's reading in real time. Yeah. I'll preface this with, I forgot to explain the roles. You're the PM and you're the tech lead. Yes. Okay. Yeah.Aarush [00:06:29]: Just for people who, who don't know, we maybe should have started with that. I suppose. Yeah. Yeah. We do each other's work sometimes as well, but more or less that's the boundary. Yeah. Yeah. Um, yeah.
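The propose-then-edit flow described here is straightforward to mimic with the public Gemini API, even though the production feature runs on a post-trained variant of the model and real browsing tools. A toy sketch, where the model name, prompts, and the final report step are illustrative rather than what the team actually ships:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")  # stand-in; the product uses a post-trained variant

def propose_plan(query: str) -> str:
    # Step 1: have the model lay out a numbered research plan instead of answering directly.
    prompt = ("Break this research request into a numbered plan of web-research steps. "
              "Do not answer it yet.\n\nRequest: " + query)
    return model.generate_content(prompt).text

def run_research(plan: str) -> str:
    # Step 3: only after the (possibly edited) plan is accepted do we spend the time and compute.
    # The real product executes the plan with search/browse tools; here we just prompt the model.
    return model.generate_content(
        "Carry out this research plan and write a report with sources:\n" + plan).text

plan = propose_plan("How do milk and meat regulations differ between the US and Europe?")
print(plan)
# Step 2: the user edits the plan, e.g. appends a labeling-requirements step, then hits start.
edited = plan + "\n8. Find information on milk and meat labeling requirements in the US and EU."
report = run_research(edited)
```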
So, so what's happening behind the scenes actually is we kind of give this research plan that is a contract and that, uh, you know, has been accepted, but then if you look at the plan, there are things that are obviously parallelizable, so the model figures out which of the sub-steps it can start exploring in parallel, and then it primarily uses like two tools. It has the ability to perform searches and it has the ability to go deeper within, you know, a particular webpage of interest, right? And oftentimes it'll start exploring things in parallel, but that's not sufficient. Many times it, it has to reason based on information found. So in this case, it, one of the searches could have found that the EU commission has these additives, and it wants to go and check if the FDA does the same thing, right? So, uh, this notion of being able to read outputs from the previous turn, uh, ground on that to decide what to do next, I think was, was key. Otherwise you have like incomplete information and your report becomes a little bit of, like, high-level, uh, bullet points. So we wanted to go beyond that blueprint and actually figure out, you know, what are the key aspects here. So, yeah. So the, this happens iteratively until the model thinks it's finished all its steps. And then we kind of enter this, uh, analysis mode, and here there can be inconsistencies across sources. You kind of come up with an outline for the report, start generating a draft. The model tries to revise that by self-critiquing, uh, you know, to finalize the report. And that's probably what's happening behind the scenes.Search ranking for sourcesAlessio [00:08:18]: What's the initial ranking of the websites? So when you first started it, there were 36. How do you decide where to start since it sounds like, you know, the initial websites kind of carry a lot of weight too, because then they inform the following. Yes.Aarush [00:08:32]: So what happens in the initial turns, again, this is not like a, it's not something we enforce. It's mostly the model making these choices. But typically we see the model exploring all the different aspects in the, in the research plan that was presented. So we kind of get like a breadth-first idea of what are the different topics to explore. And in terms of which ones to double click on, I think it really comes down to, every time it searches, the model gets some idea of what the pages contain, and then depending on what pieces of it, sometimes there's inconsistency, sometimes there's just like partial information. Those are the ones it double clicks on and, uh, yeah, it can continually like iteratively search and browse until it feels like it's done. Yeah.Swyx [00:09:15]: I'm trying to think about how I would code this. Um, a simple question would be like, do you think that we could do this with the Gemini API? Or do you have some special access that we cannot replicate? You know, like is, if I model this with a sort of loop of, like, search, double click, whatever. Yeah.Aarush [00:09:31]: I don't think we have special access per se. It's pretty much the same model. We of course have our own, uh, post-training work that we do. And y'all can also like, you know, you can fine tune from the base model and so on. Uh, I don't know that we can do it.Swyx [00:09:45]: I don't know how to do fine tuning.Aarush [00:09:47]: Well, if you use our Gemma open source models, uh, you could fine tune. Yeah.
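A DIY version of the loop described above, broad searches over the plan's sub-steps, repeated "double clicks" into pages with partial or inconsistent information, then outline, draft, and self-critique, might look roughly like this. `llm`, `web_search`, and `fetch_page` are hypothetical helpers, and the stopping rule is far cruder than whatever the production system does:

```python
# Sketch of the two-tool research loop: search broadly, then "double click" where needed.
def deep_research(plan_steps, llm, web_search, fetch_page, max_rounds=8):
    notes = []  # everything read so far stays available, like keeping it all in context
    for step in plan_steps:  # sub-steps are roughly parallelizable
        notes.append({"step": step, "results": web_search(step)})  # tool 1: search

    for _ in range(max_rounds):
        # Ask the model whether anything found so far is partial or inconsistent.
        decision = llm(
            "Given these notes, name ONE url worth reading in full to resolve a gap, "
            f"or say DONE.\n\nNotes: {notes}")
        if "DONE" in decision:
            break
        notes.append({"page": decision, "content": fetch_page(decision)})  # tool 2: go deeper

    outline = llm(f"Write an outline for a report from these notes: {notes}")
    draft = llm(f"Write the report following this outline: {outline}\n\nNotes: {notes}")
    return llm(f"Critique and revise this draft for gaps and inconsistencies: {draft}")
```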
I don't think there's a special access per se, but a lot of the work for us is first defining these, oh, there needs to be a research plan and, and how do you go about presenting that? And then, uh, a bunch of post-training to make sure, you know, it's able to do this consistently well and, uh, with, with high reliability and power. Okay.Swyx [00:10:09]: So, so 1.5 pro with deep research is a special edition of 1.5 pro. Yes.Aarush [00:10:14]: Right.Swyx [00:10:14]: So it's not pure 1.5 pro. It's, it's, it's, it's a post-training version. This also explains why you haven't just, you can't just toggle on 2.0 flash and just, yeah. Right. Yeah. But I mean, I, I assume you have the data and you know, it's should be doable. Yup. There's still this like question of ranking. Yeah. Right. And like, oh, it looks like you're, you're already done. Yeah. Yeah. We're done. Okay. We can look at it. Yeah. So let's see. It's put together this report and what it's done is it's sort of broken, started with like milk regulation and then it looks like it goes into meat probably further down and then sort of covering how the U.S. approaches this problem of like how to regulate milk. Comparing and then, you know, covering the EU and then, yeah, like I said, like going into the meat production and then it'll also, what's nice is it kind of reasons over like why are there differences? And I think what's really cool here is like, it's, it's showing that there's like a difference in philosophy between how the U.S. and the EU regulate food. So the EU like adopts a precautionary approach. So even if there's inconclusive scientific evidence about something, it's still going to prefer to like ban it. Whereas the U.S. takes sort of the reactive approach where it's like allowing things until they can be proven to be harmful. Right. So like, this is kind of nice is that you, you also like get the second order insights from what it's being put, what it's putting together. So yeah, it's, it's kind of nice. It takes a few minutes to read and like understand everything, which makes for like a quiet period doing a podcast, I suppose. But yeah, this is, this is kind of how it, how it looks right now. Yeah.Alessio [00:11:47]: And then from here you can kind of keep the usual chat and iterate. So this is more, if you were to like, you know, compared to other platforms, it's kind of like a Anthropic Artifact or like a ChatGPT canvas where like you have the document on one side and like the chat on the other and you're working on it.Aarush: [00:12:04]: Yeah. This is something we thought a bit about. And one of the things we feel is like your learning journey shouldn't just stop after the first report. And so actually what you probably want to do is while reading, be able to ask follow-up questions without having to scroll back and forth. And there's like broadly. A few different kinds of follow-up questions. One type is like, maybe there's like a factoid that you want that isn't in here, but it's probably been already captured as part of the web browsing that it did. Right. So we actually keep everything in context, like all the sites that it's read remain in context. So if there's a piece of missing information, it can just fetch that. Then another kind is like, okay, this is nice, but you actually want to kick off more deep research. Or like, I also want to compare the EU and Asia. Let's say in how they regulate milk and meat for that. 
You'd actually want the model to be like, okay, this is sufficiently different that I want to go do more deep research to answer this question. I won't find this information in what I've already browsed. And the third is actually, maybe you just want to like change the report. Like maybe you want to like condense it, remove sections, add sections, and actually like iterate on the report that you got. So we broadly are basically trying to teach the model to be able to do all three and the kind of side-by-side format allows sort of for the user to do that more easily. Yeah.Alessio [00:13:24]: So as a PM, there's a open in docs button there, right? Yeah. How do you think about what you're supposed to build in here versus kind of sounds like the condensing and things should be a Google docs. Yeah.Aarush [00:13:35]: It's just like an amazing editor. Like sometimes you just want to direct edit things and now Google docs also has Gemini in the side panel. So the more we can kind of help this be part of your workflow throughout the rest of the Google ecosystem, the better, right? Like, and one thing that we've noticed is people really like that button and really like exporting it. It's also a nice way to just save it permanently. And when you do export all the citations, and in fact, I can just run it now, carry over, which is also really nice. Gemini extensions is a different feature. So that is really around Gemini being able to fetch content from other Google services in order to inform the answer. So that was actually the first feature that we both worked on on the team as well. It was actually building extensions in Gemini. And so I think right now we have a bunch of different Google apps as well as I think Spotify and a couple, I don't know if we have, and Samsung apps as well. Who wants Spotify? I have this whole thing about like who wants Spotify? Who wants that in their deep research? In deep research, I think less, but like the interesting thing is like we built extensions and we didn't, we weren't really sure how people were going to use it. And a ton of people are doing really creative things with them. And a ton of people are just doing things that they loved on the Google assistant. And Spotify is like a huge, like playing music on the go was like a huge, a huge value. Oh, it controls Spotify? Yeah. It's not deep research. For deep research. Yeah. Purely use. Yeah. But this is search. Otherwise, yeah. Like you can, you can have Gemini go. Yeah. You have YouTube maps and search for flash thinking experimental with apps. The newest. Yeah. Longest model name that has been launched. But like, yeah, I think Gmail is obvious one. Yeah. The calendar is obvious one. Exactly. Those I want. Yeah. Spotify. Yeah. Fair enough. Yeah. And obviously feel free to dive in on your other work. I know you're, you're not just doing deep research, right? But you know, we're just kind of focusing on, on deep research here. I actually have asked for modifications after this first run where I was like, oh, you, you stopped. Like, I actually want you to keep going. Like what about these other things? And then continue to modify it. So it really felt like a little bit of a co-pilot type experience, but more like an experience. Yeah, we're just that much more than an agent that would be research. I thought it was pretty cool.UX and research plan editingAarush [00:15:52]: Yeah. One of the challenges is currently we kind of let the model decide based on your query amongst the three categories. 
So some, there is, there is a boundary there. Like some of these things, depending on how deep you want to go, you might just want a quick answer versus, like, kick off another deeper search. And even from a UX perspective, I think the, the panel allows for this notion of, you know, not every follow-up is going to take, like, five minutes. Right.Swyx [00:16:17]: Right now, it doesn't do any follow-up. Does it do follow-up search? It always does?Aarush [00:16:21]: It depends on your question. Since we have the liberty of really long context models, we actually hold all the research material across turns. So if it's able to find the answer in things that it's found, you're going to get a faster reply. Yeah. Otherwise, it's just going to go back to planning.Swyx [00:16:38]: Yeah, yeah. A bit of a follow-up on the, since you brought up context, I had two questions. One, do you have a HTML to markdown transform step? Or do you just consume raw HTML? There's no way you consume raw HTML, right?Aarush [00:16:50]: We have both versions, right? So there is, the models are getting, like every generation of models is getting much better at native understanding of these representations. I think the markdown step definitely helps in terms of, you know, there's a lot of noise, like as you can imagine with the pure HTML. JavaScript and CSS. Exactly. So yeah, when it makes sense to do it, we don't artificially try to make it hard for the model. But sometimes it depends on the kind of access of what we get as well. Like, for example, if there's an embedded snippet that's HTML, we want the model to, you know, to be able to work on that as well.Swyx [00:17:27]: And no vision yet, but. Currently no vision, yes. The reason I ask all these things is because I've done the same. Got it. Like I haven't done vision.Aarush [00:17:36]: Yeah. So the tricky thing about vision is I think the models are getting significantly better, especially if you look at the last six months, at natively being able to do like VQA stuff, and so on. But the challenge is the trade-off between having to, you know, actually render it and so on, the added latency, versus the value add you get.Swyx [00:17:57]: You have a latency budget of minutes. Yeah, yeah, yeah.Aarush [00:18:01]: It's true. In my opinion, the places you'll see a real difference is like, I don't know, a small part of the tail, especially in like this kind of an open domain setting. If you just look at what people ask, there's definitely some use cases where it makes a lot of sense. But I still feel it's not in the head cases. And we'll do it when we get there.Swyx [00:18:23]: The classic is like, it's a JPEG that has some important information and you can't touch it. Okay. And then the other technical follow-up was just, you have 1 million to 2 million token context. Has it ever exceeded 2 million? And what do you do there? Yeah.Aarush [00:18:39]: So we had this challenge sometime last year where we said, when we started like wiring up this multi-turn, where we said, hey, we're going to do this. Hey, let's see how far somebody on the team can take DR, you know? Yeah.Swyx [00:18:51]: What's the most challenging question you can ask that takes the longest? Yeah. No, we also keep asking follow-ups.Aarush [00:18:55]: Like for example, here you could say, hey, I also want to compare it with like how it's... Okay.Swyx [00:19:00]: So you're guaranteed to bust it. Yeah.Aarush [00:19:02]: Yeah. We also have, we have retrieval mechanisms if required.
So we natively try to use the context as much as it's available, beyond which, you know, we have like a RAG setup to figure that out. Okay.Alessio [00:19:16]: This is all in-house, in-house tech. Yes. Okay.Aarush [00:19:19]: Yes.Alessio [00:19:19]: What are some of the differences between putting things in context versus RAG? And when I was in Singapore, I went to the Google cloud team and they talk about Gemini plus grounding. Is Gemini plus search kind of like Gemini plus grounding? Or like, how should people think about the different shades of like, I'm doing retrieval on data versus I'm using deep research versus I'm using grounding. Sometimes the labels can be different. Sometimes it can be hard too.Aarush [00:19:46]: Yeah. I can, let me try to answer the first part of the question. Uh, the, the second part, I'm not fully sure of, of the grounding offering. So, uh, uh, I can at least, at least talk about the first part of the question. So I think, uh, you're asking like the difference between like being able to, when you, when would you do RAG versus rely on the long context?Alessio [00:20:06]: I think we all, we all get that. I was more curious, like from a product perspective, when you decide to do RAG versus s**t like this, you didn't need to, you know? Yeah. Do you get better performance just putting everything in context or?Aarush [00:20:18]: So the tricky thing for RAG, it really works well because a lot of these things are doing like cosine distance, like a dot product kind of a thing. And that kind of gets challenging when your query side has multiple different attributes. Uh, the dot product doesn't really work as well. I would say, at least for me, that's, that's my guiding principle on, uh, when to avoid RAG. That's one. The second one is, I think every generation of these models, uh, like the initial generations, even though they offered like long context, as the context kept growing you would see some kind of a decline in performance, but I think, uh, as the newer generation models came out, uh, they were really good, even if you kept filling in the context, at being able to pick out, uh, like these really fine-grained pieces of information.Evaluating Deep ResearchSwyx [00:21:06]: So I think these two, at least for me, are like guiding principles on when to. Just to add to that. I think like, just like a simple rule of thumb that we use. Is like, if it's the most recent set of research tasks where the user is likely to ask lots of follow-up questions, that should be in context, but like as stuff gets 10 tasks ago, you know, it's fine if that stuff is in RAG, because it's less likely that the user needs to do, you need to do like very complex comparisons between what's currently being discussed and the stuff that you asked about, you know, 10 turns ago. Right. So that's just like a, a very, like the rule of thumb that we follow. Yeah.
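On swyx's earlier question about an HTML-to-markdown step: one common way to approximate it is the html2text library, which strips most of the markup noise (scripts, styling, tags) before page text goes into context. This is a generic sketch, not what the Gemini pipeline actually uses:

```python
import html2text  # pip install html2text

converter = html2text.HTML2Text()
converter.ignore_images = True   # image tags are pure token noise for a text-only pass
converter.ignore_links = False   # keep hrefs so the report can cite its sources
converter.body_width = 0         # don't hard-wrap lines

def page_to_markdown(raw_html: str) -> str:
    """Convert fetched HTML into markdown so the model sees mostly content tokens."""
    return converter.handle(raw_html)
```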
So I think it really depends on, on the use case, if there's a natural progression, uh, and you feel like this is like part of one cohesive kind of a project, you should just continue using it. My follow-up is going to be like, oh, I'm just going to look for summer camps or something then. Yeah. I don't think it should make a difference, but we haven't really, uh, you know, pushed that to, uh, and, and, and tested that, that aspect of it for us. Most of our tests are like more natural transitions. Yeah.Swyx [00:22:40]: How do you eval deep research? Oh boy.Aarush [00:22:43]: Uh, yeah. This is a hard one. I think the entropy of the output space is so high, like it's, uh, like people love auto raters, but it brings its own, own, own set of, uh, challenges. And so for us, we have some metrics that we can auto generate, right? So for example, as we move, uh, when we do post-training and have multiple, uh, models, we kind of want to make sure, uh, the distribution of like certain stats, like for example, how long is spent on planning? How many, how many iterative steps it does on like some dev set, if you see large changes in distribution, that's, that's kind of like a early, uh, signal of, of something has changed. It could be for better or worse. Uh, so we have some metrics like that, that we can auto compute. So every time you have a new version, you run it across a test suite of cases and you see how long it takes. Yeah. So we have like a dev set and we have like some kind of automatic metrics that we can detect in terms of like the behavior end to end. Like for example, how long is the research plan? Do we, do we have like a, do we have like a, do we have like a, do we have like a, do we have like a new model is like a new model, produce really longer, many more steps, number of characters, like number of steps in case of the plan in the plans, it could be like, like we spoke about how it iteratively plans based on like previous searches, how many steps does that go on an average or some dev set. So there are some things like this you can automate, but beyond that, there are all generators, but we definitely do a lot of human evals and that we have defined with product about certain things we care about. I've been super opinionated about, is it comprehensive, is it complete, like groundedness and these kind of things. So it's a mix of these two attributes. There's another challenge, but I'll...Swyx [00:24:26]: Is this where, the other challenge in that, sometimes you just have to have your PM review examples. Yeah, exactly.Aarush [00:24:34]: Yeah, and for latency... So you're the human reader. But broadly, what we tried to do is, for the eval question, is like, we tried to think about like, what are all the ways in which a person might use a feature like this? And we came up with what we call an ontology of use cases. Yes. And really what we tried to do is like, stay away from like verticals, like travel or shopping and things like that. But really try and go into like, what is the underlying research behavior type that a person is doing? So... Yeah. There's queries on one end that are just, you're going very broad, but shallow, right? Things like, shopping queries are an example of that, or like, I want to find the perfect summer camp, my kids love soccer and tennis. And really, you just want to find as many different options and explore all the different options that are available, and then synthesize, okay, what's the TLDR about each one? 
Kind of like those journeys where you open many, many Chrome tabs, but then like, need to take notes somewhere of the stuff that's appealing. On the other end of the spectrum... You know, you've got like, a specific topic, and you just want to go super deep on that and really, really understand that. And there's like, all sorts of points in the middle, right? Around like, okay, I have a few options, but I want to compare them, or like, yeah, I want to go not super deep on a topic, but I want to cover slightly more topics. And so we sort of developed this ontology of different research patterns, and then for each one came up with queries that would fall within that, and then that's sort of the eval set, which we then run human evals on, and make sure we're kind of doing well across the board on all of those. Yeah, you mentioned three things. Is it literally three, or is it three out of like, 20 things? How wide is the ontology? I basically just told the... The full set? Yeah, I told, no, no, no, I told you the like, extremes, right? Extremes, okay. Yeah, and then we had like, several midpoints. So basically, yeah, going from like, something super broad and shallow to something very specific and deep. We weren't actually sure which end of the spectrum users are going to really resonate with. And then on top of that, you have compounds of those, right? So you can have things where you want to make a plan, right? Like, a great one is like, I want to plan a wedding in, you know, Lisbon, and I, you know, I need you to help with like, these 10 things, right? And so... Oh, that becomes like a project with research enabled... Right. And so then it needs to research planners, and venues, and catering, right? And so there's sort of compounds of when you start combining these different underlying ontology types. And so that, we also thought about that when we... When we tried to put together our eval set.Swyx: What's the maximum conversation length that you allow or design for?Aarush: We don't have any hard limits on the... How many turns you can do. One thing I will say is most users don't go very deep right now. Yeah. It might just be that it takes a while to get comfortable. And then over time, you start pushing it further and further. But like, right now, we don't see a ton of users. I think the way that you visually present it suggests that you stop when the doc is created. Right. So you don't... You don't actually really encourage... The UI doesn't encourage ongoing chats as though it was like a project. Right. I think there's definitely some things we can do on the UX side to basically invite the user to be like, Hey, this is the starting point. Now let's keep going together. Like, where else would you like to explore? So I think there's definitely some explorations we could do there. I think the... In terms of sort of how deep... I don't know. We've seen people internally just really push this thing. Yeah. To quite...Ontology of use cases and research patternsAarush [00:28:06]: I think the other thing I think will change with time is people kind of uncovering different ways to use deep research as well. Like for the wedding planning thing, for example. It's not one of the, you know, first things that come to mind when we tell people about this product. So that's another thing I think as people explore and find that this can do these various different kinds of things. Some of this can naturally lead to longer conversations.
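The distribution-style regression checks mentioned in the eval discussion a moment ago (plan length, number of iterative steps on a fixed dev set, with big shifts flagged as an early signal) can be sketched in a few lines; the field names and the 25% tolerance are made up for illustration:

```python
from statistics import mean

def behavior_stats(runs):
    # Each run is a dict like {"plan_steps": 7, "search_rounds": 12} for one dev-set query.
    return {
        "plan_steps": mean(r["plan_steps"] for r in runs),
        "search_rounds": mean(r["search_rounds"] for r in runs),
    }

def flag_shifts(old_runs, new_runs, tolerance=0.25):
    # Flag any stat that moved more than 25% between model versions: an early warning,
    # not a verdict, since the change could be for better or worse.
    old, new = behavior_stats(old_runs), behavior_stats(new_runs)
    return {k: (old[k], new[k]) for k in old if abs(new[k] - old[k]) > tolerance * old[k]}
```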
And even for us, right? When we dogfooded this, we saw people use it in, like, ways we hadn't really thought of before. So that was because this was, like, a little new. Like, we didn't know, like, will users wait for five minutes? What kind of tasks will... Are they, you know, going to try for something like that takes five minutes? So our primary goal was not to specialize in a particular vertical or target one type of user. We just wanted to put this in the hands of, like... Like, we had, like... This busy parent persona and, like, various different user profiles and see, like, what people try to use it for and learn more from that.Alessio [00:29:11]: And how does the ontology of the DR use case tie back to, like, the Google main product use cases? So you mentioned shopping as one ontology, right? There's also Google Shopping. Yeah. To me, this sounds like a much better way to do shopping than going on Google Shopping and looking at the wall of items. How do you collaborate internally to figure out where AI goes?Swyx [00:29:32]: Yeah, that's a great question. So when I meant, like, shopping, I sort of tried to boil down underneath what exactly is the behavior. And that's really around, like, I called it, like, options exploration. Like, you just want to be able to see. And whether you're shopping for summer camps or shopping for a product or shopping for, like, scholarship opportunities, it's sort of the same action of just, like, I need to curate from a large... Like, I need to sift through a lot of information to curate a bunch of options for me. So that's kind of what we tried to distill down rather than, like, thinking about it. It was a vertical. But yeah, Google Search is, like, awesome if you want to have really fast answers. You've got high intent for, like, I know exactly what I want. And you want, like, super up-to-date information, right? And I still do kind of like Google Shop because it's, like, multimodal. You see the best prices and stuff like that. I think creating a good shopping experience is hard, especially, like, when you need to look at the thing. If I'm shopping for shoes and, like, I don't want to use deep research because I want... I don't want to look at how the shoes look. But if I'm shopping for, like, HVAC systems, great. Like, I don't care how it looks or I don't even know what it's supposed to look like. And I'm fine using deep research because I really want to understand the specs and, like, how exactly does this work and the voltage rating and stuff like that, right? So, like, and I need to also look at contractors who know how to install each HVAC system. So I would say, like, where we really shine when it comes to shopping is those... That kind of end of the spectrum of, like, it's more complex and it matters less what it... Like, it's maybe less on the consumery side of shopping. One thing I've also observed just about the, I guess, the metrics or, like, the communication of what value you provide. And also this goes into the latency budget, is that I think there's a perverse incentive for research agents to take longer and be perceived to be better. People are like, oh, you're searching, like, 70 websites for me, you know, but, like, 30 of them are irrelevant, you know? Like, I feel like right now we're in kind of a honeymoon phase where you get a pass for all this. But being inefficient is actually good for you because, you know, people just care about quantity and not quality, right? 
So they're like, oh, this thing took an hour for me, like, it's doing so much work, like, or it's slow. That was super counterintuitive for us. So actually, the first time I realized that, what you're saying is when I was talking to Jason Calacanis and he was like, do you actually just make the answer in 10 seconds and just make me wait for the balance? Yeah. Which we hadn't expected. That people would actually value the, like, work that it's putting in because... You were actually worried about it. We were really worried about it. We were like, I remember, we actually built two versions of deep research. We had, like, a hardcore mode that takes, like, 15 minutes. And then what we actually shipped is a thing that takes five minutes. And I even went to Eng and I was like, there has to be a hard stop, by the way. It can never take more than 10 minutes. Yep. Because I think at that point, like, users will just drop off. Nope. But what's been surprising is, like, that's not the case at all. And it's been going the other way. Because when we worked on Assistant, at least, and other Google products, the metric has always been, if you improve latency, like, all the other metrics go up. Like, satisfaction goes up, retention goes up, all of that, right? And so when we pitch this, it's like, hold on. In contrast to, like, all Google orthodoxy, we're actually going to slow everything right down. And we're going to hope that, like, users still stay... Not on purpose.User perceptions of latency in Deep ResearchAarush [00:32:56]: Not on purpose. Yeah, I think it comes down to the trade-off. Like, what are you getting in return? For the wait. And from an engineering-slash-modeling perspective, it's just trading off entrance, compute, and time to do two things, right? Either to explore more, to be, like, more complete, or to verify more on things that you probably know already. And since it's like a spectrum, and we don't claim to have found the perfect spot, we had to start somewhere. And we're trying to see where... Like, there's probably some cases where you actually care about verifying more. More than the others. In an ideal world, based on the query and conversation history, you know what that is. So I think, yeah, it basically boils down to these three things. From a user perspective, am I getting the right value add? From an engineering-slash-modeling perspective, are we using the compute to either explore effectively and also verify and go in-depth for things that are vague or uncertain in the initial steps? The other point about the more number of websites, I think, again, it comes down to the number of websites. Sometimes you want to explore more early on before you kind of narrow down on either the sources or the topics you want to go deep. So that's one of the... If you look at, like, the way, at least for most queries, the way deep research works here is initially it'll go broad. If you look at the kinds of websites, it's time to explore all the different topics that we measured in the research plan. And then you would see choices of websites getting a little bit narrower on a particular topic or a particular topic. So that's roughly how the number kind of fluctuates. So we don't do anything deliberate to either keep it low or, you know, try to...Swyx [00:34:44]: Would it be interesting to have an explicit toggle for amount of verification versus amount of search? I think so. I think, like, users would always just hit that toggle. I worry that, like... Max everything. 
Yeah, if you, like, give a max power button, users will always... You're just going to hit that button, right? So then the question comes, like, why don't you just decide from the product POV, where's the right balance? OpenAI has a preview of this, like... I think it's either Anthropic or OpenAI, and there's a preview of this model routing feature where you can choose intelligence, cheapness, and speed. But then they're all zero to one values. So then you just choose one for everything. Obviously, they're going to do a normalization thing. But users are always going to want one, right?Aarush [00:35:30]: We've discussed this a bit. Like, if I wear my pure user hat, I don't want to say anything. Like, I come with a query, you figure it out. Sometimes I feel like there will be, based on the query... For example, if I'm asking, hey, how do rising rates from the Fed affect household income for the middle class? And how has it traditionally happened? These kind of things, you want to be very accurate. And you want to be very precise on historical trends of this, and so on. Whereas there is... There's a little bit more leeway when you're saying, hey, I'm trying to find businesses near me to go celebrate my birthday or something like that. So in an ideal world, we kind of figure that trade-off based on the conversation history and the topic. I don't think we're there yet as a research community. And it's an interesting challenge by itself.Swyx [00:36:20]: So this reminds me a little bit of the NotebookLM approach. I also asked this to Raiza, and she was like, yeah, people just want to click a button and see magic. Yeah. Like you said, you just hit start every time, right? Most people don't even want to edit the plan. So, okay. My feedback on this, if you want feedback, is that I am still kind of a champion for Devin. In a sense that Devin will show you the plan while it's working the plan. And you can say like, hey, the plan is wrong. And I can chat with it while it's still working. And you live update the plan and then pick off the next item on the plan. I think yours is static, right? Like while you're working on a plan, I cannot chat. It's just locked. Bolt also has this, like, you know, that's the most default experience, but I think you should never lock the chat. You should always be able to chat with the plan and update the plan, and the plan scheduler, whatever orchestration system you have under the hood, should just pick off the next job on the list. That'll be my two cents. Especially if we spend more time researching, right? Cause like right now, if you watch that query we just did, it was done within a few minutes. So it left the research phase after a few minutes, and your opportunity to chime in and steer was less. But imagine a world where these things take an hour, right? And you're doing something really complicated. Then yeah, your intern would totally come check in with you. Be like, here's what I found. Here's like some hiccups I'm running into with the plan. Give me some steer on how to change that or how to change direction. And you would do that with them. So I totally would see, especially as these tasks get longer, we actually want the user to come engage way more to like create a good output. I guess Devin had to do this because some of these jobs like take hours. Right. So, yeah. 
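To make the "never lock the chat" idea concrete, here is a toy orchestrator where the plan is mutable shared state: user messages queue up as edits and get applied before the next step is picked up. The drop:/add: commands stand in for an LLM turning chat into plan operations; this is only an illustration of the pattern, not how Devin or Gemini actually implement it.

```python
from collections import deque

class ResearchOrchestrator:
    """Work a plan step by step, applying user edits between steps instead of locking chat."""

    def __init__(self, plan: list[str]):
        self.plan = deque(plan)           # remaining steps, editable at any time
        self.done: list[str] = []
        self.user_edits: deque = deque()  # chat messages arriving while work is in flight

    def chat(self, message: str) -> None:
        # The chat is never locked: messages just queue up as pending plan edits.
        self.user_edits.append(message)

    def _apply_edits(self) -> None:
        while self.user_edits:
            msg = self.user_edits.popleft()
            # In a real system an LLM would translate the message into plan operations;
            # here we fake two trivial commands for illustration.
            if msg.startswith("drop:"):
                step = msg.removeprefix("drop:").strip()
                self.plan = deque(s for s in self.plan if s != step)
            elif msg.startswith("add:"):
                self.plan.appendleft(msg.removeprefix("add:").strip())

    def run(self) -> list[str]:
        while self.plan:
            self._apply_edits()  # steer before picking off the next item on the list
            if not self.plan:
                break
            step = self.plan.popleft()
            self.done.append(f"researched: {step}")  # stand-in for search + read + synthesize
        return self.done

orc = ResearchOrchestrator(["venues in Lisbon", "catering options", "wedding planners"])
orc.chat("drop: catering options")
orc.chat("add: weather in Lisbon in June")
print(orc.run())
```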
And it's perverse since they charge by the hour. Oh, so they make more money, the slower they are. Interesting. Have we thought about that before?Swyx [00:38:14]: I'm calling this out because everyone is like, oh my God, it does hours of work autonomously for me. And then they are like, okay, it's good. But like, this is a honeymoon phase. Like at some point we're going to say like, okay, but you know, it's very slow.Swyx [00:38:29]: Yeah. Anything else? Anything else that like, I mean, obviously within Google, you have a lot of other initiatives, and I'm sure you sit close to the NotebookLM team. Any learnings that are coming from shipping AI products in general? They're really awesome people. Like they're really nice, friendly, thoughtful, just like as people, I'm sure you met them, you like realize this with Raiza and stuff. So like, they've actually been really, really cool collaborators or just like people to bounce ideas off. I think one thing I found really inspiring is they just picked a problem, and hindsight's 20/20, but like in advance, just like, hey, we just want to build like the perfect IDE for you to do work and be able to upload documents and ask questions about it and just make that really, really good. And I think we were definitely really inspired by their vision of just like, let's pick a simple problem, really go after it, do it really, really well, be opinionated about how it should work, and just hope that users also resonate with that. And that's definitely something that we tried to learn from separately. They've also been really good at, you know, and maybe you want to chime in here, just extracting the most out of Gemini 1.5 Pro, and they were really friendly about just sharing their ideas about how to do that.Aarush [00:39:38]: Yeah, I think you learn a bit when you're trying to do the last mile off of these products, and the pitfalls of any given model and so on. So, yeah, we definitely have a healthy relationship and share notes, and, you know, doing the same for other products.Swyx [00:39:54]: You'll never merge, right? It's just different teams. They are different teams. So they're in Labs as an organization. The mission of that is to really explore different bets and explore what's possible. Even though I think there's a paid plan for NotebookLM now. Yeah. So I think, and it's the same plan as us actually. So it's more than just Labs is what I'm saying. It's more than just Labs. Cause I mean, yeah, ideally you want things to graduate and stick around, but hopefully one thing we've done is not created different SKUs, but just being like, hey, if you pay for the AI Premium SKU, yeah, whatever, you get everything.Alessio [00:40:30]: What about learning from others? Obviously, I mean, OpenAI's deep research literally has the same name. I'm sure. Yeah. I'm sure there's a lot of, you know, contention. Is there anything you've learned from other people trying to build similar tools? Like, do you have opinions on maybe what people are getting wrong that they should do differently? It seems like from the outside, a lot of these products look the same. 
Ask for a research, get back a research, but obviously when you're building them, you understand the nuances a lot more.Lessons from other AI productsAarush [00:40:59]: When we built deep research, I think there were a few different bets we took around how it should work. And what's nice is some of that is actually where we feel like it was the right way to go. So we felt like agents should be transparent around telling you upfront, especially if they're going to take some time, what they're going to do. So that's really where that research plan came from; we showed that in a card. We really wanted to be very publisher-forward in this product. So while it's browsing, we wanted to show you all the websites it's reading in real time, make it super easy for you to double-click into those while it's browsing. And the third thing is putting it into a side-by-side artifact so that it's ideally easy for you to read and ask at the same time. And what's nice is, as other products come around, you see some of these ideas also appearing in other iterations of this product. So I definitely see this as a space where everyone in the industry is learning from each other, good ideas get reproduced and built upon. And so, yeah, we'll definitely keep iterating and kind of following our users and seeing how we can make the feature better. But yeah, I think this is the way the industry works: everyone's going to see good ideas and want to replicate and build off of them.Alessio [00:42:12]: And on the model side, OpenAI has the o3 model, which is not available through the API, the full one. Have you tried already with the 2.0 model? Like, is it a big jump, or is a lot of the work on the post-training?Aarush [00:42:25]: Yeah, I would say stay tuned. Definitely. It currently is running on 1.5. The new generation models, especially these thinking models, unlock a few things. So I think one is obviously the better capability in analytical thinking, like in math, coding, and these type of things, but also this notion of, as they produce thoughts and think before taking actions, they inherently have this notion of being able to critique the partial steps that they take and so on. So yeah, we definitely expect that. And then there is the interesting part: we're exploring multiple different options to deliver better value for our users as we iterate.Swyx [00:43:03]: I feel like there's a little bit of a conflation of inference-time compute here, in the sense that, one, you can spend inference compute with the model, the thinking model. And then two, you can spend inference compute by searching and reasoning. I wonder if that gets in the way. Presumably you've tested thinking plus deep research; if the thinking actually does a little bit of verification, maybe it saves you some time, or it tries to draw too much from its internal knowledge and therefore searches less, you know? Does it step on each other?Aarush [00:43:36]: Yeah, no, I think that's a really nice call out. And this also goes back to the kind of use case. The reason I bring that up is there are certain things that I can tell you from model memory, like last year the Fed did X number of updates and so on. 
But unless I sourced it, it's going to be hallucinated. Yeah, like one is the hallucination, or even if I got it right, as a user, I'd be very wary of that number unless I'm able to source the .gov website for it and so on. Right. So that's another challenge. There are things that you might not optimally spend time verifying, even though this is a very common fact the model already knows and is able to reason over. Balancing that out, between trying to leverage the model memory versus being able to ground this in some kind of a source, is the challenging part. And I think, as you rightly called out, with the thinking models this is even more pronounced, because the models know more; they're able to draw second-order insights more just by reasoning over it.Swyx [00:44:44]: Technically, they don't know more, they just use their internal knowledge more. Right?Aarush [00:44:48]: Yes, but also, for example, things like math.Swyx [00:44:52]: I see, they've been post-trained to do better math.Aarush [00:44:55]: Yeah, I think they probably do a way better job in that, so in that sense, they...Technical challenges in Deep ResearchSwyx [00:45:02]: Yeah, I mean, obviously reasoning is a topic of huge interest and people want to know what the engineering best practices are. Like, we think we know how to prompt them better, but engineering with them, I think, is also very, very unknown. Again, you guys are going to be the first to figure it out.Aarush [00:45:19]: Yeah, definitely interesting times, yeah. No pressure, Mokka. If you have tips, let us know.Swyx [00:45:25]: While we're on the sort of technical bent, I'm interested in other parts of the deep research tech stack that might be worth calling out. Any hard problems that you solved, just more generally?Aarush [00:45:37]: Yeah, I think the iterative planning one, to do it in a generalizable way. That was the thing I was most wary about. You don't want to go down the route of having to teach it how to plan iteratively per domain or per type of problem. Like, even going back to the ontology, if you had to teach it, for every single type in the ontology, how to come up with these traces of planning, that would have been a nightmare. So trying to do that in a super data-efficient way by leveraging a lot of things like model memory. And there's this very tricky balance when you work on the product side of any of these models, which is knowing how to post-train it just enough without losing things that it knows from pre-training, basically not overfitting in the most trivial sense, I guess. So the techniques, the data augmentations there, and multiple experiments to tune this trade-off. I think that's one of the challenges. Yeah.Swyx [00:46:37]: On the orchestration side, this is basically you're spinning up a job. I'm an orchestration nerd. So how do you do that?Aarush [00:46:43]: Is it like a sub internal tool? Yeah, so we built this asynchronous platform for deep research. Basically, most of our interactions before this were sync in nature. Like, yeah. Yeah.Swyx [00:46:56]: All the chat things are sync, right? Exactly. And now you can leave the chat and come back. Exactly.Aarush [00:47:01]: And close your computer. 
And now it's on Android and rolling out on iOS.Mukund [00:47:06]: So I saw you say that.Swyx [00:47:10]: I told you we switch it on sometimes. Okay.Mukund [00:47:13]: Like you're reminding him, right?Swyx [00:47:14]: Yeah, we rolled out on all Android phones and then iOS is this week. But yeah, what's neat though, is you can close your computer, get a notification on your phone. Right. And so on. So it's some kind of async engine that you made.Aarush [00:47:29]: Yes, yes. So the other one is this notion of asynchronicity and the user being able to leave. But also, if you build five, six minute jobs, there are bound to be failures, and you don't want to lose your progress and so on. So this notion of keeping state, knowing what to retry, and kind of keeping the journey going. Is there a public name for this or just some internal thing?Swyx [00:47:52]: No, I don't think there's a public name for this.Aarush [00:47:54]: Yeah.Swyx [00:47:54]: All right. Data scientists would be like, this is a Spark job, or it's like a Ray thing or whatever; in the old Google days it might be like MapReduce or whatever, but it's a different scale and nature of work than those things. So I'm trying to find a name for this. And right now, this is our opportunity to name it. We can name it now. The classic name is... I used to work in this area. This is what I'm asking. So it's workflows. Nice. Yeah. Sort of durable workflows.Aarush [00:48:24]: Like back when you were in AWS. Temporal.Swyx [00:48:26]: So Apache Airflow, Temporal. You guys were both at Amazon, by the way. Yeah. AWS Step Functions would be one of those where you define a graph of execution, but Step Functions are more static and would not be as able to accommodate deep research style backends. What's neat though, is we built this to be quite flexible. So you can imagine once you start doing hour or multi-day jobs. Yeah. You have to model what the agent wants to do. Exactly. But also ensure it's stable, you know, for hundreds of LLM calls. Yeah. It's boring, but this is the thing that makes it run autonomously, you know? (There's a rough sketch of this durable-workflow pattern at the end of this exchange.) Right. Yeah. Anyway, I'm excited about it. Just to close up the OpenAI thing. I would say OpenAI easily beat you on marketing. And I think it's because you don't launch your benchmarks. And my question to you is, should you care about benchmarks? Should you care about Humanity's Last Exam, or not MMLU, but whatever? Like, I think benchmarks are great. Yeah. The thing we wanted to avoid is stuff like, the day Kobe Bryant entered the league, who was the president's nephew, and weird things like that. He's a big Kobe fan. Okay. Just these weird things that nobody talks that way. So why would we over-solve for some sort of a benchmark that doesn't necessarily represent the product experience we want to build? Nevertheless, benchmarks are great for the industry and rally a community and help us understand where we're at. I don't know. Do you have any?Aarush [00:49:51]: No, I think you kind of hit the points. For us, our primary goal is solving the deep research user value, the user use case. The benchmarks, at least the ones that we are seeing, they don't directly translate to the product. 
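Picking up the durable-workflow thread from a moment ago: a hand-rolled sketch of the pattern (Temporal or Airflow would give you this for real). Each step's output is checkpointed so a failed or interrupted multi-minute job resumes where it left off instead of restarting; the file-based checkpoint and the toy steps are illustrative stand-ins, not what Google built internally.

```python
import json
import time
from pathlib import Path

def run_durable_job(job_id: str, steps: dict, max_retries: int = 3) -> dict:
    """Run named steps in order, checkpointing results so the job can resume after failures."""
    ckpt = Path(f"{job_id}.ckpt.json")
    state = json.loads(ckpt.read_text()) if ckpt.exists() else {}

    for name, fn in steps.items():
        if name in state:            # already completed in a previous run: skip
            continue
        for attempt in range(1, max_retries + 1):
            try:
                state[name] = fn(state)             # each step sees prior results
                ckpt.write_text(json.dumps(state))  # durable progress
                break
            except Exception:
                if attempt == max_retries:
                    raise
                time.sleep(2 ** attempt)            # simple backoff before retrying

    return state

# Illustrative steps; in deep research these would be LLM planning, batches of
# searches, page reads, and synthesis, each of which can individually fail.
steps = {
    "plan":       lambda s: ["venues in Lisbon", "catering options"],
    "search":     lambda s: {q: f"results for {q}" for q in s["plan"]},
    "synthesize": lambda s: f"report covering {len(s['search'])} sections",
}
print(run_durable_job("wedding-lisbon", steps))
```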
There's definitely some technical challenges that you can benchmark against, but if I do great on HLE, that doesn't really mean I'm a great deep researcher. So we want to avoid going into that rabbit hole a bit. But we also feel like, yeah, benchmarks are great, especially in the whole gen AI space, with models coming every other day and everybody claiming to be SOTA. So it's tricky. The other big challenge with benchmarks, especially when it comes to the models these days, is the output space entropy: everything is text. And so there's a notion of verifying whether you got the right answer, and different labs do it in different ways. But we all come back to it. We all compare numbers. So there's a lot of, you know, art slash figuring out how you verify this or how you run this on a level playing field. But yeah, so trade-offs aside, I think there's definitely value to doing benchmarks.Swyx [00:51:05]: But at the same time, also from a selfish PM perspective, benchmarks are a really great way to motivate researchers. Like, make number go up. Exactly. Or just prove you're the best. It's a really good way of rallying the researchers within your company. I used to work on the MLPerf benchmarks, and you'd put a bunch of engineers in a room and in a few days they'd do amazing performance improvements on our TPU stack and things like that. Right. So just having a competitive nature and a pressure really motivates people. There's one benchmark that is impossible to benchmark, but I just want to leave you with it, which is that deep research... Most people are chasing this idea of discovering new ideas. And deep research right now will summarize the web in a way that, yeah, is much more readable, but it won't... You know, what will it take to discover new things from the things that you've searched?Can Deep Research discover new insights?Aarush [00:51:56]: First, I think the thinking-style models definitely help here, because they are significantly better in how they reason natively and in being able to draw these second-order insights, which is kind of the very premise. If you can't do that, you can't think of doing what you mentioned. So that's one step. The other thing is, I think it also depends on the domain. So sometimes you can riff with a model on new hypotheses, but depending on the domain, you might not be able to verify that hypothesis. Right. So for coding and math, there are reasonably good tools that the model already knows how to interact with. And you can run a verifier, test the hypothesis, and so on. Even if you think about it from a purely agent perspective, saying, hey, I have this hypothesis in this area, go figure it out and come back to me. Right. But let's say you're a chemist. How are you going to do that? We don't have synthetic environments yet where the model is able to verify these hypotheses by playing in a playground, with a very accurate verifier or a reward signal. Computer use is another one where, both in open source and in research, there are nice playgrounds coming up. So if you're talking about truly being able to come up with new things, my personal opinion is the model not only has to do the second-order thinking 
that we're seeing now with these new models, but also has to be able to play and test that out in an environment where you can verify and give it feedback, so that it can continue iterating. Yeah.Swyx [00:53:28]: So basically like code sandboxes for now.Aarush [00:53:32]: Yeah. Yeah. So in those kind of cases, I think it's a little bit easier to envision this end to end, but not for all domains. Physics engines. Yeah. (There's a toy sketch of that verify-in-a-sandbox loop at the end of this exchange.)Alessio [00:53:42]: So if you think about agents more broadly, there's a lot of things, right, that go into it. What do you think are the most valuable pieces that people should be spending time on? Things that come to mind, that I'm seeing a lot of early stage companies work on, are like memory, you know; we already touched on evals; we touched a little bit on tool calling. There's kind of the auth piece, like should this agent be able to access this? If yes, how do you verify that? What are things that you want more people to work on that will be helpful to you?Open challenges in agentsMukund [00:54:11]: I can take a stab at this from the lens of deep research. Right. I think some of the things that we're really interested in, in how we can push this agent, are, one, similar to memories, personalization. Right. If I'm giving you a research report, the way I would give it to you if you're a 15 year old in high school should be totally different to the way I give it to you if you're a PhD or postdoc. Right. You can prompt it. You can prompt it. Right. But the second thing, though, is it should ideally know where you're at and everything, you know, up to that point. Right. And kind of further customize. Have this understanding of where you are in your learning journeys. I think modality will also be really interesting. Right now we're text in, text out. We should go multimodal in, right, but also multimodal out. I would love it if my reports are not just text, but charts, maps, images; make it super interactive and multimodal, and optimized for the type of consumption. So the way in which I might put together an academic paper should be totally different to the way I'm trying to do a learning program for a kid, just in the way it's structured. Ideally, you want to do things with generative UI and things like that to really customize reports. Those are definitely things that I'm personally interested in when it comes to a research agent. I think the other part that's super important is that we will reach the limits of the open web, and a lot of the things that people care about are things that are in their own documents, their own corpuses, things that are within subscriptions that they personally really care about, especially as you go more niche into specific industries. And ideally, you want ways for people to be able to complement their deep research experience with that content in order to further customize their answers.Aarush [00:55:56]: There's two answers to this. One is, in terms of the approach, for us, at least for me, it's rather about trying to figure out the core mission for an agent and building that. I feel like it's still early days for us to try to platformatize or try to build these, oh, there are these five horizontal pieces and you can plug and play and build your own agent. My personal opinion is we are not there yet. 
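A toy version of the verify-in-a-sandbox loop mentioned above: propose a hypothesis as runnable code, execute it in a separate process, and feed failures back until something checks out. The subprocess "sandbox" and the generate_hypothesis callable are assumptions for illustration; a production setup would isolate execution far more carefully.

```python
import subprocess
import sys
import tempfile

def verify_in_sandbox(code: str, timeout: int = 5) -> tuple[bool, str]:
    """Run candidate code in a separate interpreter and report pass/fail plus output."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        out = subprocess.run([sys.executable, path], capture_output=True,
                             text=True, timeout=timeout)
        return out.returncode == 0, out.stdout + out.stderr
    except subprocess.TimeoutExpired:
        return False, "timed out"

def research_with_verification(generate_hypothesis, rounds: int = 3) -> str | None:
    """Propose -> test -> feed failures back, until a hypothesis passes or we give up."""
    feedback = ""
    for _ in range(rounds):
        hypothesis_code = generate_hypothesis(feedback)  # stand-in for an LLM call
        ok, log = verify_in_sandbox(hypothesis_code)
        if ok:
            return hypothesis_code
        feedback = log  # second-order thinking alone isn't enough; the loop needs a signal
    return None
```

This only works for domains with a cheap verifier (code, math); the chemistry example in the conversation has no such environment yet, which is the point being made.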
In order to build a super engaging agent, if I were to start thinking of a new idea, I would start from the idea and try to just do that one thing really well. Yes, at some point there will be a time where these common pieces can be pulled out. And then. Yeah. And, you know, platformatized. I know there's a lot of work across companies and in the open source community about providing these tools to really build agents very easily. I think those are super useful to start building agents. But at some point, once those tools enable you to build the basic layers, I think me as an individual would, you know, try to focus on really curating one experience before going super broad. Yeah.Alessio [00:57:04]: We have Bret Taylor from Sierra and he said they mostly built everything.Swyx [00:57:08]: Which is very sad for VCs.Aarush [00:57:10]: I want to find the next great framework and tooling and all that. But the space is moving so fast. Like, the problem I described might be obsolete six months from now. And I don't know. Like, we'll fix it with one more LLM ops platform.Mukund [00:57:25]: Yes. Yes.Swyx [00:57:26]: Okay. So just a final point on plugging your talk. People will be hearing this before your talk. What are you going to talk about? What are you looking forward to in New York? I would love to, like, actually learn from you guys. Like, what would you like us to talk about now that we've had this conversation with you? Yeah. Yeah. What do you think people would find most interesting? I think a little bit of implementation and a little bit of vision, like kind of 50-50. And I think both of you can sort of fill those roles very well. Everyone, you know, looks at you. You're very polished Google products. And I think Google always does polish very well. But everyone will want deep research for their industry. He's invested in deep research for finance. Yeah. And they focus on their thing. And there will be deep researches for everything. Right. Like you have created a category here that OpenAI has cloned. And so, like, OK, let's talk about what are the hard problems in this brand of agent that is probably the first real product-market-fit agent? I would say more so than the computer use ones. This is the one where, like, yeah, people easily pay for $200 a month worth of stuff, probably $2,000 once you get it really good. So I'm like, OK, let's talk about how to do this right from the people who did it. And then where is this going? So, yeah. Yeah. Yeah. It's very simple.Aarush [00:58:37]: Happy to talk about that.Swyx [00:58:39]: Yeah. Thanks. Yeah. For me as well. You know, I'm also curious to see you interact with the other speakers because then, you know, there will be other sort of agent problems. And I'm very interested in personalization. Very interested in memory. I think those are related problems. Planning, orchestration, all those things. Auth and security, something that we haven't talked about. There's a lot of the web that's behind auth walls. How do I delegate to you my credentials so that you can go and search the things that I have access to? I don't think it's that hard. You know, it's just, you know, people have to get their protocols together. And that's what conferences like this are hopefully meant to achieve. Yeah.Aarush: No, I'm super excited. 
I think for us, like it's we often like live and breathe within Google and which is like a really big place. But it's really nice to like take a step back. Meet people like approaching this problem at other companies or totally different industries. Right. Like inevitably, at least where we work, we're very consumer focused space. I see. Right. Yeah.Swyx: I'm more B2B. It's also really great to understand, like, OK, what's going on within the B2B space and like within different verticals. Yeah. The first thing they want to do is do research for my own docs. Right. My company docs. Yeah. So, yeah, obviously, you're going to get asked for that. Yeah. I mean, there'll be there'll be more to discuss. I'm really looking forward to your talk. And yeah. Thanks for joining us. Get full access to Latent.Space at www.latent.space/subscribe
    --------  
    1:01:58
  • Bee AI: The Wearable Ambient Agent
    Bundle tickets for AIE Summit NYC have now sold out. You can now sign up for the livestream — where we will be making a big announcement soon. NYC-based readers and Summit attendees should check out the meetups happening around the Summit.2024 was a very challenging year for AI Hardware. After the buzz of CES last January, 2024 was marked by the meteoric rise and even harder fall of AI Wearables companies like Rabbit and Humane, with an assist from a pre-wallpaper-app MKBHD. Even Friend.com, the first to launch in the AI pendant category, and which spurred Rewind AI to rebrand to Limitless and follow in their footsteps, ended up delaying their wearable ship date and launching an experimental website chatbot version. We have been cautiously excited about this category, keeping tabs on most of the top entrants, including Omi and Compass. However, to date the biggest winner still standing from the AI Wearable wars is Bee AI, founded by today's guests Maria and Ethan. Bee is an always on hardware device with beamforming microphones, 7 day battery life and a mute button, that can be worn as a wristwatch or a clip-on pin, backed by an incredible transcription, diarization and very long context memory processing pipeline that helps you to remember your day, your todos, and even perform actions by operating a virtual cloud phone. This is one of the most advanced, production ready, personal AI agents we've ever seen, so we were excited to be their first podcast appearance. We met Bee when we ran the world's first Personal AI meetup in April last year.As a user of Bee (and not an investor! just a friend!) it’s genuinely been a joy to use, and we were glad to take advantage of the opportunity to ask hard questions about the privacy and legal/ethical side of things as much as the AI and Hardware engineering side of Bee. We hope you enjoy the episode and tune in next Friday for Bee’s first conference talk: Building Perfect Memory.Full YouTube Video VersionWatch this for the live demo!Show Notes* Bee Website* Ethan Sutin, Maria de Lourdes Zollo* Bee @ Personal AI Meetup* Buy Bee with Listener Discount Code!Timestamps* 00:00:00 Introductions and overview of Bee Computer* 00:01:58 Personal context and use cases for Bee* 00:03:02 Origin story of Bee and the founders' background* 00:06:56 Evolution from app to hardware device* 00:09:54 Short-term value proposition for users* 00:12:17 Demo of Bee's functionality* 00:17:54 Hardware form factor considerations* 00:22:22 Privacy concerns and legal considerations* 00:30:57 User adoption and reactions to wearing Bee* 00:35:56 CES experience and hardware manufacturing challenges* 00:41:40 Software pipeline and inference costs* 00:53:38 Technical challenges in real-time processing* 00:57:46 Memory and personal context modeling* 01:02:45 Social aspects and agent-to-agent interactions* 01:04:34 Location sharing and personal data exchange* 01:05:11 Personality analysis capabilities* 01:06:29 Hiring and future of always-on AITranscriptAlessio [00:00:04]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co-host Swyx, founder of SmallAI.swyx [00:00:12]: Hey, and today we are very honored to have in the studio Maria and Ethan from Bee.Maria [00:00:16]: Hi, thank you for having us.swyx [00:00:20]: And you are, I think, the first hardware founders we've had on the podcast. I've been looking to have had a hardware founder, like a wearable hardware, like a wearable hardware founder for a while. 
I think we're going to have two or three of them this year. And you're the ones that I wear every day. So thank you for making Bee. Thank you for all the feedback and the usage. Yeah, you know, I've been a big fan. You are the speaker gift for the Engineering World's Fair. And let's start from the beginning. What is Bee Computer?Ethan [00:00:52]: Bee Computer is a personal AI system. So you can think of it as AI living alongside you in first person. So it can kind of capture your in real life. So with that understanding can help you in significant ways. You know, the obvious one is memory, but that's that's really just the base kind of use case. So recalling and reflective. I know, Swyx, that you you like the idea of journaling, but you don't but still have some some kind of reflective summary of what you experienced in real life. But it's also about just having like the whole context of a human being and understanding, you know, giving the machine the ability to understand, like, what's going on in your life. Your attitudes, your desires, specifics about your preferences, so that not only can it help you with recall, but then anything that you need it to do, it already knows, like, if you think about like somebody who you've worked with or lived with for a long time, they just know kind of without having to ask you what you would want, it's clear that like, that is the future that personal AI, like, it's just going to be very, you know, the AI is just so much more valuable with personal context.Maria [00:01:58]: I will say that one of the things that we are really passionate is really understanding this. Personal context, because we'll make the AI more useful. Think about like a best friend that know you so well. That's one of the things that we are seeing from the user. They're using from a companion standpoint or professional use cases. There are many ways to use B, but companionship and professional are the ones that we are seeing now more.swyx [00:02:22]: Yeah. It feels so dry to talk about use cases. Yeah. Yeah.Maria [00:02:26]: It's like really like investor question. Like, what kind of use case?Ethan [00:02:28]: We're just like, we've been so broken and trained. But I mean, on the base case, it's just like, don't you want your AI to know everything you've said and like everywhere you've been, like, wouldn't you want that?Maria [00:02:40]: Yeah. And don't stay there and repeat every time, like, oh, this is what I like. You already know that. And you do things for me based on that. That's I think is really cool.swyx [00:02:50]: Great. Do you want to jump into a demo? Do you have any other questions?Alessio [00:02:54]: I want to maybe just cover the origin story. Just how did you two meet? What was the was this the first idea you started working on? Was there something else before?Maria [00:03:02]: I can start. So Ethan and I, we know each other from six years now. He had a company called Squad. And before that was called Olabot and was a personal AI. Yeah, I should. So maybe you should start this one. But yeah, that's how I know Ethan. Like he was pivoting from personal AI to Squad. And there was a co-watching with friends product. I had experience working with TikTok and video content. So I had the pivoting and we launched Squad and was really successful. And at the end. The founders decided to sell that to Twitter, now X. So both of us, we joined X. We launched Twitter Spaces. We launched many other products. 
And yeah, till then, we basically continue to work together to the start of B.Ethan [00:03:46]: The interesting thing is like this isn't the first attempt at personal AI. In 2016, when I started my first company, it started out as a personal AI company. This is before Transformers, no BERT even like just RNNs. You couldn't really do any convincing dialogue at all. I met Esther, who was my previous co-founder. We both really interested in the idea of like having a machine kind of model or understand a dynamic human. We wanted to make personal AI. This was like more geared towards because we had obviously much limited tools, more geared towards like younger people. So I don't know if you remember in 2016, there was like a brief chatbot boom. It was way premature, but it was when Zuckerberg went up on F8 and yeah, M and like. Yeah. The messenger platform, people like, oh, bots are going to replace apps. It was like for about six months. And then everybody realized, man, these things are terrible and like they're not replacing apps. But it was at that time that we got excited and we're like, we tried to make this like, oh, teach the AI about you. So it was just an app that you kind of chatted with and it would ask you questions and then like give you some feedback.Maria [00:04:53]: But Hugging Face first version was launched at the same time. Yeah, we started it.Ethan [00:04:56]: We started out the same office as Hugging Face because Betaworks was our investor. So they had to think. They had a thing called Bot Camp. Betaworks is like a really cool VC because they invest in out there things. They're like way ahead of everybody else. And like back then it was they had something called Bot Camp. They took six companies and it was us and Hugging Face. And then I think the other four, I'm pretty sure, are dead. But and Hugging Face was the one that really got, you know, I mean, 30% success rate is pretty good. Yeah. But yeah, when we it was, it was like it was just the two founders. Yeah, they were kind of like an AI company in the beginning. It was a chat app for teenagers. A lot of people don't know that Hugging Face was like, hey, friend, how was school? Let's trade selfies. But then, you know, they built the Transformers library, I believe, to help them make their chat app better. And then they open sourced and it was like it blew up. And like they're like, oh, maybe this is the opportunity. And now they're Hugging Face. But anyway, like we were obsessed with it at that time. But then it was clear that there's some people who really love chatting and like answering questions. But it's like a lot of work, like just to kind of manually.Maria [00:06:00]: Yeah.Ethan [00:06:01]: Teach like all these things about you to an AI.Maria [00:06:04]: Yeah, there were some people that were super passionate, for example, teenagers. They really like, for example, to speak about themselves a lot. So they will reply to a lot of questions and speak about them. But most of the people, they don't really want to spend time.Ethan [00:06:18]: And, you know, it's hard to like really bring the value with it. We had like sentence similarity and stuff and could try and do, but it was like it was premature with the technology at the time. And so we pivoted. We went to YC and the long story, but like we pivoted to consumer video and that kind of went really viral and got a lot of usage quickly. 
And then we ended up selling it to Twitter, worked there and left before Elon, not related to Elon, but left Twitter.swyx [00:06:46]: And then I should mention this is the famous time when well, when when Elon was just came in, this was like Esther was the famous product manager who slept there.Ethan [00:06:56]: My co-founder, my former co-founder, she sleeping bag. She was the sleep where you were. Yeah, yeah, she stayed. We had left by that point.swyx [00:07:03]: She very stayed, she's famous for staying.Ethan [00:07:06]: Yeah, but later, later left or got, I think, laid off, laid off. Yeah, I think the whole product team got laid off. She was a product manager, director. But yeah, like we left before that. And then we're like, oh, my God, things are different now. You know, I think this is we really started working on again right before ChatGPT came out. But we had an app version and we kind of were trying different things around it. And then, you know, ultimately, it was clear that, like, there were some limitations we can go on, like a good question to ask any wearable company is like, why isn't this an app? Yes. Yeah. Because like.Maria [00:07:40]: Because we tried the app at the beginning.Ethan [00:07:43]: Yeah. Like the idea that it could be more of a and B comes from ambient. So like if it was more kind of just around you all the time and less about you having to go open the app and do the effort to, like, enter in data that led us down the path of hardware. Yeah. Because the sensors on this are microphones. So it's capturing and understanding audio. We started actually our first hardware with a vision component, too. And we can talk about why we're not doing that right now. But if you wanted to, like, have a continuous understanding of audio with your phone, it would monopolize your microphone. It would get interrupted by calls and you'd have to remember to turn it on. And like that little bit of friction is actually like a substantial barrier to, like, get your phone. It's like the experience of it just being with you all the time and like living alongside you. And so I think that that's like the key reason it's not an app. And in fact, we do have Apple Watch support. So anybody who has a watch, Apple Watch can use it right away without buying any hardware. Because we worked really hard to make a version for the watch that can run in the background, not super drain your battery. But even with the watch, there's still friction because you have to remember to turn it on and it still gets interrupted if somebody calls you. And you have to remember to. We send a notification, but you still have to go back and turn it on because it's just the way watchOS works.Maria [00:09:04]: One of the things that we are seeing from our Apple Watch users, like I love the Apple Watch integration. One of the things that we are seeing is that people, they start using it from Apple Watch and after a couple of days they buy the B because they just like to wear it.Ethan [00:09:17]: Yeah, we're seeing.Maria [00:09:18]: That's something that like they're learning and it's really cool. Yeah.Ethan [00:09:21]: I mean, I think like fundamentally we like to think that like a personal AI is like the mission. And it's more about like the understanding. Connecting the dots, making use of the data to provide some value. And the hardware is like the ears of the AI. It's not like integrating like the incoming sensor data. And that's really what we focus on. 
And like the hardware is, you know, if we can do it well and have a great experience on the Apple Watch like that, that's just great. I mean, but there's just some platform restrictions that like existing hardware makes it hard to provide that experience. Yeah.Alessio [00:09:54]: What do people do in like two or three days that then convinces them to buy it? They buy the product. This feels like a product where like after you use it for a while, you have enough data to start to get a lot of insights. But it sounds like maybe there's also like a short term.Maria [00:10:07]: From the Apple Watch users, I believe that because every time that you receive a call after, they need to go back to B and open it again. Or for example, every day they need to charge Apple Watch and reminds them to open the app every day. They feel like, okay, maybe this is too much work. I just want to wear the B and just keep it open and that's it. And I don't need to think about it.Ethan [00:10:27]: I think they see the kind of potential of it just from the watch. Because even if you wear it a day, like we send a summary notification at the end of the day about like just key things that happened to you in your day. And like I didn't even think like I'm not like a journaling type person or like because like, oh, I just live the day. Why do I need to like think about it? But like it's actually pretty sometimes I'm surprised how interesting it is to me just to kind of be like, oh, yeah, that and how it kind of fits together. And I think that's like just something people get immediately with the watch. But they're like, oh, I'd like an easier watch. I'd like a better way to do this.swyx [00:10:58]: It's surprising because I only know about the hardware. But I use the watch as like a backup for when I don't have the hardware. I feel like because now you're beamforming and all that, this is significantly better. Yeah, that's the other thing.Ethan [00:11:11]: We have way more control over like the Apple Watch. You're limited in like you can't set the gain. You can't change the sample rate. There's just very limited framework support for doing anything with audio. Whereas if you control it. Then you can kind of optimize it for your use case. The Apple Watch isn't meant to be kind of recording this. And we can talk when we get to the part about audio, why it's so hard. This is like audio on the hardest level because you don't know it has to work in all environments or you try and make it work as best as it can. Like this environment is very great. We're in a studio. But, you know, afterwards at dinner in a restaurant, it's totally different audio environment. And there's a lot of challenges with that. And having really good source audio helps. But then there's a lot more. But with the machine learning that still is, you know, has to be done to try and account because like you can tune something for one environment or another. But it'll make one good and one bad. And like making something that's flexible enough is really challenging.Alessio [00:12:10]: Do we want to do a demo just to set the stage? And then we kind of talk about.Maria [00:12:14]: Yeah, I think we can go like a walkthrough and the prod.Alessio [00:12:17]: Yeah, sure.swyx [00:12:17]: So I think we said I should. So for listeners, we'll be switching to video. That was superimposed on. And to this video, if you want to see it, go to our YouTube, like and subscribe as always. Yeah.Maria [00:12:31]: And by the bee. Yes.swyx [00:12:33]: And by the bee. 
While you wait. While you wait. Exactly. It doesn't take long.Maria [00:12:39]: Maybe you should have a discount code just for the listeners. Sure.swyx [00:12:43]: If you want to offer it, I'll take it. All right. Yeah. Well, discount code Swyx. Oh s**t. Okay. Yeah. There you go.Ethan [00:12:49]: An important thing to mention also is that the hardware is meant to work with the phone. And like, I think, you know, if you, if you look at rabbit or, or humane, they're trying to create like a new hardware platform. We think that the phone's just so dominant and it will be until we have the next generation, which is not going to be for five, you know, maybe some Orion type glasses that are cheap enough and like light enough. Like that's going to take a long time before with the phone rather than trying to just like replace it. So in the app, we have a summary of your days, but at the top, it's kind of what's going on now. And that's updating your phone. It's updating continuously. So right now it's saying, I'm discussing, you know, the development of, you know, personal AI, and that's just kind of the ongoing conversation. And then we give you a readable form. That's like little kind of segments of what's the important parts of the conversations. We do speaker identification, which is really important because you don't want your personal AI thinking you said something and attributing it to you when it was just somebody else in the conversation. So you can also teach it other people's voices. So like if some, you know, somebody close to you, so it can start to understand your relationships a little better. And then we do conversation end pointing, which is kind of like a task that didn't even exist before, like, cause nobody needed to do this. But like if you had somebody's whole day, how do you like break it into logical pieces? And so we use like not just voice activity, but other signals to try and split up because conversations are a little fuzzy. They can like lead into one, can start to the next. So also like the semantic content of it. When a conversation ends, we run it through larger models to try and get a better, you know, sense of the actual, what was said and then summarize it, provide key points. What was the general atmosphere and tone of the conversation and potential action items that might've come of that. But then at the end of the day, we give you like a summary of all your day and where you were and just kind of like a step-by-step walkthrough of what happened and what were the key points. That's kind of just like the base capture layer. So like if you just want to get a kind of glimpse or recall or reflect that's there. But really the key is like all of this is now like being influenced on to generate personal context about you. So we generate key items known to be true about you and that you can, you know, there's a human in the loop aspect is like you can, you have visibility. Right. Into that. And you can, you know, I have a lot of facts about technology because that's basically what I talk about all the time. Right. But I do have some hobbies that show up and then like, how do you put use to this context? So I kind of like measure my day now and just like, what is my token output of the day? You know, like, like as a human, how much information do I produce? And it's kind of measured in tokens and it turns out it's like around 200,000 or so a day. But so in the recall case, we have, um. A chat interface, but the key here is on the recall of it. 
Like, you know, how do you, you know, I probably have 50 million tokens of personal context and like how to make sense of that, make it useful. So I can ask simple, like, uh, recall questions, like details about the trip I was on to Taiwan, where recently we're with our manufacturer and, um, in real time, like it will, you know, it has various capabilities such as searching through your, your memories, but then also being able to search the web or look at my calendar, we have integrations with Gmail and calendars. So like connecting the dots between the in real life and the digital life. And, you know, I just asked it about my Taiwan trip and it kind of gives me the, the breakdown of the details, what happened, the issues we had around, you know, certain manufacturing problems and it, and it goes back and references the conversation so I can, I can go back to the source. Yeah.Maria [00:16:46]: Not just the conversation as well, the integrations. So we have as well Gmail and Google calendar. So if there is something there that was useful to have more context, we can see that.Ethan [00:16:56]: So like, and it can, I never use the word agentic cause it's, it's cringe, but like it can search through, you know, if I, if I'm brainstorming about something that spans across, like search through my conversation, search the email, look at the calendar and then depending on what's needed. Then synthesize, you know, something with all that context.Maria [00:17:18]: I love that you did the Spotify wrapped. That was pretty cool. Yeah.Ethan [00:17:22]: Like one thing I did was just like make a Spotify wrap for my 2024, like of my life. You can do that. Yeah, you can.Maria [00:17:28]: Wait. Yeah. I like those crazy.Ethan [00:17:31]: Make a Spotify wrapped for my life in 2024. Yeah. So it's like surprisingly good. Um, it like kind of like game metrics. So it was like you visited three countries, you shipped, you know, XMini, beta. Devices.Maria [00:17:46]: And that's kind of more personal insights and reflection points. Yeah.swyx [00:17:51]: That's fascinating. So that's the demo.Ethan [00:17:54]: Well, we have, we can show something that's in beta. I don't know if we want to do it. I don't know.Maria [00:17:58]: We want to show something. Do it.Ethan [00:18:00]: And then we can kind of fit. Yeah.Maria [00:18:01]: Yeah.Ethan [00:18:02]: So like the, the, the, the vision is also like, not just about like AI being with you in like just passively understanding you through living your experience, but also then like it proactively suggesting things to you. Yeah. Like at the appropriate time. So like not just pool, but, but kind of, it can step in and suggest things to you. So, you know, one integration we have that, uh, is in beta is with WhatsApp. Maria is asking for a recommendation for an Italian restaurant. Would you like me to look up some highly rated Italian restaurants nearby and send her a suggestion?Maria [00:18:34]: So what I did, I just sent to Ethan a message through WhatsApp in his own personal phone. Yeah.Ethan [00:18:41]: So, so basically. B is like watching all my incoming notifications. And if it meets two criteria, like, is it important enough for me to raise a suggestion to the user? And then is there something I could potentially help with? So this is where the actions come into place. So because Maria is my co-founder and because it was like a restaurant recommendation, something that it could probably help with, it proposed that to me. 
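A toy version of the two-gate filter Ethan just described for incoming notifications: only raise a suggestion when the message is important enough and it's something the agent could actually act on. The keyword checks stand in for model calls over the notification plus personal context, and all field names here are invented for illustration, not Bee's actual schema.

```python
from dataclasses import dataclass

@dataclass
class Notification:
    sender: str
    text: str

def important_enough(n: Notification, personal_context: dict) -> bool:
    # Stand-in for a model call; here: is the sender someone close to the user?
    return n.sender in personal_context.get("close_contacts", [])

def can_help_with(n: Notification) -> bool:
    # Stand-in for a model call: is there an action the agent could plausibly take?
    actionable = ("recommend", "book", "find", "schedule")
    return any(word in n.text.lower() for word in actionable)

def maybe_suggest(n: Notification, personal_context: dict) -> str | None:
    """Only surface a suggestion when both gates pass; otherwise stay silent."""
    if important_enough(n, personal_context) and can_help_with(n):
        return f"{n.sender} asked: '{n.text}'. Want me to handle it?"
    return None

ctx = {"close_contacts": ["Maria"]}
print(maybe_suggest(Notification("Maria", "Can you recommend an Italian restaurant?"), ctx))
```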
And then I can, through either the chat and we have another kind of push to talk walkie talkie style button. It's actually a multi-purpose button to like toggle it on or off, but also if you push to hold, you can talk. So I can say, yes, uh, find one and send it to her on WhatsApp. It is, uh, an Android cloud phone. So it's, uh, going to be able to, you know, it has access to all my accounts. So we're going to abstract this away and the execution environment is not really important, but like we can go into technically why Android is actually a pretty good one right now. But, you know, it's searching for Italian restaurants, you know, and we don't have to watch this. I could be, you know, have my AirPods in and in my pocket, you know, it's going to go to WhatsApp, going to find Maria's thread, send her the response and then, and then let us know. Oh my God.Alessio [00:19:56]: But what's the, I mean, an Italian restaurant. Yeah. What did it choose? What did it choose? It's easy to say. Real Italian is hard to play. Exactly.Ethan [00:20:04]: It's easy to say. So I doubt it. I don't know.swyx [00:20:06]: For the record, since we have the Italians here, uh, best Italian restaurant in SF?Maria [00:20:09]: Oh my God. I still don't have one. What? No.Ethan [00:20:14]: I don't know. Successfully found and shared.Alessio [00:20:16]: Let's see. Let's see what the AI says. Bottega. Bottega? I think it's Bottega.Maria [00:20:21]: Have you been to Bottega? How is it?Alessio [00:20:24]: It's fine.Maria [00:20:25]: I've been to one called like Norcina, I think it was good.Alessio [00:20:29]: Bottega is on Valencia Street. It's fine. The pizza is not good.Maria [00:20:32]: It's not good.Alessio [00:20:33]: Some of the pastas are good.Maria [00:20:34]: You know, the people, I'm sorry to interrupt. Sorry. But there is like this Delfina. Yeah. That here everybody's like, oh, Pizzeria Delfina is amazing. It's overrated. This is not. I don't know. That's great. That's great.swyx [00:20:46]: The North Beach Cafe. That place you took us with Michele last time. Vega. Oh.Alessio [00:20:52]: The guy at Vega, Giuseppe, he's Italian. Which one is that? It's in Bernal Heights. Ugh. He's nice. He's not nice. I don't know that one. What's the name of the place? Vega. Vega. Vega. Cool. We got the name. Vega. But it's not Vega.Maria [00:21:02]: It's Italian. What?swyx [00:21:16]: Vega. Vega.Ethan [00:21:40]: We're going to see a lot of innovation around hardware and stuff, but I think the real core is being able to do something useful with the personal context. You always had the ability to capture everything, right? We've always had recorders, camcorders, body cameras, stuff like that. But what's different now is we can actually make sense and find the important parts in all of that context.swyx [00:22:04]: Yeah. So, and then one last thing, I'm just doing this for you, is you also have an API, which I think I'm the first developer against. Because I had to build my own. We need to hire a developer advocate. Or just hire AI engineers. The point is that you should be able to program your own assistant. And I tried OMI, the former Friend, the knockoff Friend, and then the real Friend doesn't have an API. And then Limitless also doesn't have an API. So I think it's very important to own your data. To be able to reprocess your audio, maybe. Although, by default, you do not store audio. 
And then also just to do any corrections. There's no way that my needs can be fully met by you. So I think the API is very important.Ethan [00:22:47]: Yeah. And I mean, I've always been a consumer of APIs in all my products.swyx [00:22:53]: We are API enjoyers in this house.Ethan [00:22:55]: Yeah. It's very frustrating when you have to go build a scraper. But yeah, it's for sure. Yeah.swyx [00:23:03]: So this whole combination of you have my location, my calendar, my inbox. It really is, for me, the sort of personal API.Alessio [00:23:10]: And is the API just to write into it or to have it take action on external systems?Ethan [00:23:16]: Yeah, we're expanding it. It's right now read-only. In the future, very soon, when the actions are more generally available, it'll be fully supported in the API.Alessio [00:23:27]: Nice. I'll buy one after the episode.Ethan [00:23:30]: The API thing, to me, is the most interesting. Yeah. We do have real-time APIs, so you can even connect a socket and connect it to whatever you want it to take actions with. Yeah. It's too smart for me.Alessio [00:23:43]: Yeah. I think when I look at these apps, and I mean, there's so many of these products, we launch, it's great that I can go on this app and do things. But most of my work and personal life is managed somewhere else. Yeah. So being able to plug into it. Integrate that. It's nice. I have a bunch of more, maybe, human questions. Sure. I think maybe people might have. One, is it good to have instant replay for any argument that you have? I can imagine arguing with my wife about something. And, you know, there's these commercials now where it's basically like two people arguing, and they're like, they can throw a flag, like in football, and have an instant replay of the conversation. I feel like this is similar, where it's almost like people cannot really argue anymore or, like, lie to each other. Because in a world in which everybody adopts this, I don't know if you thought about it. And also, like, how the lies. You know, all of us tell lies, right? How do you distinguish between when I'm, there's going to be sometimes things that contradict each other, because I might say something publicly, and I might think something, really, that I tell someone else. How do you handle that when you think about building a product like this?Maria [00:24:48]: I would say that I like the fact that B is an objective point of view. So I don't care too much about the lies, but I care more about the fact that can help me to understand what happened. Mm-hmm. And the emotions in a really objective way, like, really, like, critical and objective way. And if you think about humans, they have so many emotions. And sometimes something that happened to me, like, I don't know, I would feel, like, really upset about it or really angry or really emotional. But the AI doesn't have those emotions. It can read the conversation, understand what happened, and be objective. And I think the level of support is the one that I really like more. Instead of, like, oh, did this guy tell me a lie? I feel like that's not exactly, like, what I feel. I find it curious for me in terms of opportunity.Alessio [00:25:35]: Is the B going to interject in real time? Say I'm arguing with somebody. The B is like, hey, look, no, you're wrong. What? That person actually said.Ethan [00:25:43]: The proactivity is something we're very interested in. 
Maybe not for, like, specifically for, like, selling arguments, but more for, like, and I think that a lot of the challenge here is, you know, you need really good reasoning to kind of pull that off. Because you don't want it just constantly interjecting, because that would be super annoying. And you don't want it to miss things that it should be interjecting. So, like, it would be kind of a hard task even for a human to be, like, just come in at the right times when it's appropriate. Like, it would take the, you know, with the personal context, it's going to be a lot better. Because, like, if somebody knows about you, but even still, it requires really good reasoning to, like, not be too much or too little and just right.Maria [00:26:20]: And the second part about, well, like, some things, you know, you say something to somebody else, but after I change my mind, I send something. Like, it's every time I have, like, different type of conversation. And I'm like, oh, I want to know more about you. And I'm like, oh, I want to know more about you. I think that's something that I found really fascinating. One of the things that we are learning is that, indeed, humans, they evolve over time. So, for us, one of the challenges is actually understand, like, is this a real fact? Right. And so far, what we do is we give, you know, to the, we have the human in the loop that can say, like, yes, this is true, this is not. Or they can edit their own fact. For sure, in the future, we want to have all of that automatized inside of the product.Ethan [00:26:57]: But, I mean, I think your question kind of hits on, and I know that we'll talk about privacy, but also just, like, if you have some memory and you want to confirm it with somebody else, that's one thing. But it's for sure going to be true that in the future, like, not even that far into the future, that it's just going to be kind of normalized. And we're kind of in a transitional period now. And I think it's, like, one of the key things that is for us to kind of navigate that and make sure we're, like, thinking of all the consequences. And how to, you know, make the right choices in the way that everything's designed. And so, like, it's more beneficial than it could be harmful. But it's just too valuable for your AI to understand you. And so if it's, like, MetaRay bands or the Google Astra, I think it's just people are going to be more used to it. So people's behaviors and expectations will change. Whether that's, like, you know, something that is going to happen now or in five years, it's probably in that range. And so, like, I think we... We kind of adapt to new technologies all the time. Like, when the Ring cameras came out, that was kind of quite controversial. It's like... But now it's kind of... People just understand that a lot of people have cameras on their doors. And so I think that...Maria [00:28:09]: Yeah, we're in a transitional period for sure.swyx [00:28:12]: I will press on the privacy thing because that is the number one thing that everyone talks about. Obviously, I think in Silicon Valley, people are a little bit more tech-forward, experimental, whatever. But you want to go mainstream. You want to sell to consumers. And we have to worry about this stuff. Baseline question. The hardest version of this is law. There are one-party consent states where this is perfectly legal. Then there are two-party consent states where they're not. 
What have you come around to this on?Ethan [00:28:38]: Yeah, so the EU is a totally different regulatory environment. But in the U.S., it's basically on a state-by-state level. Like, in Nevada, it's single-party. In California, it's two-party. But it's kind of untested. You know, it's different laws, whether it's a phone call, whether it's in person. In a state like California, it's two-party. Like, anytime you're in public, there's no consent comes into play because the expectation of privacy is that you're in public. But we process the audio and nothing is persisted. And then it's summarized with the speaker identification focusing on the user. Now, it's kind of untested on a legal, and I'm not a lawyer, but does that constitute the same as, like, a recording? So, you know, it's kind of a gray area and untested in law right now. I think that the bigger question is, you know, because, like, if you had your Ray-Ban on and were recording, then you have a video of something that happened. And that's different than kind of having, like, an AI give you a summary that's focused on you that's not really capturing anybody's voice. You know, I think the bigger question is, regardless of the legal status, like, what is the ethical kind of situation with that? Because even in Nevada that we're—or many other U.S. states where you can record. Everything. And you don't have to have consent. Is it still, like, the right thing to do? The way we think about it is, is that, you know, we take a lot of precautions to kind of not capture personal information of people around. Both through the speaker identification, through the pipeline, and then the prompts, and the way we store the information to be kind of really focused on the user. Now, we know that's not going to, like, satisfy a lot of people. But I think if you do try it and wear it again. It's very hard for me to see anything, like, if somebody was wearing a bee around me that I would ever object that it captured about me as, like, a third party to it. And like I said, like, we're in this transitional period where the expectation will just be more normalized. That it's, like, an AI. It's not capturing, you know, a full audio recording of what you said. And it's—everything is fully geared towards helping the person kind of understand their state and providing valuable information to them. Not about, like, logging details about people they encounter.Alessio [00:30:57]: You know, I've had the same question also with the Zoom meeting transcribers thing. I think there's kind of, like, the personal impact that there's a Firefly's AI recorder. Yeah. I just know that it's being recorded. It's not like a—I don't know if I'm going to say anything different. But, like, intrinsically, you kind of feel—because it's not pervasive. And I'm curious, especially, like, in your investor meetings. Do people feel differently? Like, have you had people ask you to, like, turn it off? Like, in a business meeting, to not record? I'm curious if you've run into any of these behaviors.Maria [00:31:29]: You know what's funny? On my end, I wear it all the time. I take my coffee, a blue bottle with it. Or I work with it. Like, obviously, I work on it. So, I wear it all the time. And so far, I don't think anybody asked me to turn it off. I'm not sure if because they were really friendly with me that they know that I'm working on it. But nobody really cared.swyx [00:31:48]: It's because you live in SF.Maria [00:31:49]: Actually, I've been in Italy as well. Uh-huh. 
And in Italy, it's a super privacy concern. Like, Europe is a super privacy concern. And again, they're nothing. Like, it's—I don't know. Yeah. That, for me, was interesting.Ethan [00:32:01]: I think—yeah, nobody's ever asked me to turn it off, even after giving them full demos and disclosing. I think that some people have said, well, my—you know, in a personal relationship, my partner initially was, like, kind of uncomfortable about it. We heard that from a few users. And that was, like, more in just, like— It's not like a personal relationship situation. And the other big one is people are like, I do like it, but I cannot wear this at work. I guess. Yeah. Yeah. Because, like, I think I will get in trouble based on policies or, like, you know, if you're wearing it inside a research lab or something where you're working on things that are kind of sensitive that, like—you know, so we're adding certain features like geofencing, just, like, at this location. It's just never active.swyx [00:32:50]: I mean, I've often actually explained to it the other way, where maybe you only want it at work, so you never take it from work. And it's just a work device, just like your Zoom meeting recorder is a work device.Ethan [00:33:09]: Yeah, professionals have been a big early adopter segment. And you say in San Francisco, but we have out there our daily shipment of over 100. If you go look at the addresses, Texas, I think, is our biggest state, and Florida, just the biggest states. A lot of professionals who talk for, and we didn't go out to build it for that use case, but I think there is a lot of demand for white-collar people who talk for a living. And I think we're just starting to talk with them. I think they just want to be able to improve their performance around, understand what they were doing.Alessio [00:33:47]: How do you think about Gong.io? Some of these, for example, sales training thing, where you put on a sales call and then it coaches you. They're more verticalized versus having more horizontal platform.Ethan [00:33:58]: I am not super familiar with those things, because like I said, it was kind of a surprise to us. But I think that those are interesting. I've seen there's a bunch of them now, right? Yeah. It kind of makes sense. I'm terrible at sales, so I could probably use one. But it's not my job, fundamentally. But yeah, I think maybe it's, you know, we heard also people with restaurants, if they're able to understand, if they're doing well.Maria [00:34:26]: Yeah, but in general, I think a lot of people, they like to have the double check of, did I do this well? Or can you suggest me how I can do better? We had a user that was saying to us that he used for interviews. Yeah, he used job interviews. So he used B and after asked to the B, oh, actually, how do you think my interview went? What I should do better? And I like that. And like, oh, that's actually like a personal coach in a way.Alessio [00:34:50]: Yeah. But I guess the question is like, do you want to build all of those use cases? Or do you see B as more like a platform where somebody is going to build like, you know, the sales coach that connects to B so that you're kind of the data feed into it?Ethan [00:35:02]: I don't think this is like a data feed, more like an understanding kind of engine and like definitely. In the future, having third parties to the API and building out for all the different use cases is something that we want to do. 
But the like initial case we're trying to do is like build that layer for all that to work. And, you know, we're not trying to build all those verticals because no startup could do that well. But I think that it's really been quite fascinating to see, like, you know, I've done consumer for a long time. Consumer is very hard to predict, like, what's going to be. It's going to be like the thing that's the killer feature. And so, I mean, we really believe that it's the future, but we don't know like what exactly like process it will take to really gain mass adoption.swyx [00:35:50]: The killer consumer feature is whatever Nikita Beer does. Yeah. Social app for teens.Ethan [00:35:56]: Yeah, well, I like Nikita, but, you know, he's good at building bootstrap companies and getting them very viral. And then selling them and then they shut down.swyx [00:36:05]: Okay, so you just came back from CES.Maria [00:36:07]: Yeah, crazy. Yeah, tell us. It was my first time in Vegas and first time CES, both of them were overwhelming.swyx [00:36:15]: First of all, did you feel like you had to do it because you're in consumer hardware?Maria [00:36:19]: Then we decided to be there and to have a lot of partners and media meetings, but we didn't have our own booth. So we decided to just keep that. But we decided to be there and have a presence there, even just us and speak with people. It's very hard to stand out. Yeah, I think, you know, it depends what type of booth you have. I think if you can prepare like a really cool booth.Ethan [00:36:41]: Have you been to CES?Maria [00:36:42]: I think it can be pretty cool.Ethan [00:36:43]: It's massive. It's huge. It's like 80,000, 90,000 people across the Venetian and the convention center. And it's, to me, I always wanted to go just like...Maria [00:36:53]: Yeah, you were the one who was like...swyx [00:36:55]: I thought it was your idea.Ethan [00:36:57]: I always wanted to go just as a, like, just as a fan of...Maria [00:37:01]: Yeah, you wanted to go anyways.Ethan [00:37:02]: Because like, growing up, I think CES like kind of peaked for a while and it was like, oh, I want to go. That's where all the cool, like... gadgets, everything. Yeah, now it's like SmartBitch and like, you know, vacuuming the picks up socks. Exactly.Maria [00:37:13]: There are a lot of cool vacuums. Oh, they love it.swyx [00:37:15]: They love the Roombas, the pick up socks.Maria [00:37:16]: And pet tech. Yeah, yeah. And dog stuff.swyx [00:37:20]: Yeah, there's a lot of like robot stuff. New TVs, new cars that never ship. Yeah. Yeah. I'm thinking like last year, this time last year was when Rabbit and Humane launched at CES and Rabbit kind of won CES. And now this year, no wearables except for you guys.Ethan [00:37:32]: It's funny because it's obviously it's AI everything. Yeah. Like every single product. Yeah.Maria [00:37:37]: Toothbrush with AI, vacuums with AI. Yeah. Yeah.Ethan [00:37:41]: We like hair blow, literally a hairdryer with AI. We saw.Maria [00:37:45]: Yeah, that was cool.Ethan [00:37:46]: But I think that like, yeah, we didn't, another kind of difference like around our, like we didn't want to do like a big overhypey promised kind of Rabbit launch. Because I mean, they did, hats off to them, like on the presentation and everything, obviously. But like, you know, we want to let the product kind of speak for itself and like get it out there. And I think we were really happy. We got some very good interest from media and some of the partners there. 
So like it was, I think it was definitely worth going. I would say like if you're in hardware, it's just kind of how you make use of it. Like I think to do it like a big Rabbit style or to have a huge show on there, like you need to plan that six months in advance. And it's very expensive. But like if you, you know, go there, there's everybody's there. All the media is there. There's a lot of some pre-show events that it's just great to talk to people. And the industry also, all the manufacturers, suppliers are there. So we learned about some really cool stuff that we might like. We met with somebody. They have like thermal energy capture. And it's like, oh, could you maybe not need to charge it? Because they have like a thermal that can capture your body heat. And what? Yeah, they're here. They're actually here. And in Palo Alto, they have like a Fitbit thing that you don't have to charge.swyx [00:39:01]: Like on paper, that's the power you can get from that. What's the power draw for this thing?Ethan [00:39:05]: It's more than you could get from the body heat, it turns out. But it's quite small. I don't want to disclose technically. But I think that solar is still, they also have one where it's like this thing could be like the face of it. It's just a solar cell. And like that is more realistic. Or kinetic. Kinetic, apparently, I'm not an expert in this, but they seem to think it wouldn't be enough. Kinetic is quite small, I guess, on the capture.swyx [00:39:33]: Well, I mean, watch. Watchmakers have been powering with kinetic for a long time. Yeah. We don't have to talk about that. I just want to get a sense of CES. Would you do it again? I definitely would not. Okay. You're just a fan of CES. Business point of view doesn't make sense. I happen to be in the conference business, right? So I'm kind of just curious. Yeah.Maria [00:39:49]: So I would say as we did, so without the booth and really like straightforward conversations that were already planned. Three days. That's okay. I think it was okay. Okay. But if you need to invest for a booth that is not. Okay. A good one. Which is how much? I think.Ethan [00:40:06]: 10 by 10 is 5,000. But on top of that, you need to. And then they go like 10 by 10 is like super small. Yeah. And like some companies have, I think would probably be more in like the six figure range to get. And I mean, I think that, yeah, it's very noisy. We heard this, that it's very, very noisy. Like obviously if you're, everything is being launched there and like everything from cars to cell phones are being launched. Yeah. So it's hard to stand out. But like, I think going in with a plan of who you want to talk to, I feel like.Maria [00:40:36]: That was worth it.Ethan [00:40:37]: Worth it. We had a lot of really positive media coverage from it and we got the word out and like, so I think we accomplished what we wanted to do.swyx [00:40:46]: I mean, there's some world in which my conference is kind of the CES of whatever AI becomes. Yeah. I think that.Maria [00:40:52]: Don't do it in Vegas. Don't do it in Vegas. Yeah. Don't do it in Vegas. That's the only thing. I didn't really like Vegas. That's great. Amazing. Those are my favorite ones.Alessio [00:41:02]: You can not fit 90,000 people in SF. That's really duh.Ethan [00:41:05]: You need to do like multiple locations so you can do Moscone and then have one in.swyx [00:41:09]: I mean, that's what Salesforce conferences. Well, GDC is how many? That might be 50,000, right? Okay. Form factor, right? 
Like my way to introduce this idea was that I was at the launch in Solaris. What was the old name of it? Newton. Newton. Of Tab when Avi first launched it. He was like, I thought through everything. Every form factor, pendant is the thing. And then we got the pendants for this original. The first one was just pendants and I took it off and I forgot to put it back on. So you went through pendants, pin, bracelet now, and maybe there's sort of earphones in the future, but what was your iterations?Maria [00:41:49]: So we had, I believe now three or four iterations. And one of the things that we learned is indeed that people don't like the pendant. In particular, woman, you don't want to have like anything here on the chest because it's maybe you have like other necklace or any other stuff.Ethan [00:42:03]: You just ship a premium one that's gold. Yeah. We're talking some fashion reached out to us.Maria [00:42:11]: Some big fashion. There is something there.swyx [00:42:13]: This is where it helps to have an Italian on the team.Maria [00:42:15]: There is like some big Italian luxury. I can't say anything. So yeah, bracelet actually came from the community because they were like, oh, I don't want to wear anything like as necklace or as a pendant. Like it's. And also like the one that we had, I don't know if you remember, like it was like circle, like it was like this and was like really bulky. Like people didn't like it. And also, I mean, I actually, I don't dislike, like we were running fast when we did that. Like our, our thing was like, we wanted to ship them as soon as possible. So we're not overthinking the form factor or the material. We were just want to be out. But after the community organically, basically all of them were like, well, why you don't just don't do the bracelet? Like he's way better. I will just wear it. And that's it. So that's how we ended up with the bracelet, but it's still modular. So I still want to play around the father is modular and you can, you know, take it off and wear it as a clip or in the future, maybe we will bring back the pendant. But I like the fact that there is some personalization and right now we have two colors, yellow and black. Soon we will have other ones. So yeah, we can play a lot around that.Ethan [00:43:25]: I think the form factor. Like the goal is for it to be not super invasive. Right. And something that's easy. So I think in the future, smaller, thinner, not like apple type obsession with thinness, but it does matter like the, the size and weight. And we would love to have more context because that will help, but to make it work, I think it really needs to have good power consumption, good battery life. And, you know, like with the humane swapping the batteries, I have one, I mean, I'm, I'm, I think we've made, and there's like pretty incredible, some of the engineering they did, but like, it wasn't kind of geared towards solving the problem. It was just, it's too heavy. The swappable batteries is too much to man, like the heat, the thermals is like too much to light interface thing. Yeah. Like that. That's cool. It's cool. It's cool. But it's like, if, if you have your handout here, you want to use your phone, like it's not really solving a problem. Cause you know how to use your phone. It's got a brilliant display. You have to kind of learn how to gesture this low range. Yeah. 
It's like a resolution laser, but the laser is cool that the fact they got it working in that thing, even though if it did overheat, but like too heavy, too cumbersome, too complicated with the multiple batteries. So something that's power efficient, kind of thin, both in the physical sense and also in the edge compute kind of way so that it can be as unobtrusive as possible. Yeah.Maria [00:44:47]: Users really like, like, I like when they say yes, I like to wear it and forget about it because I don't need to charge it every single day. On the other version, I believe we had like 35 hours or something, which was okay. But people, they just prefer the seven days battery life and-swyx [00:45:03]: Oh, this is seven days? Yeah. Oh, I've been charging every three days.Maria [00:45:07]: Oh, no, you can like keep it like, yeah, it's like almost seven days.swyx [00:45:11]: The other thing that occurs to me, maybe there's an Apple watch strap so that I don't have to double watch. Yeah.Maria [00:45:17]: That's the other one that, yeah, I thought about it. I saw as well the ones that like, you can like put it like back on the phone. Like, you know- Plog. There is a lot.swyx [00:45:27]: So yeah, there's a competitor called Plog. Yeah. It's not really a competitor. They only transcribe, right? Yeah, they only transcribe. But they're very good at it. Yeah.Ethan [00:45:33]: No, they're great. Their hardware is really good too.swyx [00:45:36]: And they just launched the pin too. Yeah.Ethan [00:45:38]: I think that the MagSafe kind of form factor has a lot of advantages, but some disadvantages. You can definitely put a very huge battery on that, you know? And so like the battery life's not, the power consumption's not so much of a concern, but you know, downside the phone's like in your pocket. And so I think that, you know, form factors will continue to evolve, but, and you know, more sensors, less obtrusive and-Maria [00:46:02]: Yeah. We have a new version.Ethan [00:46:04]: Easier to use.Maria [00:46:05]: Okay.swyx [00:46:05]: Looking forward to that. Yeah. I mean, we'll, whenever we launch this, we'll try to show whatever, but I'm sure you're going to keep iterating. Last thing on hardware, and then we'll go on to the software side, because I think that's where you guys are also really, really strong. Vision. You wanted to talk about why no vision? Yeah.Ethan [00:46:20]: I think it comes down to like when you're, when you're a startup, especially in hardware, you're just, you work within the constraints, right? And so like vision is super useful and super interesting. And what we actually started with, there's two issues with vision that make it like not the place we decided to start. One is power consumption. So you know, you kind of have to trade off your power budget, like capturing even at a low frame rate and transmitting the radio is actually the thing that takes up the majority of the power. So. Yeah. So you would really have to have quite a, like unacceptably, like large and heavy battery to do it continuously all day. We have, I think, novel kind of alternative ways that might allow us to do that. And we have some prototypes. The other issue is form factor. So like even with like a wide field of view, if you're wearing something on your chest, it's going, you know, obviously the wrist is not really that much of an option. And if you're wearing it on your chest, it's, it's often gone. You're going to probably be not capturing like the field of view of what's interesting to you. 
So that leaves you kind of with your head and face. And then anything that goes on, on the face has to look cool. Like I don't know if you remember the spectacles, it was kind of like the first, yeah, but they kind of, they didn't, they were not very successful. And I think one of the reasons is they were, they're so weird looking. Yeah. The camera was so big on the side. And if you look at them at array bands where they're way more successful, they, they look almost indistinguishable from array bands. And they invested a lot into that and they, they have a partnership with Qualcomm to develop custom Silicon. They have a stake in Luxottica now. So like they coming from all the angles, like to make glasses, I think like, you know, I don't know if you know, Brilliant Labs, they're cool company, they make frames, which is kind of like a cool hackable glasses and, and, and like, they're really good, like on hardware, they're really good. But even if you look at the frames, which I would say is like the most advanced kind of startup. Yeah. Yeah. Yeah. There was one that launched at CES, but it's not shipping yet. Like one that you can buy now, it's still not something you'd wear every day and the battery life is super short. So I think just the challenge of doing vision right, like off the bat, like would require quite a bit more resources. And so like audio is such a good entry point and it's also the privacy around audio. If you, if you had images, that's like another huge challenge to overcome. So I think that. Ideally the personal AI would have, you know, all the senses and you know, we'll, we'll get there. Yeah. Okay.swyx [00:48:57]: One last hardware thing. I have to ask this because then we'll move to the software. Were either of you electrical engineering?Ethan [00:49:04]: No, I'm CES. And so I have a, I've taken some EE courses, but I, I had done prior to working on, on the hardware here, like I had done a little bit of like embedded systems, like very little firmware, but we have luckily on the team, somebody with deep experience. Yeah.swyx [00:49:21]: I'm just like, you know, like you have to become hardware people. Yeah.Ethan [00:49:25]: Yeah. I mean, I learned to worry about supply chain power. I think this is like radio.Maria [00:49:30]: There's so many things to learn.Ethan [00:49:32]: I would tell this about hardware, like, and I know it's been said before, but building a prototype and like learning how the electronics work and learning about firmware and developing, this is like, I think fun for a lot of engineers and it's, it's all totally like achievable, especially now, like with, with the tools we have, like stuff you might've been intimidated about. Like, how do I like write this firmware now? With Sonnet, like you can, you can get going and actually see results quickly. But I think going from prototype to actually making something manufactured is a enormous jump. And it's not all about technology, the supply chain, the procurement, the regulations, the cost, the tooling. The thing about software that I'm used to is it's funny that you can make changes all along the way and ship it. But like when you have to buy tooling for an enclosure that's expensive.swyx [00:50:24]: Do you buy your own tooling? You have to.Ethan [00:50:25]: Don't you just subcontract out to someone in China? Oh, no. Do we make the tooling? No, no. 
You have to have CNC and like a bunch of machines.Maria [00:50:31]: Like nobody makes their own tooling, but like you have to design this design and you submitEthan [00:50:36]: it and then they go four to six weeks later. Yeah. And then if there's a problem with it, well, then you're not, you're not making any, any of your enclosures. And so you have to really plan ahead. And like.swyx [00:50:48]: I just want to leave tips for other hardware founders. Like what resources or websites are most helpful in your sort of manufacturing journey?Ethan [00:50:55]: You know, I think it's different depending on like it's hardware so specialized in different ways.Maria [00:51:00]: I will say that, for example, I should choose a manufacturer company. I speak with other founders and like we can give you like some, you know, some tips of who is good and who is not, or like who's specialized in something versus somebody else. Yeah.Ethan [00:51:15]: Like some people are good in plastics. Some people are good.Maria [00:51:18]: I think like for us, it really helped at the beginning to speak with others and understand. Okay. Like who is around. I work in Shenzhen. I lived almost two years in China. I have an idea about like different hardware manufacturer and all of that. Soon I will go back to Shenzhen to check out. So I think it's good also to go in place and check.Ethan [00:51:40]: Yeah, you have to like once you, if you, so we did some stuff domestically and like if you have that ability. The reason I say ability is very expensive, but like to build out some proof of concepts and do field testing before you take it to a manufacturer, despite what people say, there's really good domestic manufacturing for small quantities at extremely high prices. So we got our first PCB and the assembly done in LA. So there's a lot of good because of the defense industry that can do quick churn. So it's like, we need this board. We need to find out if it's working. We have this deadline we want to start, but you need to go through this. And like if you want to have it done and fabricated in a week, they can do it for a price. But I think, you know, everybody's kind of trending even for prototyping now moving that offshore because in China you can do prototyping and get it within almost the same timeline. But the thing is with manufacturing, like it really helps to go there and kind of establish the relationship. Yeah.Alessio [00:52:38]: My first company was a hardware company and we did our PCBs in China and took a long time. Now things are better. But this was, yeah, I don't know, 10 years ago, something like that. Yeah.Ethan [00:52:47]: I think that like the, and I've heard this too, we didn't run into this problem, but like, you know, if it's something where you don't have the relationship, they don't see you, they don't know you, you know, you might get subcontracted out or like they're not paying attention. But like if you're, you know, you have the relationship and a priority, like, yeah, it's really good. We ended up doing the fabrication assembly in Taiwan for various reasons.Maria [00:53:11]: And I think it really helped the fact that you went there at some point. Yeah.Ethan [00:53:15]: We're really happy with the process and, but I mean the whole process of just Choosing the right people. Choosing the right people, but also just sourcing the bill materials and all of that stuff. 
Like, I guess like if you have time, it's not that bad, but if you're trying to like really push the speed at that, it's incredibly stressful. Okay. We got to move to the software. Yeah.Alessio [00:53:38]: Yeah. So the hardware, maybe it's hard for people to understand, but what software people can understand is that running. Transcription and summarization, all of these things in real time every day for 24 hours a day. It's not easy. So you mentioned 200,000 tokens for a day. Yeah. How do you make it basically free to run all of this for the consumer?Ethan [00:53:59]: Well, I think that the pipeline and the inference, like people think about all of these tokens, but as you know, the price of tokens is like dramatically dropping. You guys probably have some charts somewhere that you've posted. We do. And like, if you see that trend in like 250,000 input tokens, it's not really that much, right? Like the output.swyx [00:54:21]: You do several layers. You do live. Yeah.Ethan [00:54:23]: Yeah. So the speech to text is like the most challenging part actually, because you know, it requires like real time processing and then like later processing with a larger model. And one thing that is fairly obvious is that like, you don't need to transcribe things that don't have any voice in it. Right? So good voice activity is key, right? Because like the majority of most people's day is not spent with voice activity. Right? So that is the first step to cutting down the amount of compute you have to do. And voice activity is a fairly cheap thing to do. Very, very cheap thing to do. The models that need to summarize, you don't need a Sonnet level kind of model to summarize. You do need a Sonnet level model to like execute things like the agent. And we will be having a subscription for like features like that because it's, you know, although now with the R1, like we'll see, we haven't evaluated it. A deep seek? Yeah. I mean, not that one in particular, but like, you know, they're already there that can kind of perform at that level. I was like, it's going to stay in six months, but like, yeah. So self-hosted models help in the things where you can. So you are self-hosting models. Yes. You are fine tuning your own ASR. Yes. I will say that I see in the future that everything's trending down. Although like, I think there might be an intermediary step with things to become expensive, which is like, we're really interested because like the pipeline is very tedious and like a lot of tuning. Right. Which is brutal because it's just a lot of trial and error. Whereas like, well, wouldn't it be nice if an end to end model could just do all of this and learn it? If we could do transcription with like an LLM, there's so many advantages to that, but it's going to be a larger model and hence like more compute, you know, we're optimistic. Maybe we could distill something down and like, we kind of more than focus on reducing the cost of the existing pipeline or trying to the next generation. Cause it's very clear that like all ASR, all speech to the text is going to be pretty obsolete pretty soon. So like investing into that is probably kind of a dead end. Cause it's just going to be. It's going to be obsolete.swyx [00:56:39]: It's interesting. Like I think when I initially invested in tab this is, this shows you how wrong I was. I was like, oh, this is a sort of razor blades, blade razors and blades model where you sell a cheap hardware and you make up a subscription, like a monthly subscription. 
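Here is a minimal sketch of the tiering Ethan outlines: a cheap voice-activity gate in front of the speech-to-text step, a small self-hosted model for summaries, and a frontier-class model only for agent-style actions. All the function and object names are placeholders, not B's actual stack.

```python
# Hypothetical sketch of the tiered pipeline Ethan outlines. A cheap VAD gate decides
# whether audio even reaches ASR, a small model summarizes, and only agent actions
# get routed to a frontier-class model. All names here are placeholders.
def process_chunk(audio_chunk, vad, asr, small_llm, frontier_llm, wants_action=False):
    # 1. Voice activity detection is cheap; most of the day has no speech at all,
    #    so this gate eliminates the bulk of the compute.
    if not vad.has_speech(audio_chunk):
        return None

    # 2. Streaming speech-to-text, the most expensive always-on step.
    transcript = asr.transcribe(audio_chunk)

    # 3. Summarization doesn't need a frontier model; a small self-hosted model is enough.
    summary = small_llm(f"Summarize the key points of this conversation:\n{transcript}")

    # 4. Only agent-style execution (tool use, multi-step tasks) goes to a big model,
    #    which is the kind of feature a paid tier would cover.
    action = frontier_llm(f"Plan and execute:\n{transcript}") if wants_action else None
    return {"transcript": transcript, "summary": summary, "action": action}
```

For scale, at the roughly 200,000 to 250,000 input tokens a day mentioned earlier, illustrative prices of $0.10 to $1 per million input tokens work out to somewhere between about two cents and a quarter per day, which is why the always-on capture can be close to free to run.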
And now I just checked friend is a one-time sale, $99 limitless one-time sale, $99. These guys one-time sale, $49 and inference is free. What? Wow. It's crazy.Ethan [00:57:09]: I think when you probably invested, like how much was a million input tokens at that time and what is it now?swyx [00:57:15]: It's a fascinating business and like, you know, there's a lot to dig into there, but just getting that perspective out there is, I think it's not something that people think about a lot.Alessio [00:57:24]: And you obviously have thought a lot about. What about memory? I think this is something we go back and forth on about memory as in you're just memorizing facts and then understanding implicit preference and adjusting facts that you think are important. Have you ever done something about a person? Any learnings from that? I know there's a lot of open source frameworks now that do it that you build all of your own infrastructure internally.Ethan [00:57:46]: Yeah, we did. I mean I evaluated used a lot in other projects. I think that there's a few different tasks or things that revolve around memory. Like one is like retrieval obviously. And like when you need to find like even if you have a large corpus of how do you find? And so like I think existing kind of rag pipelines also will probably be the most helpful. The frameworks, I have not found one, like, there's no general way to do RAG that works, like, it's really highly dependent on the data. So, like, if you're going to be customizing something that much, it's just, you get kind of more bang from the buck from designing it all yourself. You know, a lot of those frameworks are great for getting going quickly. But I think it's really interesting memory when you're trying to do, for a person, because memory is decay, right? Like, I'm going to London, you know, then I come back, I'm not going to London anymore. What we've learned is, like, doing the traditional, like, embedding and RAG is suboptimal. We kind of built our own using small models to do really massively parallel retrieval. Which I think is going to be maybe more common in the future. And then, like, how to represent a person. We still require some human loop. And I mean, this is an ongoing project. And, you know, we're learning every day. Like, how do you correct the model when it gets something wrong about you? Right now, we have, like, things that are, like, super confirmed that are, like, ground truth about you because the human accepted it. But ideally, like, that step wouldn't be necessary. And then we have things that are fuzzier. And, like, the more... Stuff that we know is true, the more accurate we are when we're trying to decide, is this fuzzy stuff? Because it's probably, like, if you have the context, it's probably not true. So I think it's one of the most core challenges is how to handle both retrieval and then modeling and, like, especially when you're dealing with noisy source data. Because, like, even if, in an ideal world, even if you just had perfect transcription and you're going off that, that's still not enough information, right? And even if you had visual, it's still not enough. Like, there's still going to be...Alessio [00:59:55]: Yeah, one way I think about it is I usually like to order the same thing from the same restaurant if I like it. But I'm not saying that out loud. And it's kind of like, are these type of behaviors? Like, when you ask about a favorite restaurant, I would just want it to give me restaurants that I've already been to that I like. 
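Earlier in this exchange Ethan describes replacing plain embedding-plus-RAG with their own retrieval: small models fanned out in massive parallel, facts that decay over time, and a split between human-confirmed and fuzzy facts. A hedged sketch of that idea follows; the half-life, the weights, and the `score_relevance` model are illustrative assumptions, not their actual system.

```python
# Hypothetical sketch of memory retrieval with decay, in the spirit of what Ethan
# describes: candidate memories are scored by a small relevance model run in parallel,
# down-weighted by age, and human-confirmed facts outrank fuzzy, unreviewed ones.
import time
from concurrent.futures import ThreadPoolExecutor

HALF_LIFE_DAYS = 30.0  # "I'm going to London" stops being true once you're back

def decay(timestamp: float, now: float) -> float:
    age_days = (now - timestamp) / 86400
    return 0.5 ** (age_days / HALF_LIFE_DAYS)

def retrieve(query: str, memories: list[dict], score_relevance, top_k: int = 10):
    now = time.time()
    # Fan the small, cheap relevance model out over all candidates in parallel.
    with ThreadPoolExecutor(max_workers=32) as pool:
        relevance = list(pool.map(lambda m: score_relevance(query, m["text"]), memories))
    scored = []
    for m, rel in zip(memories, relevance):
        confidence = 1.0 if m.get("confirmed") else 0.6  # human-in-the-loop boost
        scored.append((rel * decay(m["timestamp"], now) * confidence, m))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in scored[:top_k]]
```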
Or, like, if I'm like, hey, just order something. from this place, I should just reorder the same thing. Because it knows that I like to redo the same thing. But I feel like today, most agent memory things that I see people publish, it's like, you know, just write down the data thing.Ethan [01:00:39]: Yeah, I mean, I think that's why the reasoning, like, in our case, like, giving it time to consider all of the sources it has. So, like, look at the email, see, like, the receipts, and then look at the conversations to see, like, what I've mentioned. And then be able to then take enough time to search through all the contexts and connect the dots is, I think, really important. And, like, I don't know, like, some of the agent memory stuff is, like, the key value with RAG on top. Like, and the results there are just not complete enough when you have, like, growing corpus and, like, managing decay and hallucinations that might be in the source material. So, this is where people usually bring in knowledge graphs. Yes. And do you do it? We don't extensively use knowledge graphs. It's something, you know, we didn't talk also about the kind of potential future social aspects.Maria [01:01:33]: Yeah, I wanted to speak about it.Ethan [01:01:35]: But the problem with knowledge graphs that we found is, like, and I don't know if you can tell me what your experience has been, but they're great for representing the data, but then, like, using it at inference time is kind of challenging.swyx [01:01:49]: For speed or what other issues?Ethan [01:01:51]: Just, like, the LLM understanding. Like, the graph. Yeah. The input. Yeah, it's not in the training data, for sure. I think that the graph is the right kind of way to store the data, but, like, then you need to have the right retrieval and then just kind of formatting in a way that, like, doesn't just overwhelm or confuse what you're trying to do. Should we ask about social? Yeah, I thought you were going to go into it. Yeah. Like, not directly related. We did some experimentation. Not directly related to, like, graph retrieval or graph knowledge races. Yeah. Yeah. Yeah. Yeah. The idea that having, like, your personal context, but then, like, other people can query it, you know, it can divulge some things that you would have full control over. Then Maria and I are trying to negotiate, like, where we're going to dinner, like, there can be an exchange. We exactly did this experiment. Yeah. There can be an exchange between the agents and, like, oh.Maria [01:02:45]: So how, like, my agent can speak with Ethan's agent. Both of them, they know our location, what we like, where we went in the past. Yeah. And even, you know, if we have our calendar integrated, they know when we're free. So they can interact with each other and have a conversation and decide a place to go for us. Wow. And we did that. And it was, for me, really cool because they suggested to us a nice French restaurant that we went at the end.swyx [01:03:11]: That you've never been to?Maria [01:03:12]: That we've never been to. Okay. But both of us, they said that we like French food. Both of us, we were in Pacific Heights. And, yeah, this was really trivial. Yeah.Ethan [01:03:23]: It's a trivial, like, toy use. But I guess, like, in terms of you've been using it for a while, like, if I wanted to buy you a gift.Maria [01:03:30]: Oh, my God. You bought me a bunch of candles now that I think about it.Ethan [01:03:35]: This is another use case. I was like, yeah. 
When we were testing the agent, like, a bunch of candles from Amazon showed up at her door.Maria [01:03:43]: Yeah, because I love candles, but I didn't expect 20. Yeah.Ethan [01:03:47]: It was a lot of experimenting. But, like, how to manage that where it's like, what's okay for your B to divulge to him? Who? Yeah. Like, shouldn't you get an authorization request every time? Yeah, yeah, yeah.swyx [01:03:58]: For personal context. Yeah, yeah, yeah.Ethan [01:04:00]: So, like, you know, you would have to, human would have to sign off on it. But I think then, like, then I wouldn't have to guess. I could just.swyx [01:04:10]: Yeah, yeah. You know, there's this culture that, like, is very alien to everyone else outside of SF and outside the Gen Z bubble in SF, which is sharing, location sharing. Yeah. I can tell my close friends where they are exactly right now in the city. Yeah. And it's opt-in. And, like, it's. Dude. Dude. You know, and, like, it's normal and, like, it freaks out everyone who's not here. Yeah. Yeah. And so maybe we can share preference, like, who we like. Absolutely.Maria [01:04:34]: I really believe in it, for sure. We will.Ethan [01:04:36]: Or even, like, small updates about your day. My parents would love that because I don't do that. Yeah. now there's no friction. It can just be more or less automatic. Yeah. Dating? I was trained always to avoid dating. Really? As a startup founder. Yeah, you can hate that. Yeah. Everyone hates it?Maria [01:04:55]: We thought about it. Like, sometimes some people, they ask to us because it's like, oh, you know so much about me. Like, can you measure compatibility with somebody else or something like that? Yeah. Probably there is a future. Maybe somebody should build that. I think on our end, we were like, no, this is. We don't want to.Ethan [01:05:11]: I will build on your API. My sister is actually a personality psychology professor and she studies personality. And we were at Thanksgiving because my parents wear one. And I was like, ask it. Like, give me my big five. Yeah. Which is like the personality type. And it's like. Does it know my big five? Just ask it to consider everything and give your big five. And my sister said it was pretty. I didn't agree with it because it said I was disagreeable. I agree with that. But she seemed to think it was agreeable. And so.swyx [01:05:41]: You disagree that you're disagreeable? Yeah. Yeah. What other proof do we need then?Ethan [01:05:47]: Yeah. I think I'm very agreeable.Ethan [01:05:51]: But I think that we do. I did get some users are like, oh, if like we're a couple. Yeah.Maria [01:05:56]: We had like couples. Actually. They bought the product together. Yeah. Like both. Like couple. They bought the hardware. So there is something there. Another test is like the Myers-Briggs. I know that you don't like that one. No. No.swyx [01:06:08]: Ocean is cooler than Myers-Briggs. Yeah. Everyone stop using my MBTI. Use my. Use Ocean. Yeah.Maria [01:06:12]: Yeah. For me, like it was on point. Like every time. Like it. Awesome.Alessio [01:06:16]: Anything else that we didn't cover? Any cool underrated things?Maria [01:06:21]: Go to b.computer. Forty nine. Ninety nine. And you buy the device. That's the. That's the call to action.swyx [01:06:28]: And you're hiring?Maria [01:06:29]: We are hiring. For sure.Ethan [01:06:32]: AI engineers.Maria [01:06:33]: AI engineers. Nice. What is an AI engineer?Ethan [01:06:35]: Yeah. But did you study? Somebody who's scrappy and willing to.Maria [01:06:42]: Work with us. 
Yeah.Ethan [01:06:43]: I think. I think you coined the term, right? So you can tell us.Maria [01:06:48]: Somebody that can adapt. That has resistance. Yeah. Yeah.swyx [01:06:51]: People have different perspectives and what is useful for you is different from what is useful for me. Yeah. So anyway, it's so useful.Ethan [01:06:57]: I mean, I think that always on AI is really going to explode and it's going to be a lot from both a lot of startups, but incumbents and there's going to be all kinds of new things that we're going to learn about how it's going to change all of our lives. I think that's the thing I'm most certain about. So. And being AI.swyx [01:07:15]: Well, thanks very much. Thank you guys. This is a pleasure. Thank you. Yeah. We'll see you launch whenever. Thank you. I'm sure that launch is happening. Yeah. Thanks. Thank you. Get full access to Latent.Space at www.latent.space/subscribe
    --------  
    1:08:52
  • The AI Architect — Bret Taylor
    If you’re in SF, join us tomorrow for a fun meetup at CodeGen Night!If you’re in NYC, join us for AI Engineer Summit! The Agent Engineering track is now sold out, but 25 tickets remain for AI Leadership and 5 tickets for the workshops. You can see the full schedule of speakers and workshops at https://ai.engineer!It’s exceedingly hard to introduce someone like Bret Taylor. We could recite his Wikipedia page, or his extensive work history through Silicon Valley’s greatest companies, but everyone else already does that.As a podcast by AI engineers for AI engineers, we had the opportunity to do something a little different. We wanted to dig into what Bret sees from his vantage point at the top of our industry for the last 2 decades, and how that explains the rise of the AI Architect at Sierra, the leading conversational AI/CX platform.“Across our customer base, we are seeing a new role emerge - the role of the AI architect. These leaders are responsible for helping define, manage and evolve their company's AI agent over time. They come from a variety of both technical and business backgrounds, and we think that every company will have one or many AI architects managing their AI agent and related experience.”In our conversation, Bret Taylor confirms the Paul Buchheit legend that he rewrote Google Maps in a weekend, armed with only the help of a then-nascent Google Closure Compiler and no other modern tooling. But what we find remarkable is that he was the PM of Maps, not an engineer, though of course he still identifies as one. We find this theme recurring throughout Bret’s career and worldview. We think it is plain as day that AI leadership will have to be hands-on and technical, especially when the ground is shifting as quickly as it is today:“There's a lot of power in combining product and engineering into as few people as possible… few great things have been created by committee.”“If engineering is an order taking organization for product you can sometimes make meaningful things, but rarely will you create extremely well crafted breakthrough products. Those tend to be small teams who deeply understand the customer need that they're solving, who have a maniacal focus on outcomes.”“And I think the reason why is if you look at like software as a service five years ago, maybe you can have a separation of product and engineering because most software as a service created five years ago. I wouldn't say there's like a lot of technological breakthroughs required for most business applications. And if you're making expense reporting software or whatever, it's useful… You kind of know how databases work, how to build auto scaling with your AWS cluster, whatever, you know, it's just, you're just applying best practices to yet another problem. "When you have areas like the early days of mobile development or the early days of interactive web applications, which I think Google Maps and Gmail represent, or now AI agents, you're in this constant conversation with what the requirements of your customers and stakeholders are and all the different people interacting with it and the capabilities of the technology. And it's almost impossible to specify the requirements of a product when you're not sure of the limitations of the technology itself.”This is the first time the difference between technical leadership for “normal” software and for “AI” software was articulated this clearly for us, and we’ll be thinking a lot about this going forward. 
We left a lot of nuggets in the conversation, so we hope you’ll just dive in with us (and thank Bret for joining the pod!)Full YouTubePlease Like and Subscribe :)Timestamps* 00:00:02 Introductions and Bret Taylor's background* 00:01:23 Bret's experience at Stanford and the dot-com era* 00:04:04 The story of rewriting Google Maps backend* 00:11:06 Early days of interactive web applications at Google* 00:15:26 Discussion on product management and engineering roles* 00:21:00 AI and the future of software development* 00:26:42 Bret's approach to identifying customer needs and building AI companies* 00:32:09 The evolution of business models in the AI era* 00:41:00 The future of programming languages and software development* 00:49:38 Challenges in precisely communicating human intent to machines* 00:56:44 Discussion on Artificial General Intelligence (AGI) and its impact* 01:08:51 The future of agent-to-agent communication* 01:14:03 Bret's involvement in the OpenAI leadership crisis* 01:22:11 OpenAI's relationship with Microsoft* 01:23:23 OpenAI's mission and priorities* 01:27:40 Bret's guiding principles for career choices* 01:29:12 Brief discussion on pasta-making* 01:30:47 How Bret keeps up with AI developments* 01:32:15 Exciting research directions in AI* 01:35:19 Closing remarks and hiring at Sierra Transcript[00:02:05] Introduction and Guest Welcome[00:02:05] Alessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co host swyx, founder of smol.ai.[00:02:17] swyx: Hey, and today we're super excited to have Bret Taylor join us. Welcome. Thanks for having me. It's a little unreal to have you in the studio.[00:02:25] swyx: I've read about you so much over the years, like even before. Open AI effectively. I mean, I use Google Maps to get here. So like, thank you for everything that you've done. Like, like your story history, like, you know, I think people can find out what your greatest hits have been.[00:02:40] Bret Taylor's Early Career and Education[00:02:40] swyx: How do you usually like to introduce yourself when, you know, you talk about, you summarize your career, like, how do you look at yourself?[00:02:47] Bret: Yeah, it's a great question. You know, we, before we went on the mics here, we're talking about the audience for this podcast being more engineering. And I do think depending on the audience, I'll introduce myself differently because I've had a lot of [00:03:00] corporate and board roles. I probably self identify as an engineer more than anything else though.[00:03:04] Bret: So even when I was. Salesforce, I was coding on the weekends. So I think of myself as an engineer and then all the roles that I do in my career sort of start with that just because I do feel like engineering is sort of a mindset and how I approach most of my life. So I'm an engineer first and that's how I describe myself.[00:03:24] Bret: You majored in computer[00:03:25] swyx: science, like 1998. And, and I was high[00:03:28] Bret: school, actually my, my college degree was Oh, two undergrad. Oh, three masters. Right. That old.[00:03:33] swyx: Yeah. I mean, no, I was going, I was going like 1998 to 2003, but like engineering wasn't as, wasn't a thing back then. Like we didn't have the title of senior engineer, you know, kind of like, it was just.[00:03:44] swyx: You were a programmer, you were a developer, maybe. What was it like in Stanford? Like, what was that feeling like? 
You know, was it, were you feeling like on the cusp of a great computer revolution? Or was it just like a niche, you know, interest at the time?[00:03:57] Stanford and the Dot-Com Bubble[00:03:57] Bret: Well, I was at Stanford, as you said, from 1998 to [00:04:00] 2002.[00:04:02] Bret: 1998 was near the peak of the dot com bubble. So. This is back in the day where most people that they're coding in the computer lab, just because there was these sun microsystems, Unix boxes there that most of us had to do our assignments on. And every single day there was a. com like buying pizza for everybody.[00:04:20] Bret: I didn't have to like, I got. Free food, like my first two years of university and then the dot com bubble burst in the middle of my college career. And so by the end there was like tumbleweed going to the job fair, you know, it was like, cause it was hard to describe unless you were there at the time, the like level of hype and being a computer science major at Stanford was like, A thousand opportunities.[00:04:45] Bret: And then, and then when I left, it was like Microsoft, IBM.[00:04:49] Joining Google and Early Projects[00:04:49] Bret: And then the two startups that I applied to were VMware and Google. And I ended up going to Google in large part because a woman named Marissa Meyer, who had been a teaching [00:05:00] assistant when I was, what was called a section leader, which was like a junior teaching assistant kind of for one of the big interest.[00:05:05] Bret: Yes. Classes. She had gone there. And she was recruiting me and I knew her and it was sort of felt safe, you know, like, I don't know. I thought about it much, but it turned out to be a real blessing. I realized like, you know, you always want to think you'd pick Google if given the option, but no one knew at the time.[00:05:20] Bret: And I wonder if I'd graduated in like 1999 where I've been like, mom, I just got a job at pets. com. It's good. But you know, at the end I just didn't have any options. So I was like, do I want to go like make kernel software at VMware? Do I want to go build search at Google? And I chose Google. 50, 50 ball.[00:05:36] Bret: I'm not really a 50, 50 ball. So I feel very fortunate in retrospect that the economy collapsed because in some ways it forced me into like one of the greatest companies of all time, but I kind of lucked into it, I think.[00:05:47] The Google Maps Rewrite Story[00:05:47] Alessio: So the famous story about Google is that you rewrote the Google maps back in, in one week after the map quest quest maps acquisition, what was the story there?[00:05:57] Alessio: Is it. Actually true. Is it [00:06:00] being glorified? Like how, how did that come to be? And is there any detail that maybe Paul hasn't shared before?[00:06:06] Bret: It's largely true, but I'll give the color commentary. So it was actually the front end, not the back end, but it turns out for Google maps, the front end was sort of the hard part just because Google maps was.[00:06:17] Bret: Largely the first ish kind of really interactive web application, say first ish. I think Gmail certainly was though Gmail, probably a lot of people then who weren't engineers probably didn't appreciate its level of interactivity. It was just fast, but. Google maps, because you could drag the map and it was sort of graphical.[00:06:38] Bret: My, it really in the mainstream, I think, was it a map[00:06:41] swyx: quest back then that was, you had the arrows up and down, it[00:06:44] Bret: was up and down arrows. 
Each map was a single image and you just click left and then wait for a few seconds to the new map to let it was really small too, because generating a big image was kind of expensive on computers that day.[00:06:57] Bret: So Google maps was truly innovative in that [00:07:00] regard. The story on it. There was a small company called where two technologies started by two Danish brothers, Lars and Jens Rasmussen, who are two of my closest friends now. They had made a windows app called expedition, which had beautiful maps. Even in 2000.[00:07:18] Bret: For whenever we acquired or sort of acquired their company, Windows software was not particularly fashionable, but they were really passionate about mapping and we had made a local search product that was kind of middling in terms of popularity, sort of like a yellow page of search product. So we wanted to really go into mapping.[00:07:36] Bret: We'd started working on it. Their small team seemed passionate about it. So we're like, come join us. We can build this together.[00:07:42] Technical Challenges and Innovations[00:07:42] Bret: It turned out to be a great blessing that they had built a windows app because you're less technically constrained when you're doing native code than you are building a web browser, particularly back then when there weren't really interactive web apps and it ended up.[00:07:56] Bret: Changing the level of quality that we [00:08:00] wanted to hit with the app because we were shooting for something that felt like a native windows application. So it was a really good fortune that we sort of, you know, their unusual technical choices turned out to be the greatest blessing. So we spent a lot of time basically saying, how can you make a interactive draggable map in a web browser?[00:08:18] Bret: How do you progressively load, you know, new map tiles, you know, as you're dragging even things like down in the weeds of the browser at the time, most browsers like Internet Explorer, which was dominant at the time would only load two images at a time from the same domain. So we ended up making our map tile servers have like.[00:08:37] Bret: Forty different subdomains so we could load maps and parallels like lots of hacks. I'm happy to go into as much as like[00:08:44] swyx: HTTP connections and stuff.[00:08:46] Bret: They just like, there was just maximum parallelism of two. And so if you had a map, set of map tiles, like eight of them, so So we just, we were down in the weeds of the browser anyway.[00:08:56] Bret: So it was lots of plumbing. I can, I know a lot more about browsers than [00:09:00] most people, but then by the end of it, it was fairly, it was a lot of duct tape on that code. If you've ever done an engineering project where you're not really sure the path from point A to point B, it's almost like. Building a house by building one room at a time.[00:09:14] Bret: The, there's not a lot of architectural cohesion at the end. And then we acquired a company called Keyhole, which became Google earth, which was like that three, it was a native windows app as well, separate app, great app, but with that, we got licenses to all this satellite imagery. And so in August of 2005, we added.[00:09:33] Bret: Satellite imagery to Google Maps, which added even more complexity in the code base. And then we decided we wanted to support Safari. There was no mobile phones yet. So Safari was this like nascent browser on, on the Mac. 
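To make the subdomain hack above concrete, here is a minimal TypeScript sketch of the idea: spread tile URLs across many hostnames so the browser’s two-connections-per-host limit stops serializing downloads. The hostnames, shard count, and URL shape are invented for illustration, not Google’s actual setup.

```typescript
// Sketch: shard map-tile requests across several hostnames so a browser that
// only allows ~2 concurrent connections per host can download tiles in
// parallel. Hostnames, shard count, and URL layout are hypothetical.
const TILE_SHARDS = 8;

function tileUrl(x: number, y: number, zoom: number): string {
  // Derive the shard from the tile coordinates so a given tile always maps
  // to the same host, which keeps HTTP caching effective.
  const shard = (x + y) % TILE_SHARDS;
  return `https://tiles${shard}.example.com/tile?x=${x}&y=${y}&z=${zoom}`;
}

// Fetch every tile in a viewport; the requests now fan out across hosts
// instead of queueing two at a time behind a single domain.
async function loadViewport(xs: number[], ys: number[], zoom: number): Promise<Blob[]> {
  const urls = xs.flatMap((x) => ys.map((y) => tileUrl(x, y, zoom)));
  return Promise.all(urls.map((u) => fetch(u).then((r) => r.blob())));
}
```

HTTP/2 multiplexing has since made this kind of domain sharding unnecessary, but in the Internet Explorer era it was the difference between a draggable map and a slideshow.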
And it turns out there's like a lot of decisions behind the scenes, sort of inspired by this windows app, like heavy use of XML and XSLT and all these like.[00:09:54] Bret: Technologies that were like briefly fashionable in the early two thousands and everyone hates now for good [00:10:00] reason. And it turns out that all of the XML functionality and Internet Explorer wasn't supporting Safari. So people are like re implementing like XML parsers. And it was just like this like pile of s**t.[00:10:11] Bret: And I had to say a s**t on your part. Yeah, of[00:10:12] Alessio: course.[00:10:13] Bret: So. It went from this like beautifully elegant application that everyone was proud of to something that probably had hundreds of K of JavaScript, which sounds like nothing. Now we're talking like people have modems, you know, not all modems, but it was a big deal.[00:10:29] Bret: So it was like slow. It took a while to load and just, it wasn't like a great code base. Like everything was fragile. So I just got. Super frustrated by it. And then one weekend I did rewrite all of it. And at the time the word JSON hadn't been coined yet too, just to give you a sense. So it's all XML.[00:10:47] swyx: Yeah.[00:10:47] Bret: So we used what is now you would call JSON, but I just said like, let's use eval so that we can parse the data fast. And, and again, that's, it would literally as JSON, but at the time there was no name for it. So we [00:11:00] just said, let's. Pass on JavaScript from the server and eval it. And then somebody just refactored the whole thing.[00:11:05] Bret: And, and it wasn't like I was some genius. It was just like, you know, if you knew everything you wished you had known at the beginning and I knew all the functionality, cause I was the primary, one of the primary authors of the JavaScript. And I just like, I just drank a lot of coffee and just stayed up all weekend.[00:11:22] Bret: And then I, I guess I developed a bit of reputation and no one knew about this for a long time. And then Paul who created Gmail and I ended up starting a company with him too, after all of this told this on a podcast and now it's large, but it's largely true. I did rewrite it and it, my proudest thing.[00:11:38] Bret: And I think JavaScript people appreciate this. Like the un G zipped bundle size for all of Google maps. When I rewrote, it was 20 K G zipped. It was like much smaller for the entire application. It went down by like 10 X. So. What happened on Google? Google is a pretty mainstream company. And so like our usage is shot up because it turns out like it's faster.[00:11:57] Bret: Just being faster is worth a lot of [00:12:00] percentage points of growth at a scale of Google. So how[00:12:03] swyx: much modern tooling did you have? Like test suites no compilers.[00:12:07] Bret: Actually, that's not true. We did it one thing. So I actually think Google, I, you can. Download it. There's a, Google has a closure compiler, a closure compiler.[00:12:15] Bret: I don't know if anyone still uses it. It's gone. Yeah. Yeah. It's sort of gone out of favor. Yeah. Well, even until recently it was better than most JavaScript minifiers because it was more like it did a lot more renaming of variables and things. Most people use ES build now just cause it's fast and closure compilers built on Java and super slow and stuff like that.[00:12:37] Bret: But, so we did have that, that was it. 
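The eval trick Bret mentions, sketched minimally in TypeScript below: the server sends back a JavaScript object literal as text and the client evaluates it, which is exactly what JSON.parse later made safe and standard. The payload shape is invented for illustration.

```typescript
// Sketch of the pre-JSON pattern: the server returns a JavaScript object
// literal as text and the client evaluates it. The payload is invented.
const responseText = '({"lat": 37.77, "lng": -122.41, "zoom": 12})';

// Circa-2005 approach: fast and simple, but it executes whatever the server
// sends, so it is only acceptable when you fully trust the origin.
const legacyParsed = eval(responseText) as { lat: number; lng: number; zoom: number };

// Modern equivalent: JSON.parse yields the same data without executing code.
const modernParsed = JSON.parse('{"lat": 37.77, "lng": -122.41, "zoom": 12}');

console.log(legacyParsed.lat === modernParsed.lat); // true
```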
Okay.[00:12:39] The Evolution of Web Applications[00:12:39] Bret: So and that was treated internally, you know, it was a really interesting time at Google at the time because there's a lot of teams working on fairly advanced JavaScript when no one was. So Google Suggest, which Kevin Gibbs was the tech lead for, was the first kind of type ahead, autocomplete, I believe in a web browser, and now it's just pervasive in search boxes that you sort of [00:13:00] see a type ahead there.[00:13:01] Bret: I mean, ChatGPT[00:13:01] swyx: just added it. It's kind of like a round trip.[00:13:03] Bret: Totally. No, it's now pervasive as a UI affordance, but that was like Kevin's 20 percent project. And then Gmail, Paul you know, he tells the story better than anyone, but he's like, you know, basically was scratching his own itch, but what was really neat about it is email, because it's such a productivity tool, just needed to be faster.[00:13:21] Bret: So, you know, he was scratching his own itch of just making more stuff work on the client side. And then we, because of Lars and Jens sort of like setting the bar of this windows app or like we need our maps to be draggable. So we ended up. Not only innovate in terms of having a big sync, what would be called a single page application today, but also all the graphical stuff you know, we were crashing Firefox, like it was going out of style because, you know, when you make a document object model with the idea that it's a document and then you layer on some JavaScript and then we're essentially abusing all of this, it just was running into code paths that were not.[00:13:56] Bret: Well, it's rotten, you know, at this time. And so it was [00:14:00] super fun. And, and, you know, in the building you had, so you had compilers, people helping minify JavaScript just practically, but there is a great engineering team. So they were like, that's why Closure Compiler is so good. It was like a. Person who actually knew about programming languages doing it, not just, you know, writing regular expressions.[00:14:17] Bret: And then the team that is now the Chrome team believe, and I, I don't know this for a fact, but I'm pretty sure Google is the main contributor to Firefox for a long time in terms of code. And a lot of browser people were there. So every time we would crash Firefox, we'd like walk up two floors and say like, what the hell is going on here?[00:14:35] Bret: And they would load their browser, like in a debugger. And we could like figure out exactly what was breaking. And you can't change the code, right? Cause it's the browser. It's like slow, right? I mean, slow to update. So, but we could figure out exactly where the bug was and then work around it in our JavaScript.[00:14:52] Bret: So it was just like new territory. Like so super, super fun time, just like a lot of, a lot of great engineers figuring out [00:15:00] new things. And now, you know, the word, this term is no longer in fashion, but the word Ajax, which was asynchronous JavaScript and XML cause I'm telling you XML, but see the word XML there, to be fair, the way you made HTTP requests from a client to server was this.[00:15:18] Bret: Object called XMLHttpRequest because Microsoft and making Outlook Web Access back in the day made this and it turns out to have nothing to do with XML. It's just a way of making HTTP requests because XML was like the fashionable thing. It was like that was the way you, you know, you did it.
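For anyone who has only ever used fetch, the “Ajax” pattern Bret describes looked roughly like the sketch below: a callback-driven XMLHttpRequest that, despite the name, never touches any XML. The endpoint URL is hypothetical.

```typescript
// Sketch of the classic XMLHttpRequest ("Ajax") call pattern. Despite the
// name, nothing here involves XML. The endpoint URL is hypothetical.
function fetchMetadata(url: string, onDone: (body: string) => void): void {
  const xhr = new XMLHttpRequest();
  xhr.open("GET", url, true); // third argument: asynchronous
  xhr.onreadystatechange = () => {
    // readyState 4 means the request is finished and the response is ready.
    if (xhr.readyState === 4 && xhr.status === 200) {
      onDone(xhr.responseText);
    }
  };
  xhr.send();
}

// Callback style, long before Promises, fetch, or async/await existed.
fetchMetadata("https://maps.example.com/metadata", (body) => {
  console.log(`got ${body.length} bytes`);
});
```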
But the JSON came out of that, you know, and then a lot of the best practices around building JavaScript applications is pre React.[00:15:44] Bret: I think React was probably the big conceptual step forward that we needed. Even my first social network after Google, we used a lot of like HTML injection and. Making real time updates was still very hand coded and it's really neat when you [00:16:00] see conceptual breakthroughs like react because it's, I just love those things where it's like obvious once you see it, but it's so not obvious until you do.[00:16:07] Bret: And actually, well, I'm sure we'll get into AI, but I, I sort of feel like we'll go through that evolution with AI agents as well that I feel like we're missing a lot of the core abstractions that I think in 10 years we'll be like, gosh, how'd you make agents? Before that, you know, but it was kind of that early days of web applications.[00:16:22] swyx: There's a lot of contenders for the reactive jobs of of AI, but no clear winner yet. I would say one thing I was there for, I mean, there's so much we can go into there. You just covered so much.[00:16:32] Product Management and Engineering Synergy[00:16:32] swyx: One thing I just, I just observe is that I think the early Google days had this interesting mix of PM and engineer, which I think you are, you didn't, you didn't wait for PM to tell you these are my, this is my PRD.[00:16:42] swyx: This is my requirements.[00:16:44] mix: Oh,[00:16:44] Bret: okay.[00:16:45] swyx: I wasn't technically a software engineer. I mean,[00:16:48] Bret: by title, obviously. Right, right, right.[00:16:51] swyx: It's like a blend. And I feel like these days, product is its own discipline and its own lore and own industry and engineering is its own thing. And there's this process [00:17:00] that happens and they're kind of separated, but you don't produce as good of a product as if they were the same person.[00:17:06] swyx: And I'm curious, you know, if, if that, if that sort of resonates in, in, in terms of like comparing early Google versus modern startups that you see out there,[00:17:16] Bret: I certainly like wear a lot of hats. So, you know, sort of biased in this, but I really agree that there's a lot of power and combining product design engineering into as few people as possible because, you know few great things have been created by committee, you know, and so.[00:17:33] Bret: If engineering is an order taking organization for product you can sometimes make meaningful things, but rarely will you create extremely well crafted breakthrough products. Those tend to be small teams who deeply understand the customer need that they're solving, who have a. Maniacal focus on outcomes.[00:17:53] Bret: And I think the reason why it's, I think for some areas, if you look at like software as a service five years ago, maybe you can have a [00:18:00] separation of product and engineering because most software as a service created five years ago. I wouldn't say there's like a lot of like. Technological breakthroughs required for most, you know, business applications.[00:18:11] Bret: And if you're making expense reporting software or whatever, it's useful. I don't mean to be dismissive of expense reporting software, but you probably just want to understand like, what are the requirements of the finance department? What are the requirements of an individual file expense report? Okay.[00:18:25] Bret: Go implement that. And you kind of know how web applications are implemented. You kind of know how to. 
How databases work, how to build auto scaling with your AWS cluster, whatever, you know, it's just, you're just applying best practices to yet another problem when you have areas like the early days of mobile development or the early days of interactive web applications, which I think Google Maps and Gmail represent, or now AI agents, you're in this constant conversation with what the requirements of your customers and stakeholders are and all the different people interacting with it.[00:18:58] Bret: And the capabilities of the [00:19:00] technology. And it's almost impossible to specify the requirements of a product when you're not sure of the limitations of the technology itself. And that's why I use the word conversation. It's not literal. That's sort of funny to use that word in the age of conversational AI.[00:19:15] Bret: You're constantly sort of saying, like, ideally, you could sprinkle some magic AI pixie dust and solve all the world's problems, but it's not the way it works. And it turns out that actually, I'll just give an interesting example.[00:19:26] AI Agents and Modern Tooling[00:19:26] Bret: I think most people listening probably use co pilots to code like Cursor or Devon or Microsoft Copilot or whatever.[00:19:34] Bret: Most of those tools are, they're remarkable. I'm, I couldn't, you know, imagine development without them now, but they're not autonomous yet. Like I wouldn't let it just write most code without my interactively inspecting it. We just are somewhere between it's an amazing co pilot and it's an autonomous software engineer.[00:19:53] Bret: As a product manager, like your aspirations for what the product is are like kind of meaningful. But [00:20:00] if you're a product person, yeah, of course you'd say it should be autonomous. You should click a button and program should come out the other side. The requirements meaningless. Like what matters is like, what is based on the like very nuanced limitations of the technology.[00:20:14] Bret: What is it capable of? And then how do you maximize the leverage? It gives a software engineering team, given those very nuanced trade offs. Coupled with the fact that those nuanced trade offs are changing more rapidly than any technology in my memory, meaning every few months you'll have new models with new capabilities.[00:20:34] Bret: So how do you construct a product that can absorb those new capabilities as rapidly as possible as well? That requires such a combination of technical depth and understanding the customer that you really need more integration. Of product design and engineering. And so I think it's why with these big technology waves, I think startups have a bit of a leg up relative to incumbents because they [00:21:00] tend to be sort of more self actualized in terms of just like bringing those disciplines closer together.[00:21:06] Bret: And in particular, I think entrepreneurs, the proverbial full stack engineers, you know, have a leg up as well because. I think most breakthroughs happen when you have someone who can understand those extremely nuanced technical trade offs, have a vision for a product. And then in the process of building it, have that, as I said, like metaphorical conversation with the technology, right?[00:21:30] Bret: Gosh, I ran into a technical limit that I didn't expect. It's not just like changing that feature. You might need to refactor the whole product based on that. And I think that's, that it's particularly important right now. 
So I don't, you know, if you, if you're building a big ERP system, probably there's a great reason to have product and engineering.[00:21:51] Bret: I think in general, the disciplines are there for a reason. I think when you're dealing with something as nuanced as the like technologies, like large language models today, there's a ton of [00:22:00] advantage of having. Individuals or organizations that integrate the disciplines more formally.[00:22:05] Alessio: That makes a lot of sense.[00:22:06] Alessio: I've run a lot of engineering teams in the past, and I think the product versus engineering tension has always been more about effort than like whether or not the feature is buildable. But I think, yeah, today you see a lot more of like. Models actually cannot do that. And I think the most interesting thing is on the startup side, people don't yet know where a lot of the AI value is going to accrue.[00:22:26] Alessio: So you have this rush of people building frameworks, building infrastructure, layered things, but we don't really know the shape of the compute. I'm curious that Sierra, like how you thought about building in house a lot of the tooling for evals or like just, you know, building the agents and all of that.[00:22:41] Alessio: Versus how you see some of the startup opportunities that is maybe still out there.[00:22:46] Bret: We build most of our tooling in house at Sierra, not all. It's, we don't, it's not like not invented here syndrome necessarily, though, maybe slightly guilty of that in some ways, but because we're trying to build a platform [00:23:00] that's enduring, you know, we really want to have control over our own destiny.[00:23:03] Bret: And you had made a comment earlier that like. We're still trying to figure out what like the React of agents is and the jury is still out. I would argue it hasn't been created yet. I don't think the jury is still out to go use that metaphor. We're sort of in the jQuery era of agents, not the React era.[00:23:19] Bret: And, and that's like a throwback for people listening,[00:23:22] swyx: we shouldn't rush it. You know?[00:23:23] Bret: No, yeah, that's my point is. And so. Because we're trying to create an enduring company at Sierra that outlives us, you know, I'm not sure we want to like attach our cart to some like to a horse where it's not clear that like we've figured out and I actually want as a company, we're trying to enable just at a high level and I'll, I'll quickly go back to tech at Sierra, we help consumer brands build customer facing AI agents.[00:23:48] Bret: So. Everyone from Sonos to ADT home security to Sirius XM, you know, if you call them on the phone and AI will pick up with you, you know, chat with them on the Sirius XM homepage. It's an AI agent called Harmony [00:24:00] that they've built on our platform. We're what are the contours of what it means for someone to build an end to end complete customer experience with AI with conversational AI.[00:24:09] Bret: You know, we really want to dive into the deep end of, of all the trade offs to do it. You know, where do you use fine tuning? Where do you string models together? You know, where do you use reasoning? Where do you use generation? How do you use reasoning? How do you express the guardrails of an agentic process?[00:24:25] Bret: How do you impose determinism on a fundamentally non deterministic technology? There's just a lot of really like as an important design space. And I could sit here and tell you, we have the best approach.
Every entrepreneur will, you know. But I hope that in two years, we look back at our platform and laugh at how naive we were, because that's the pace of change broadly.[00:24:45] Bret: If you talk about like the startup opportunities, I'm not wholly skeptical of tools companies, but I'm fairly skeptical. There's always an exception for every role, but I believe that certainly there's a big market for [00:25:00] frontier models, but largely for companies with huge CapEx budgets. So. Open AI and Microsoft's Anthropic and Amazon Web Services, Google Cloud XAI, which is very well capitalized now, but I think the, the idea that a company can make money sort of pre training a foundation model is probably not true.[00:25:20] Bret: It's hard to, you're competing with just, you know, unreasonably large CapEx budgets. And I just like the cloud infrastructure market, I think will be largely there. I also really believe in the applications of AI. And I define that not as like building agents or things like that. I define it much more as like, you're actually solving a problem for a business.[00:25:40] Bret: So it's what Harvey is doing in legal profession or what cursor is doing for software engineering or what we're doing for customer experience and customer service. The reason I believe in that is I do think that in the age of AI, what's really interesting about software is it can actually complete a task.[00:25:56] Bret: It can actually do a job, which is very different than the value proposition of [00:26:00] software was to ancient history two years ago. And as a consequence, I think the way you build a solution and For a domain is very different than you would have before, which means that it's not obvious, like the incumbent incumbents have like a leg up, you know, necessarily, they certainly have some advantages, but there's just such a different form factor, you know, for providing a solution and it's just really valuable.[00:26:23] Bret: You know, it's. Like just think of how much money cursor is saving software engineering teams or the alternative, how much revenue it can produce tool making is really challenging. If you look at the cloud market, just as a analog, there are a lot of like interesting tools, companies, you know, Confluent, Monetized Kafka, Snowflake, Hortonworks, you know, there's a, there's a bunch of them.[00:26:48] Bret: A lot of them, you know, have that mix of sort of like like confluence or have the open source or open core or whatever you call it. I, I, I'm not an expert in this area. You know, I do think [00:27:00] that developers are fickle. I think that in the tool space, I probably like. Default towards open source being like the area that will win.[00:27:09] Bret: It's hard to build a company around this and then you end up with companies sort of built around open source to that can work. Don't get me wrong, but I just think that it's nowadays the tools are changing so rapidly that I'm like, not totally skeptical of tool makers, but I just think that open source will broadly win, but I think that the CapEx required for building frontier models is such that it will go to a handful of big companies.[00:27:33] Bret: And then I really believe in agents for specific domains which I think will, it's sort of the analog to software as a service in this new era. You know, it's like, if you just think of the cloud. You can lease a server. 
It's just a low level primitive, or you can buy an app like you know, Shopify or whatever.[00:27:51] Bret: And most people building a storefront would prefer Shopify over hand rolling their e commerce storefront. I think the same thing will be true of AI. So [00:28:00] I've. I tend to like, if I have a, like an entrepreneur asked me for advice, I'm like, you know, move up the stack as far as you can towards a customer need.[00:28:09] Bret: Broadly, but I, but it doesn't reduce my excitement about what is the reactive building agents kind of thing, just because it is, it is the right question to ask, but I think we'll probably play out probably an open source space more than anything else.[00:28:21] swyx: Yeah, and it's not a priority for you. There's a lot in there.[00:28:24] swyx: I'm kind of curious about your idea maze towards, there are many customer needs. You happen to identify customer experience as yours, but it could equally have been coding assistance or whatever. I think for some, I'm just kind of curious at the top down, how do you look at the world in terms of the potential problem space?[00:28:44] swyx: Because there are many people out there who are very smart and pick the wrong problem.[00:28:47] Bret: Yeah, that's a great question.[00:28:48] Future of Software Development[00:28:48] Bret: By the way, I would love to talk about the future of software, too, because despite the fact it didn't pick coding, I have a lot of that, but I can talk to I can answer your question, though, you know I think when a technology is as [00:29:00] cool as large language models.[00:29:02] Bret: You just see a lot of people starting from the technology and searching for a problem to solve. And I think it's why you see a lot of tools companies, because as a software engineer, you start building an app or a demo and you, you encounter some pain points. You're like,[00:29:17] swyx: a lot of[00:29:17] Bret: people are experiencing the same pain point.[00:29:19] Bret: What if I make it? That it's just very incremental. And you know, I always like to use the metaphor, like you can sell coffee beans, roasted coffee beans. You can add some value. You took coffee beans and you roasted them and roasted coffee beans largely, you know, are priced relative to the cost of the beans.[00:29:39] Bret: Or you can sell a latte and a latte. Is rarely priced directly like as a percentage of coffee bean prices. In fact, if you buy a latte at the airport, it's a captive audience. So it's a really expensive latte. And there's just a lot that goes into like. How much does a latte cost? And I bring it up because there's a supply chain from growing [00:30:00] coffee beans to roasting coffee beans to like, you know, you could make one at home or you could be in the airport and buy one and the margins of the company selling lattes in the airport is a lot higher than the, you know, people roasting the coffee beans and it's because you've actually solved a much more acute human problem in the airport.[00:30:19] Bret: And, and it's just worth a lot more to that person in that moment. It's kind of the way I think about technology too. 
It sounds funny to liken it to coffee beans, but you're selling tools on top of a large language model yet in some ways your market is big, but you're probably going to like be price compressed just because you're sort of a piece of infrastructure and then you have open source and all these other things competing with you naturally.[00:30:43] Bret: If you go and solve a really big business problem for somebody, that's actually like a meaningful business problem that AI facilitates, they will value it according to the value of that business problem. And so I actually feel like people should just stop. You're like, no, that's, that's [00:31:00] unfair. If you're searching for an idea of people, I, I love people trying things, even if, I mean, most of the, a lot of the greatest ideas have been things no one believed in.[00:31:07] Bret: So I like, if you're passionate about something, go do it. Like who am I to say, yeah, a hundred percent. Or Gmail, like Paul as far, I mean I, some of it's Laura at this point, but like Gmail is Paul's own email for a long time. , and then I amusingly and Paul can't correct me, I'm pretty sure he sent her in a link and like the first comment was like, this is really neat.[00:31:26] Bret: It would be great. It was not your email, but my own . I don't know if it's a true story. I'm pretty sure it's, yeah, I've read that before. So scratch your own niche. Fine. Like it depends on what your goal is. If you wanna do like a venture backed company, if its a. Passion project, f*****g passion, do it like don't listen to anybody.[00:31:41] Bret: In fact, but if you're trying to start, you know an enduring company, solve an important business problem. And I, and I do think that in the world of agents, the software industries has shifted where you're not just helping people more. People be more productive, but you're actually accomplishing tasks autonomously.[00:31:58] Bret: And as a consequence, I think the [00:32:00] addressable market has just greatly expanded just because software can actually do things now and actually accomplish tasks and how much is coding autocomplete worth. A fair amount. How much is the eventual, I'm certain we'll have it, the software agent that actually writes the code and delivers it to you, that's worth a lot.[00:32:20] Bret: And so, you know, I would just maybe look up from the large language models and start thinking about the economy and, you know, think from first principles. I don't wanna get too far afield, but just think about which parts of the economy. We'll benefit most from this intelligence and which parts can absorb it most easily.[00:32:38] Bret: And what would an agent in this space look like? Who's the customer of it is the technology feasible. And I would just start with these business problems more. And I think, you know, the best companies tend to have great engineers who happen to have great insight into a market. And it's that last part that I think some people.[00:32:56] Bret: Whether or not they have, it's like people start so much in the technology, they [00:33:00] lose the forest for the trees a little bit.[00:33:02] Alessio: How do you think about the model of still selling some sort of software versus selling more package labor? 
I feel like when people are selling the package labor, it's almost more stateless, you know, like it's easier to swap out if you're just putting an input and getting an output.[00:33:16] Alessio: If you think about coding, if there's no ID, you're just putting a prompt and getting back an app. It doesn't really matter. Who generates the app, you know, you have less of a buy in versus the platform you're building, I'm sure on the backend customers have to like put on their documentation and they have, you know, different workflows that they can tie in what's kind of like the line to draw there versus like going full where you're managed customer support team as a service outsource versus.[00:33:40] Alessio: This is the Sierra platform that you can build on. What was that decision? I'll sort of[00:33:44] Bret: like decouple the question in some ways, which is when you have something that's an agent, who is the person using it and what do they want to do with it? So let's just take your coding agent for a second. I will talk about Sierra as well.[00:33:59] Bret: Who's the [00:34:00] customer of a, an agent that actually produces software? Is it a software engineering manager? Is it a software engineer? And it's there, you know, intern so to speak. I don't know. I mean, we'll figure this out over the next few years. Like what is that? And is it generating code that you then review?[00:34:16] Bret: Is it generating code with a set of unit tests that pass, what is the actual. For lack of a better word contract, like, how do you know that it did what you wanted it to do? And then I would say like the product and the pricing, the packaging model sort of emerged from that. And I don't think the world's figured out.[00:34:33] Bret: I think it'll be different for every agent. You know, in our customer base, we do what's called outcome based pricing. So essentially every time the AI agent. Solves the problem or saves a customer or whatever it might be. There's a pre negotiated rate for that. We do that. Cause it's, we think that that's sort of the correct way agents, you know, should be packaged.[00:34:53] Bret: I look back at the history of like cloud software and notably the introduction of the browser, which led to [00:35:00] software being delivered in a browser, like Salesforce to. Famously invented sort of software as a service, which is both a technical delivery model through the browser, but also a business model, which is you subscribe to it rather than pay for a perpetual license.[00:35:13] Bret: Those two things are somewhat orthogonal, but not really. If you think about the idea of software running in a browser, that's hosted. Data center that you don't own, you sort of needed to change the business model because you don't, you can't really buy a perpetual license or something otherwise like, how do you afford making changes to it?[00:35:31] Bret: So it only worked when you were buying like a new version every year or whatever. So to some degree, but then the business model shift actually changed business as we know it, because now like. Things like Adobe Photoshop. Now you subscribe to rather than purchase. 
So it ended up where you had a technical shift and a business model shift that were very logically intertwined that actually the business model shift was turned out to be as significant as the technical as the shift.[00:35:59] Bret: And I think with [00:36:00] agents, because they actually accomplish a job, I do think that it doesn't make sense to me that you'd pay for the privilege of like. Using the software like that coding agent, like if it writes really bad code, like fire it, you know, I don't know what the right metaphor is like you should pay for a job.[00:36:17] Bret: Well done in my opinion. I mean, that's how you pay your software engineers, right? And[00:36:20] swyx: and well, not really. We paid to put them on salary and give them options and they vest over time. That's fair.[00:36:26] Bret: But my point is that you don't pay them for how many characters they write, which is sort of the token based, you know, whatever, like, There's a, that famous Apple story where we're like asking for a report of how many lines of code you wrote.[00:36:40] Bret: And one of the engineers showed up with like a negative number cause he had just like done a big refactoring. There was like a big F you to management who didn't understand how software is written. You know, my sense is like the traditional usage based or seat based thing. It's just going to look really antiquated.[00:36:55] Bret: Cause it's like asking your software engineer, how many lines of code did you write today? Like who cares? Like, cause [00:37:00] absolutely no correlation. So my old view is I don't think it's be different in every category, but I do think that that is the, if an agent is doing a job, you should, I think it properly incentivizes the maker of that agent and the customer of, of your pain for the job well done.[00:37:16] Bret: It's not always perfect to measure. It's hard to measure engineering productivity, but you can, you should do something other than how many keys you typed, you know Talk about perverse incentives for AI, right? Like I can write really long functions to do the same thing, right? So broadly speaking, you know, I do think that we're going to see a change in business models of software towards outcomes.[00:37:36] Bret: And I think you'll see a change in delivery models too. And, and, you know, in our customer base you know, we empower our customers to really have their hands on the steering wheel of what the agent does they, they want and need that. But the role is different. You know, at a lot of our customers, the customer experience operations folks have renamed themselves the AI architects, which I think is really cool.[00:37:55] Bret: And, you know, it's like in the early days of the Internet, there's the role of the webmaster. [00:38:00] And I don't know whether your webmaster is not a fashionable, you know, Term, nor is it a job anymore? I just, I don't know. Will they, our tech stand the test of time? Maybe, maybe not. But I do think that again, I like, you know, because everyone listening right now is a software engineer.[00:38:14] Bret: Like what is the form factor of a coding agent? And actually I'll, I'll take a breath. Cause actually I have a bunch of pins on them. Like I wrote a blog post right before Christmas, just on the future of software development. And one of the things that's interesting is like, if you look at the way I use cursor today, as an example, it's inside of.[00:38:31] Bret: A repackaged visual studio code environment. 
I sometimes use the sort of agentic parts of it, but it's largely, you know, I've sort of gotten a good routine of making it auto complete code in the way I want through tuning it properly when it actually can write. I do wonder what like the future of development environments will look like.[00:38:55] Bret: And to your point on what is a software product, I think it's going to change a lot in [00:39:00] ways that will surprise us. But I always use, I use the metaphor in my blog post of, have you all driven around in a Waymo around here? Yeah, everyone has. And there are these Jaguars, the really nice cars, but it's funny because it still has a steering wheel, even though there's no one sitting there and the steering wheels like turning and stuff clearly in the future.[00:39:16] Bret: If once we get to that, be more ubiquitous, like why have the steering wheel and also why have all the seats facing forward? Maybe just for car sickness. I don't know, but you could totally rearrange the car. I mean, so much of the car is oriented around the driver, so. It stands to reason to me that like, well, autonomous agents for software engineering run through Visual Studio Code.[00:39:37] Bret: That seems a little bit silly because having a single source code file open one at a time is kind of a goofy form factor for when like the code isn't being written primarily by you, but it begs the question of what's your relationship with that agent. And I think the same is true in our industry of customer experience, which is like.[00:39:55] Bret: Who are the people managing this agent? What are the tools do they need? And they definitely need [00:40:00] tools, but it's probably pretty different than the tools we had before. It's certainly different than training a contact center team. And as software engineers, I think that I would like to see particularly like on the passion project side or research side.[00:40:14] Bret: More innovation in programming languages. I think that we're bringing the cost of writing code down to zero. So the fact that we're still writing Python with AI cracks me up just cause it's like literally was designed to be ergonomic to write, not safe to run or fast to run. I would love to see more innovation and how we verify program correctness.[00:40:37] Bret: I studied formal verification in college a little bit and. It's not very fashionable because it's really like tedious and slow and doesn't work very well. If a lot of code is being written by a machine, you know, one of the primary values we can provide is verifying that it actually does what we intend that it does.[00:40:56] Bret: I think there should be lots of interesting things in the software development life cycle, like how [00:41:00] we think of testing and everything else, because. If you think about if we have to manually read every line of code that's coming out as machines, it will just rate limit how much the machines can do. The alternative is totally unsafe.[00:41:13] Bret: So I wouldn't want to put code in production that didn't go through proper code review and inspection. So my whole view is like, I actually think there's like an AI native I don't think the coding agents don't work well enough to do this yet, but once they do, what is sort of an AI native software development life cycle and how do you actually.[00:41:31] Bret: Enable the creators of software to produce the highest quality, most robust, fastest software and know that it's correct. And I think that's an incredible opportunity.
I mean, how much C code can we rewrite and rust and make it safe so that there's fewer security vulnerabilities. Can we like have more efficient, safer code than ever before?[00:41:53] Bret: And can you have someone who's like that guy in the matrix, you know, like staring at the little green things, like where could you have an operator [00:42:00] of a code generating machine be like superhuman? I think that's a cool vision. And I think too many people are focused on like. Autocomplete, you know, right now, I'm not, I'm not even, I'm guilty as charged.[00:42:10] Bret: I guess in some ways, but I just like, I'd like to see some bolder ideas. And that's why when you were joking, you know, talking about what's the react of whatever, I think we're clearly in a local maximum, you know, metaphor, like sort of conceptual local maximum, obviously it's moving really fast. I think we're moving out of it.[00:42:26] Alessio: Yeah. At the end of 23, I've read this blog post from syntax to semantics. Like if you think about Python. It's taking C and making it more semantic and LLMs are like the ultimate semantic program, right? You can just talk to them and they can generate any type of syntax from your language. But again, the languages that they have to use were made for us, not for them.[00:42:46] Alessio: But the problem is like, as long as you will ever need a human to intervene, you cannot change the language under it. You know what I mean? So I'm curious at what point of automation we'll need to get, we're going to be okay making changes. To the underlying languages, [00:43:00] like the programming languages versus just saying, Hey, you just got to write Python because I understand Python and I'm more important at the end of the day than the model.[00:43:08] Alessio: But I think that will change, but I don't know if it's like two years or five years. I think it's more nuanced actually.[00:43:13] Bret: So I think there's a, some of the more interesting programming languages bring semantics into syntax. So let me, that's a little reductive, but like Rust as an example, Rust is memory safe.[00:43:25] Bret: Statically, and that was a really interesting conceptual, but it's why it's hard to write rust. It's why most people write python instead of rust. I think rust programs are safer and faster than python, probably slower to compile. But like broadly speaking, like given the option, if you didn't have to care about the labor that went into it.[00:43:45] Bret: You should prefer a program written in Rust over a program written in Python, just because it will run more efficiently. It's almost certainly safer, et cetera, et cetera, depending on how you define safe, but most people don't write Rust because it's kind of a pain in the ass. And [00:44:00] the audience of people who can is smaller, but it's sort of better in most, most ways.[00:44:05] Bret: And again, let's say you're making a web service and you didn't have to care about how hard it was to write. If you just got the output of the web service, the rest one would be cheaper to operate. It's certainly cheaper and probably more correct just because there's so much in the static analysis implied by the rest programming language that it probably will have fewer runtime errors and things like that as well.[00:44:25] Bret: So I just give that as an example, because so rust, at least my understanding that came out of the Mozilla team, because. There's lots of security vulnerabilities in the browser and it needs to be really fast. 
They said, okay, we want to put more of a burden at the authorship time to have fewer issues at runtime.[00:44:43] Bret: And we need the constraint that it has to be done statically because browsers need to be really fast. My sense is if you just think about like the, the needs of a programming language today, where the role of a software engineer is [00:45:00] to use an AI to generate functionality and audit that it does in fact work as intended, maybe functionally, maybe from like a correctness standpoint, some combination thereof, how would you create a programming system that facilitated that?[00:45:15] Bret: And, you know, I bring up Rust is because I think it's a good example of like, I think given a choice of writing in C or Rust, you should choose Rust today. I think most people would say that, even C aficionados, just because. C is largely less safe for very similar, you know, trade offs, you know, for the, the system and now with AI, it's like, okay, well, that just changes the game on writing these things.[00:45:36] Bret: And so like, I just wonder if a combination of programming languages that are more structurally oriented towards the values that we need from an AI generated program, verifiable correctness and all of that. If it's tedious to produce for a person, that maybe doesn't matter. But one thing, like if I asked you, is this Rust program memory safe?[00:45:58] Bret: You wouldn't have to read it, you just have [00:46:00] to compile it. So that's interesting. I mean, that's like an, that's one example of a very modest form of formal verification. So I bring that up because I do think you have AI inspect AI, you can have AI reviewed. Do AI code reviews. It would disappoint me if the best we could get was AI reviewing Python and having scaled a few very large.[00:46:21] Bret: Websites that were written on Python. It's just like, you know, expensive and it's like every, trust me, every team who's written a big web service in Python has experimented with like PyPy and all these things just to make it slightly more efficient than it naturally is. You don't really have true multi threading anyway.[00:46:36] Bret: It's just like clearly that you do it just because it's convenient to write. And I just feel like we're, I don't want to say it's insane. I just mean. I do think we're at a local maximum. And I would hope that we create a programming system, a combination of programming languages, formal verification, testing, automated code reviews, where you can use AI to generate software in a high scale way and trust it.[00:46:59] Bret: And you're [00:47:00] not limited by your ability to read it necessarily. I don't know exactly what form that would take, but I feel like that would be a pretty cool world to live in.[00:47:08] Alessio: Yeah. We had Chris Lattner on the podcast. He's doing great work with Modular. I mean, I love LLVM. Yeah. Basically merging Rust and Python.[00:47:15] Alessio: That's kind of the idea. Should be, but I'm curious is like, for them a big use case was like making it compatible with Python, same APIs so that Python developers could use it. Yeah. And so I, I wonder at what point, well, yeah.[00:47:26] Bret: At least my understanding is they're targeting the data science Yeah. Machine learning crowd, which is all written in Python, so still feels like a local maximum.[00:47:34] Bret: Yeah.[00:47:34] swyx: Yeah, exactly. I'll force you to make a prediction. You know, Python's roughly 30 years old.
In 30 years from now, is Rust going to be bigger than Python?[00:47:42] Bret: I don't know this, but just, I don't even know this is a prediction. I just am sort of like saying stuff I hope is true. I would like to see an AI native programming language and programming system, and I use language because I'm not sure language is even the right thing, but I hope in 30 years, there's an AI native way we make [00:48:00] software that is wholly uncorrelated with the current set of programming languages.[00:48:04] Bret: or not uncorrelated, but I think most programming languages today were designed to be efficiently authored by people and some have different trade offs.[00:48:15] Evolution of Programming Languages[00:48:15] Bret: You know, you have Haskell and others that were designed for abstractions for parallelism and things like that. You have programming languages like Python, which are designed to be very easily written, sort of like Perl and Python lineage, which is why data scientists use it.[00:48:31] Bret: It's it can, it has a. Interactive mode, things like that. And I love, I'm a huge Python fan. So despite all my Python trash talk, a huge Python fan wrote at least two of my three companies were exclusively written in Python and then C came out of the birth of Unix and it wasn't the first, but certainly the most prominent first step after assembly language, right?[00:48:54] Bret: Where you had higher level abstractions rather than and going beyond go to, to like abstractions, [00:49:00] like the for loop and the while loop.[00:49:01] The Future of Software Engineering[00:49:01] Bret: So I just think that if the act of writing code is no longer a meaningful human exercise, maybe it will be, I don't know. I'm just saying it sort of feels like maybe it's one of those parts of history that just will sort of like go away, but there's still the role of this offer engineer, like the person actually building the system.[00:49:20] Bret: Right. And. What does a programming system for that form factor look like?[00:49:25] React and Front-End Development[00:49:25] Bret: And I, I just have a, I hope to be just like I mentioned, I remember I was at Facebook in the very early days when, when, what is now react was being created. And I remember when the, it was like released open source I had left by that time and I was just like, this is so f*****g cool.[00:49:42] Bret: Like, you know, to basically model your app independent of the data flowing through it, just made everything easier. And then now. You know, I can create, like there's a lot of the front end software gym play is like a little chaotic for me, to be honest with you. It is like, it's sort of like [00:50:00] abstraction soup right now for me, but like some of those core ideas felt really ergonomic.[00:50:04] Bret: I just wanna, I'm just looking forward to the day when someone comes up with a programming system that feels both really like an aha moment, but completely foreign to me at the same time. Because they created it with sort of like from first principles recognizing that like. Authoring code in an editor is maybe not like the primary like reason why a programming system exists anymore.[00:50:26] Bret: And I think that's like, that would be a very exciting day for me.[00:50:28] The Role of AI in Programming[00:50:28] swyx: Yeah, I would say like the various versions of this discussion have happened at the end of the day, you still need to precisely communicate what you want. 
As a manager of people, as someone who has done many, many legal contracts, you know how hard that is.[00:50:42] swyx: And then now we have to talk to machines doing that and AIs interpreting what we mean and reading our minds effectively. I don't know how to get across that barrier of translating human intent to instructions. And yes, it can be more declarative, but I don't know if it'll ever Crossover from being [00:51:00] a programming language to something more than that.[00:51:02] Bret: I agree with you. And I actually do think if you look at like a legal contract, you know, the imprecision of the English language, it's like a flaw in the system. How many[00:51:12] swyx: holes there are.[00:51:13] Bret: And I do think that when you're making a mission critical software system, I don't think it should be English language prompts.[00:51:19] Bret: I think that is silly because you want the precision of a a programming language. My point was less about that and more about if the actual act of authoring it, like if you.[00:51:32] Formal Verification in Software[00:51:32] Bret: I'll think of some embedded systems do use formal verification. I know it's very common in like security protocols now so that you can, because the importance of correctness is so great.[00:51:41] Bret: My intellectual exercise is like, why not do that for all software? I mean, probably that's silly just literally to do what we literally do for. These low level security protocols, but the only reason we don't is because it's hard and tedious and hard and tedious are no longer factors. So, like, if I could, I mean, [00:52:00] just think of, like, the silliest app on your phone right now, the idea that that app should be, like, formally verified for its correctness feels laughable right now because, like, God, why would you spend the time on it?[00:52:10] Bret: But if it's zero costs, like, yeah, I guess so. I mean, it never crashed. That's probably good. You know, why not? I just want to, like, set our bars really high. Like. We should make, software has been amazing. Like there's a Mark Andreessen blog post, software is eating the world. And you know, our whole life is, is mediated digitally.[00:52:26] Bret: And that's just increasing with AI. And now we'll have our personal agents talking to the agents on the CRO platform and it's agents all the way down, you know, our core infrastructure is running on these digital systems. We now have like, and we've had a shortage of software developers for my entire life.[00:52:45] Bret: And as a consequence, you know if you look, remember like health care, got healthcare. gov that fiasco security vulnerabilities leading to state actors getting access to critical infrastructure. I'm like. We now have like created this like amazing system that can [00:53:00] like, we can fix this, you know, and I, I just want to, I'm both excited about the productivity gains in the economy, but I just think as software engineers, we should be bolder.[00:53:08] Bret: Like we should have aspirations to fix these systems so that like in general, as you said, as precise as we want to be in the specification of the system. We can make it work correctly now, and I'm being a little bit hand wavy, and I think we need some systems. I think that's where we should set the bar, especially when so much of our life depends on this critical digital infrastructure.[00:53:28] Bret: So I'm I'm just like super optimistic about it. 
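Bret's point that "hard and tedious are no longer factors" can be illustrated with something far humbler than full formal verification: a correctness property the compiler checks at build time. A toy TypeScript sketch, with an invented domain:

```typescript
// Toy illustration of a correctness property the compiler checks for you:
// an exhaustive switch over a closed set of states. The domain (payment
// states) is invented for the example.
type PaymentState = "pending" | "settled" | "refunded";

function describe(state: PaymentState): string {
  switch (state) {
    case "pending":
      return "payment is in flight";
    case "settled":
      return "payment completed";
    case "refunded":
      return "payment returned to the customer";
    default: {
      // If someone adds a new PaymentState and forgets to handle it above,
      // this assignment stops compiling instead of failing at runtime.
      const unhandled: never = state;
      return unhandled;
    }
  }
}
```

This is nowhere near the formal verification Bret is gesturing at, but it is the same shape of guarantee he describes for Rust and memory safety: a property you trust by compiling the program rather than by reading it.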
But actually, let's go to what you said for a second, which is correct.[00:53:33] The Importance of Specifications[00:53:33] Bret: Specifications. I think this is the most interesting part of AI agents broadly, which is that most specifications are incomplete. So let's go back to our product engineering discussions.[00:53:45] Bret: You're like, okay, here's a PRD, a product requirements document, and it's really detailed: mockups, and when you click this button, it does this. And it's like 100 percent certain you can think of a missing requirement in that [00:54:00] document. Let's say you click this button and the internet goes out. What do you do?[00:54:04] Bret: I don't know if that's in the PRD. It probably isn't, you know. There's always going to be something, because humans are complicated, right? So what ends up happening is, I don't know if you can measure it, but what percentage of a product's actual functionality is determined by its code versus the specification? For a traditional product, oh, 95%.[00:54:24] Bret: I mean, the spec covers a little bit, but a lot of it is the code. Code is the specification.[00:54:29] Open Source and Implicit Standards[00:54:29] Bret: It's actually why, if you just look at the history of technology, open source has won out over specifications. Like, you know, for a long time there was a W3C working group on the HTML specification, and then once WebKit became prevalent,[00:54:46] Bret: the internet evolved a lot faster. And that's no knock on the standards organizations; it just turns out having a committee of people argue is a lot less efficient than someone checking in code. And then all of a sudden you had vector graphics and you had [00:55:00] all this really cool stuff that, as someone who was there in the Google Maps days, like, God, that would have made my life easier.[00:55:05] Bret: You know, it's like, SVG support, life would have been a breeze. Try drawing a driving directions line without vector graphics. And so, in general, I think we've gone from these protocols defined in a document to basically open source code that becomes an implicit standard, like system calls in Linux.[00:55:26] Bret: There is a specification. There is POSIX as a standard, but the kernel is what people actually write against, and it's both the documented behavior and all of the undocumented behaviors as well, for better or for worse. And it's why, you know, Linus and others are so adamant about things like binary compatibility and all that. This stuff matters.[00:55:48] Bret: So one of the things that I really think about in working with agents broadly is, I don't want to say it's easy to specify the guardrails, you know, [00:56:00] but what about all those unspecified behaviors? So much of being a software engineer is that you come to the point where the internet's out and you get back the error code from the call and you've got to do something with it.[00:56:12] Bret: And you know, what percent of the time do you just go, yeah, I'm going to do this because it seems reasonable, and what percentage of the time do you write a Slack to your PM and ask, what do I do in this case? It's probably more the former than the latter. Otherwise it'd be really fricking inefficient to write software.[00:56:27] AI Agents and Decision Making[00:56:27] Bret: But what happens when your AI makes that decision for you? It's not a wrong decision. 
You didn't say anything about that case. The AI agent, the word agent comes from the word agency, right? So it's demonstrating its agency and it's making a decision. Does it document it? That would probably be tedious to like, because there's so many implicit decisions.[00:56:44] Bret: What happens when you click the button and the internet's out? It does something you don't like. How do you fix it? I actually think that we are like entering this new world where like the, how we express to an AI agent, what we want [00:57:00] is always going to be an incomplete specification, and that's why agents are useful because they can fill in the gaps with some decent amount of reasoning.[00:57:07] Bret: How you actually tune these over time. And imagine like building an app with an AI agent as your software engineering companion, there's like an infinitely long tail. Infinite is probably over exaggerating a bit, but there's a fairly long tail of functionality that I guarantee is not specified how you actually tune that.[00:57:25] Bret: And this is what I mean about creating a programming system. I don't think we know what that system is yet. And then similarly, I actually think for every single agentic domain, whether it's customer service or legal or software engineering, that's essentially what the company building those agents is building is like the system through which you express the behaviors you want, esoteric and small as it might be anyway, I think that's a really exciting area though, just because I think that's where the magic or that's where the product insights will be in the space is like, how do you encounter that those moments?[00:57:56] Bret: It's kind of built into the UX[00:57:58] swyx: and it can't just be, [00:58:00] the answer can't just be prompt better, you know? No, no, it's impossible.[00:58:04] Bret: The prompt would be too long. Like, imagine getting a PRD that literally specified the behavior of everything that was represented by code. The answer would just be code. Like at that point.[00:58:14] Bret: So here's my point, like prompts are great, but it's not actually a complete specification for anything. It never can be. And so, and I think that's. How you do interactivity, like the sort of human in a loop thing, when and how you do it. And that's why I really believe in, in domain specific agents, because I think answering that in the abstract is like a interesting intellectual exercise.[00:58:39] Bret: But I, that's why I like talking about agents in the abstract kind of, I'm actively disinterested in it because I don't think it actually means anything. All it means is software is making decisions. That's what, you know, at least in a reductive way. But in the context of software engineering, it does make sense.[00:58:53] Bret: Cause you know, like what is the process of first you specify what you want in a product, then you use it, then you give [00:59:00] feedback. You can imagine building a product that actually facilitated that closed loop system. And then how is that represented that complete specification of both what you knew you wanted, what you discovered through usage, the union of all of that is what you care about, and the rest is less to the AI.[00:59:16] Bret: In the legal context, I'm certain there's a way to know, like, when should the AI ask questions? When shouldn't it? How do you actually intervene when it's wrong? 
And certainly in the customer service case, it's very clear, you know: how our customers review every conversation, how we help them find the conversations they should review when they're having millions so they can find the few that are interesting, and how, when something is wrong in one of those conversations, they can give feedback.[00:59:42] Bret: So it's fixed the next time, in a way where we know the context of why the agent made that decision. But it's not up to us what's right, right? It's up to our customers. So that's why I actually think, right now, when you think about building an agent in a domain, to some degree how you actually interact with the [01:00:00] people to specify behavior is actually where a lot of the magic is.[01:00:03] swyx: Stop me if this is a little bit annoying to you, but I have a bit of trouble squaring domain specific agents with the belief that AGI is real, or AGI is coming, because the point is general intelligence. And one way to view the bitter lesson is we can always make progress on being more domain specific.[01:00:22] swyx: Take whatever SOTA is, and you make progress being more domain specific, and then you will be wiped out when the next advance happens. Clearly, you don't believe in that, but how do you personally square those things?[01:00:34] Bret: Yeah, it's a really heavy question.[01:00:36] The Impact of AGI on Industries[01:00:36] Bret: And you know, I think a lot about AGI given my role at OpenAI, but it's even hard for me to really conceptualize.[01:00:41] Bret: And I love spending time with OpenAI researchers, and actually just people in the community broadly, just talking about the implications, because there are the first order effects of something that is superintelligent in some domains, and then there are the second and third order effects, which are harder to predict.[01:00:57] Bret: So first, I think that [01:01:00] it seems likely to me that, at first, something that is AGI will be good in digital domains, you know, because it's software. So if you think about something like AI discovering a new, say, pharmaceutical therapy, the barrier to that is probably less the discovery than the clinical trial.[01:01:23] Bret: And AI doesn't necessarily help with a clinical trial, right? That's a process that's independent of intelligence, and it's a physical process. Similarly, if you think about the problem of climate change or carbon removal, there's probably a lot of that domain that requires great ideas, but whatever great idea you came up with, if you wanted to sequester that much carbon, there's probably a big physical component to that.[01:01:47] Bret: So it's not really limited by intelligence. It might be, I'm sure it could be, accelerated somewhat by intelligence. There's a really interesting conversation with the economist Tyler Cowen. I just watched a video [01:02:00] of him recently, and he was talking about how there are parts of the economy where intelligence is sort of the limited resource, and those will take on AI slash AGI really rapidly and will drive incredible productivity gains.[01:02:13] Bret: But there are other parts of the economy that aren't, and those will interact. 
It goes back to these complex second artifacts like prices will go up in the domains that can absorb absorb intelligence rapidly, which will actually then slow down, you know, so it's going to, I don't think it'll be evenly spread.[01:02:28] Bret: I don't think it would be perhaps as rapidly felt in all parts of the economy as people think I might be wrong, but I just think you can generalize in terms of its ability to. Reason about different domains, which I think is what AGI means to most people, but it may not actually. Generalized in the world and tell, because there's a lot of intelligence is not the limiting factor and like a lot of the economy.[01:02:54] Bret: So going back to your, your more practical question is like, why make software at all of, you know, AGI is coming and [01:03:00] say it that way. Should we learn to[01:03:01] swyx: code?[01:03:01] Bret: There's all variations of this. You know, my view is that I really do view AI as a tool and AGI as a tool for humanity. And so my view is when we were talking about like.[01:03:14] Bret: Is your job as a maker of software to author a code in an editor? I would argue no just like a generation ago. Your job wasn't to punch cards in a punch card That is not what your job is. Your job is to produce digital something, whatever it is, what is the purpose of the software that you're making?[01:03:34] Bret: Your job is to produce that. And so I think that like our jobs will change rapidly and meaningfully, but I think the idea that like our job is to type in a. And an editor is, is an artifact of the tools that we have, not actually what we're hired to do, which is to produce a digital experience, to, you know, make firmware for a toaster or whatever, whatever it is we're [01:04:00] doing.[01:04:00] Bret: Right. Like that's our job. Right. And. As a consequence, I think with things like AGI, I think the certainly software engineering will be one of the disciplines most impacted. And I think that it's very, so like, I think if you're in this industry and you define yourself by the tools that you use, like how many characters you can type into them every day, that's probably not like a long term stable place to be, because that's something that certainly AI can do better than you.[01:04:33] Bret: But your judgment about what to build and how to build it still apply. And that will always be true. And one way to think about it's like a little bit reductive is like, you know, look at startups versus larger companies. Like companies like Google and Amazon have so many more engineers than a startup, but then some startups still win.[01:04:51] Bret: Like, why was that? Well, they made better decisions, right? They didn't type faster or produce more code. They did the right thing in the right market, the right time. [01:05:00] And, and similarly. If you look at some of the great companies, it wasn't the lack of they had some unique idea. Sometimes that's a reason why a company succeeds, but it's often a lot of other things and a lot of other forms of execution.[01:05:12] Bret: So like broadly, like the existence of a lot of intelligence will change a lot and it'll change our jobs more than any other industry, or maybe not, maybe it's exaggerated, but certainly as much as any other industry. But I don't think it like changes, like why the economy around digital technology exists.[01:05:29] Bret: And as a consequence, I think I'm really bullish on like the future of, of the software industry. 
I just think that some things that are really expensive today will become almost free. But I think, I mean, let's be honest, the half life of technology companies is not particularly long as it is.[01:05:46] Bret: Yeah, I brought this anecdote up in a recent conversation, but when I started at Google, we were in one building in Mountain View and then eventually moved into a campus, which was previously the Silicon Graphics campus. That was Google's first campus, and I'm pretty sure it [01:06:00] still has that campus. I think it's got a billion now.[01:06:02] Bret: SGI was a company that was really, really big, big enough to have a campus, and then went out of business. And it wasn't that old of a company, by the way. It's not like IBM, you know; it was big enough to get a campus and go out of business in my lifetime, that type of thing. And then at Facebook, we had an office in Palo Alto.[01:06:18] Bret: I didn't go into the original office when I joined. It was the second office, this old HP building near Stanford. And then we got big enough to want a campus, and we bought the Sun Microsystems campus. Sun Microsystems famously came out of Stanford, went high flying, was one of the dot-com darlings, and then eventually got bought for pennies on the dollar by Oracle.[01:06:39] Bret: And you know, all those companies, in my lifetime, were big enough to go public, have a campus, and then go out of business. So I think a lot will change. I don't mean to say this is going to be easy or that no one's business model is under threat. But will digital technology remain important?[01:06:56] Bret: Will entrepreneurs having good judgment about where to [01:07:00] apply this technology to create something of economic value still apply? Like, a hundred percent. And I've always used the metaphor: if you went back to 1980 and described many of the jobs that we have, it would be hard for people to conceptualize.[01:07:13] Bret: Like, imagine: I'm a podcaster. You're like, what the hell does that mean? Imagine going back to 1776 and describing to Ben Franklin our economy today, let alone the technology industry, just the services economy. It would probably be hard for him to conceptualize just who grows the food, because the idea that so few people in this country are necessary to produce the food for so many people would defy[01:07:39] Bret: so much of his conception of how food is grown that it would probably take a couple hours of explaining. It's kind of the same thing. We have a view of how this world works right now that's based on just the constraints that exist, but there are going to be a lot of other opportunities and other things like that.[01:07:57] Bret: So I don't know. I mean, certainly [01:08:00] writing code is really valuable right now, and it probably will change rapidly. I think people just need a lot of agility. I always use the metaphor: imagine a bunch of accountants, and Microsoft Excel was just invented. 
Are you going to be the first person who sets down your HP calculator and says, I'm going to learn how to use this tool because it's just a better way of doing what I'm already doing.[01:08:19] Bret: Or are you going to be the one who's like, you know, begrudgingly pulling out their slide rule and HP calculator and saying these kids these days, you know, their Excel, they don't understand, you know, it's been a little bit reductive, but I just feel like the, the probably the best thing all of us can do, not just in software industry, but I do think it's really.[01:08:38] Bret: Kind of interesting just reflection that we're disrupting our own industry as much as anything else with this technology is to lean into the change, try the tools, like install the latest coding assistance, you know, when Oh three mini comes out, write some code with it that you don't want to be the last accountant to embrace Excel.[01:08:57] Bret: You might not have your job anymore, so.[01:08:59] swyx: [01:09:00] We have some personal questions on like how you keep up with AI and you know, all that, all the other stuff. But I also want to, and I'll let you get to your question. I just wanted to say that the analogy that you made on food was really interesting and resonated with me.[01:09:12] swyx: I feel like we are kind of in like an agrarian economy of like a barter economy for intelligence and now we're sort of industrializing intelligence. And I, that really just was an aha[01:09:21] Alessio: moment for me. I just wanted to reflect that. Yeah. How do you think about. The person being replaced by an agent and how agents talk to each other.[01:09:29] Alessio: So even at Sierra today, right, you're building agents that people talk to, but in the future, you're going to have agents that are going to complain about the order they placed to the customer support agents all the way down. Exactly. And you know, you were the CTO of Facebook, you built OpenGraph there.[01:09:44] Alessio: And I think there were a lot of pros, things that were being enabled, then maybe a lot of cons that came out of that. How do you think about how the agent protocols should be built, thinking about all the implications of it, you know, privacy, data, discoverability and all that?[01:09:57] Bret: Yeah, I think it's a little early for a [01:10:00] protocol to emerge.[01:10:00] Bret: I've read about a few of the attempts and maybe some of them will catch on. One of the things that's really interesting about large language models is because they're trained on language as they are very capable of using the interfaces built for us. And so. My intuition right now is that because we can make an interface that works for us and also works for the AI, maybe that's good enough.[01:10:23] Bret: You know, I mean, a little bit hand wavy here, but making a machine protocol for agents that's inaccessible to people, there's some upsides to it, but there's also quite a bit of downside to it as well. I think it was Andrej Karpathy, but I can't remember. But like one of the more well known AI researchers wrote, like I spent half my day writing English, you know, in my software engineering I have an intuition that agents will speak to agents using language for a while.[01:10:53] Bret: I don't know if that's true. But there's a lot of reasons why there, that may be true. And so, you know, [01:11:00] when. Your personal agent speaks to a Sierra agent to help figure out why your Sonos speaker has the flashing orange light. 
My intuition is it will be in English for a while. And I think there's a lot of, like, benefits to that.[01:11:13] Bret: I do think that we still are in the early days of Like long running agents I don't know if you tried the deep research agent that just came up,[01:11:22] swyx: we have one for you. Oh, that's great.[01:11:25] Bret: It was interesting cause it was probably the first time I really got like notified by open AI when something was done and I brought up before the interactive parts of it.[01:11:34] Bret: That's the area that I'm most interested in right now. It just is like most agentic workflows are relatively short running and. The workflows that are multi stakeholder, long running and multi system we deal with a lot of those and, and at Sierra, but broadly speaking, I think that those are interesting just because I, I always use the metaphor that prior to the mobile phone, every time you got like [01:12:00] a notification from some internet service, you get an email, not because email was like the best way to notify you, but it's the only way.[01:12:08] Bret: And so you know, you used to get tagged on a photo in Facebook and you get an email about it. Then once. This was in everyone's pocket. Every app had equal access to buzzing your pocket. And now, you know, for most of the apps I use, I don't get email notifications. I just get, get it directly from the app.[01:12:25] Bret: I sort of wonder what the form factors will be for agents. How do you address and reach out to other agents? And then how does it bring you the, the operator of the agent into the loop at the right time? You know, I certainly think there's companies like, you know, with chat GPT, that will be one of the major consumer surfaces.[01:12:42] Bret: So there's like, there's a lot of like gravity to those services. But then if I think about sort of domain specific workflows as well, I think there's just a lot to figure out there. So I'm less. The agent agent protocols. I actually think I could be wrong. I just haven't thought about a lot. Like it's sort of interesting, but actually just how it engages with all [01:13:00] the people in it is actually one of the things I'm most interested to sort of see how it plays out as well.[01:13:04] Alessio: Yeah. I think to me, the things that are at the core of it is kind of like our back, you know, it's like, can this agent access this thing? I think in the customer support use cases, maybe less prominent, but like in the enterprises is more interesting. And also like language, like you can compress the language.[01:13:20] Alessio: If the human didn't have to read it, you can kind of save tokens, make things faster. So yeah, you mentioned being notified about deep research. Is there a open AI deep research has been achieved internally notification that goes out to everybody and the board gets summoned and you get to see it. Can you give any backstory on that process?[01:13:40] Bret: OpenAI is a mission driven nonprofit that I think of primarily as a research lab. It's obviously more than that, you know, in some ways like chat GPT is a cultural defining product. But at the end of the day, the mission is to ensure that artificial general intelligence benefits all of humanity. So a lot [01:14:00] of our board discussions are about.[01:14:02] Bret: Research and its implications on humanity, which is primarily safety. 
Obviously, I think one cannot achieve AGI and not think about safety as a primary responsibility for that mission, but it's also access and other things. So things like deep research we definitely talk about, because it's a big part of, if you think about it, what it means to build AGI. But we talk about a lot of different things, you know. Sometimes we hear about things super early.[01:14:26] Bret: Sometimes, if it's not really related, if it's sort of far afield from the core of the mission, you know, it's more casual. So it's pretty fun to be a part of that, just because my favorite part of every board discussion is hearing from the researchers about how they're thinking about the future and the next milestone in creating AGI.[01:14:44] swyx: Well, lots of milestones. Maybe we'll just start at the beginning. Like, you know, there are very few people that have been in the rooms that you've been in. How do these conversations start? How do you get brought into OpenAI? Obviously there's a bit of drama that you can go into if you want.[01:14:56] swyx: Just take us into the room. Like what happens? What is it [01:15:00] like?[01:15:00] Bret: Was it a Thursday or Friday when Sam was fired? Yeah. So I heard about it like everyone else, you know, just saw it on social media. And I remember[01:15:12] swyx: where I was, walking here, and I was[01:15:14] Bret: totally shocked and messaged my co-founder Clay.[01:15:17] Bret: And I was like, gosh, I wonder what happened. And then, on Saturday, and I'm trying to protect people's privacy on this, but I ended up talking to both Adam D'Angelo and Sam Altman and basically getting a kind of synopsis of what was going on. And my understanding, and you'd have to ask them for their perspective on this, was basically that both the board and Sam felt some trust in me.[01:15:44] Bret: And it was a very complicated situation, because the company reacted pretty negatively, understandably negatively, to Sam's being fired. I don't think they really understood what was going on. And so the board was, you know, in a situation where they needed to figure [01:16:00] out a path forward, and they reached out to me, and then I talked to Sam and basically ended up kind of the mediator, for lack of a better word. Not really formally that, but fundamentally that.[01:16:10] Bret: And as the board was trying to figure out a path forward, we ended up with a lot of discussions about how to reinstate Sam as CEO of the company, but also do a review of what happened so that the board's concerns could be fully adjudicated, you know, because they obviously did have concerns going into it.[01:16:29] Bret: So it ended up there. So I think, broadly speaking, I was just known; a lot of the stakeholders in it knew of me, and I'd like to think I have some integrity. So it was sort of, you know, they were trying to find a way out of a very complex situation. So I ended up kind of mediating that and have formed a[01:16:48] Bret: really great relationship with Sam and Greg in a pretty challenging time for the company. I didn't plan to be, you know, on the board. I got pulled in because of the crisis that happened. [01:17:00] And I don't think I'll be on the board forever either. I posted when I joined that I was going to do it temporarily.[01:17:05] Bret: That was like a year ago. 
You know, I really like to focus on Sierra, but I also really care about, it's just an amazing mission. So[01:17:15] Navigating High-Stakes Situations[01:17:15] swyx: I've been maybe been in like high stakes situations like that, like twice, but obviously not as high stakes, but like, what principles do you have? When you know, like, this is the highest egos, highest amount of stakes possible, highest amount of money, whatever.[01:17:31] swyx: What principles do you have to go into something like this? Like, obviously you have a great reputation, you have a great network. What are your must do's and what are your must not do's?[01:17:39] Bret: I'm not sure there's a If there were a playbook for these situations, there'd be a lot simpler. You know, I just probably go back to like the way I operate in general.[01:17:49] Bret: One is first principles thinking. So I, I do think that there's crisis playbooks, but there was nothing quite like this and you really need to [01:18:00] understand what's going on and why. I think a lot of. Moments of crisis are fundamentally human problems. You can strategize about people's incentives and this and that and the other thing, but I think it's really important to understand all the people involved and what motivates them and why, which is fundamentally an exercise in empathy.[01:18:18] Bret: Actually. Like, do you really understand. Why people are doing what they're doing and then getting good advice, you know, and I think people What's interesting about a high profile crisis is everyone wants to give you advice So there's no shortage of advice, but the good advice is the one I think that really involves judgment Which is who are people based on first principles analysis of the situation based on your assessment?[01:18:41] Bret: Of what, you know, all the people involved who would have true expertise and good judgment, you know, in these situations so that you can either validate your judgment if you have an intuition or if it's an area that's like a area of like, say, a legal expertise that you're not expert and [01:19:00] you want the best in the world to give you advice.[01:19:02] Bret: And I actually find people often seek out. The wrong people for advice and it's really important in those circumstances.[01:19:08] swyx: Well, I mean, it's super well navigated. I have, I've got one more and then we can sort of move on on this topic. The the, the Microsoft offer was real, right? For Sam and team to move over at some, at one point in that weekend.[01:19:19] Bret: I'm not sure. I was sort of in it from one vantage point, which was actually, it's interesting. It's like, I didn't really have. Particular skin in the game. So like I came up with this, I still don't own any equity in open AI. I was just I was just a meaningful bystander in the process. And the reason I got involved and and it will get to answer your question, but the reason I got involved was just because I cared about open AI.[01:19:44] Bret: So. You know, I had left my job at Salesforce and by coincidence, the next month chat GBT comes out and, you know, I got nerd sniped like everyone else. I'm like, I want to spend my life on this. This is so amazing. 
And I wouldn't, I'm not [01:20:00] sure I would have started another company if not for OpenAI kind of inspiring the world with ChatGPT. Maybe I would have, I don't know, but it had a very significant impact on all of us, I think.[01:20:11] Bret: So the idea that it would dissolve in a weekend just bothered me a lot. And I'm very grateful for OpenAI's existence. And my guess is that is probably shared by a lot of the competing research labs to different degrees too. It's just, that rising tide lifted all boats.[01:20:27] Bret: Like, I think it created the proverbial iPhone moment for AI and changed the world. So there were lots of interests: Microsoft is an investor in OpenAI and has a vested interest in it. Sam and Greg had their interests. The employees had their interests. And there's lots of wheeling and dealing.[01:20:49] Bret: And, you know, you can't A/B test decision making. So I don't know, if things had fallen apart, what would have happened with that. I don't actually know. And you also don't know what's real, what's not. I [01:21:00] mean, so you'd have to talk to them to know if it was really real. So.[01:21:03] swyx: Mentioning advisors, I heard it seems like Brian Armstrong was a[01:21:07] swyx: surprisingly strong advisor during the whole journey, which is[01:21:10] Bret: My understanding was both Brian Armstrong and Ron Conway were really close to Sam through it. And I ended up talking to him, but also tried to talk a lot to the board, you know, trying to be the mediator. You obviously have a position on it.[01:21:25] Bret: Like, and I felt that, from the outside looking in, I just really wanted to understand why did this happen? And the process seemed, you know, perhaps, you know, to say the least. But I was trying to remain sort of dispassionate, because one of the principles was, if you want to put Humpty Dumpty back together again, you can't be a single issue voter, right?[01:21:45] Bret: Like, you have to go in and say, so, it was a pretty sensitive moment. But yeah, I think Brian's one of the great entrepreneurs and a true, true friend and ally to Sam through that. He's[01:21:55] swyx: been through a lot as well. The reason I bring up Microsoft is because, [01:22:00] I mean, obviously, huge backer.[01:22:01] swyx: We actually talked to David Juan, who pitched, I think it was Satya at the time, on the first billion dollar investment in OpenAI. The understanding I had was that, for Microsoft, the best situation was OpenAI as-is; second best was Microsoft acqui-hires Sam and Greg and whoever else.[01:22:19] swyx: And that was the relationship at the time: super close, exclusive relationship and all that. I think now things have evolved a little bit. And you know, with the evolution of Stargate, there's some uncertainty or FUD about the relationship between Microsoft and OpenAI. And I just wanted to kind of bring that up.[01:22:38] swyx: Because, like, we're fortunate to have Satya as a subscriber to Latent Space, and we're working on an interview with him. And we're trying to figure out 
How this has evolved now, like what, what is, how would you characterize the relationship between Microsoft and OpenAI?[01:22:52] Bret: Microsoft's, you know, the most important partner of OpenAI, you know, so we have a really like deep relationship with them on many [01:23:00] fronts.[01:23:00] Bret: So I think it's always evolving just because the scale of this market is evolving and in particular the capital requirements for infrastructure. Are well beyond what anyone would have predicted two years ago, let alone whenever the Microsoft relationship started. Well, what was that six years ago? I actually don't, I should know off the top of my head, but it was a long time long in this, in the world of AI, a long, longer time ago.[01:23:24] Bret: I don't really think there's anything to share. I mean, it's I don't, I think the relationships evolved because the markets evolved, but the core tenants of the partnership have remained the same. And it's, you know, by far open eyes, most important partner.[01:23:36] swyx: Just double clicking a little bit more, just like a lot of, obviously a lot of our listeners are, you know, care a lot about the priorities of OpenAI.[01:23:43] swyx: I've had it phrased to me that OpenAI had sort of five Top level priorities, like always have frontier models always be on the frontier sort of efficiency as well. Be the first in sort of multi modality, whether it's video generation or real time voice, anything like that. How would you characterize the top priorities of [01:24:00] OpenAI?[01:24:00] swyx: Apart from just the highest level AGI thing.[01:24:02] Bret: I always come back to the highest level AGI as you put it, it is a mission driven organization. And I think a lot of companies talk about their mission, but OpenAI is literally like the mission defines everything that we do. And I think it is important to understand that if you're trying to like.[01:24:20] Bret: Predict where open AI is going to go, because if it doesn't serve the mission, it's very unlikely that it will be a priority for open AI. You know, it's a big organization, so occasionally you might have like side projects, you're like, you know what, I'm not sure that's going to really serve the mission as much as we thought, like, let's not do it anymore.[01:24:36] Bret: But at the end of the day, like people work at open AI because they believe in the benefits the AGI can have to humanity. Some people are there because they want to build it. And the actual act of building is incredibly intellectually rewarding. Some people are there because they want to ensure that AGI is safe.[01:24:55] Bret: I think we have the best AGI safety team in the world. And there's just [01:25:00] so many interesting research problems to, to tackle there as these models become increasingly capable, as they have access to the internet, it has access to tools. It's just like really interesting stuff, but everyone is there because they're interested in the mission.[01:25:13] Bret: And as a consequence, I think that. You know, if you look at something like deep research, that lens, it's pretty logical, right? It's like, of course, that's if you're going to think about what it means to create AGI, enabling AI to help further the cause of research is, is meaningful. You can see why a lot of the AGI labs are working on.[01:25:34] Bret: Software engineering and code generation, because that seems pretty useful if you're trying to make AGI, right? Just because a huge part of it is, is code, you know to do it. 
Similarly, as you look at sort of tool use and agents right down the middle of what you need to do AGI, that is the part of the company.[01:25:51] Bret: I don't think there is like a. Top, I mean, sure, there's like a, maybe an operational top 10 list, but it is fundamentally about building AGI and [01:26:00] ensuring AGI benefits all of humanity. And that's all we exist for. And the rest of it is like, not a distraction necessarily, but that's like the only reason the organization exists.[01:26:09] Bret: The thing that I think is remarkable is if I had. Describe that mission to the two of you four years ago, like, you know, one of the interesting things is like, how do you think society would use AI? We'd probably think almost maybe like industrial applications, robots, all these other things. I think chat GPT has been the most.[01:26:26] Bret: Delightful. And it doesn't feel counterintuitive now, but like counterintuitive way to serve that mission, because the idea that you can go to chat, gpt. com and access the most advanced intelligence in the world. And there's like a free tier is like pretty amazing. So actually one of the neat things I think is that chat GPT, you know, famously was a research preview that turned into this brand, you know, industry defining brand.[01:26:54] Bret: I think it is one of the more key parts of the mission in a lot of ways because it is the [01:27:00] way many people will use this intelligence for their everyday use. It's not limited to the few. It's not limited to, you know, a form factor that's inaccessible. So I actually think that. It's been really neat to see how much that has led to there's lots of different contours of the mission of, of AGI, but benefit humanity means everyone can use it.[01:27:21] Bret: And so I do think like to your point on is cost important. Oh yeah. Cost is really important. How can we have all of humanity access AI if it's incredibly expensive and you need the 200 subscription, which I pay for it. Cause I think, you know, one promote is mind blowing, you know, but it's, you want both cause you need the advanced research.[01:27:41] Bret: You also want everyone in the world to benefit. So that's the way, I mean, if you're trying to predict where we're going to go, just think, what would I do if I were running a company to, you know, go build AGI and ensures it benefits humanity. That's, that's how we prioritize everything.[01:27:57] Alessio: I know we're going to wrap up soon.[01:27:58] Alessio: I would love to ask some personal [01:28:00] questions. One, what are maybe. I've been guiding principles for you one and choosing what to do. So, you know, you were Salesforce. You were CTO of Facebook. I'm sure you got it done a lot more things, but those were the choices that you made. Do you have frameworks that you use for that?[01:28:15] Alessio: Yeah, let's start there.[01:28:16] Bret: I try to remain sort of like present and grounded in the moment. So. No, I wish I, I wish I did it more, but I don't I really try to focus on like impact, I guess, on what I work on, but also do I enjoy it? And sometimes I think, yeah, we talked a little bit about, you know, what should an entrepreneur work on if they want to start a business?[01:28:38] Bret: And I was sort of joking around about sometimes like best businesses are passion projects. I definitely take into account both. Like I, I want to have an impact on the world and I also like, want to enjoy building what I'm building. 
And I wouldn't work on something that was impactful if I didn't enjoy doing it every day.[01:28:55] Bret: And then I try to have some balance in my life. I've got a [01:29:00] family and one of the values of, of Sierra's competitive intensity, but we also have a value called family. And we always like to say. Intensity and balance are compatible. You can be in a really intense person and I don't have a lot of like hobbies.[01:29:18] Bret: I basically just like work and spend time with my family. But I have balanced there. And but I, but I do try to have that balance just because, you know, if you're proverbially, you know, on your deathbed, what do you, what do you want, and I want to be surrounded by people I love and to be proud of the impact that I had.[01:29:35] Alessio: I know you also love to make handmade pasta. I'm Italian, so I would love to hear favorite pasta shapes, maybe sauces. Oh,[01:29:43] Bret: that's good. I don't know where you found that. Was that deep research or whatever? It was deep research. That's a deep[01:29:48] swyx: cut. Sorry, where is this from?[01:29:50] Alessio: It was from,[01:29:51] swyx: from,[01:29:51] Alessio: I[01:29:51] Bret: forget,[01:29:52] Alessio: it was, it was,[01:29:52] Bret: the source was Ling.[01:29:55] Bret: I do love to cook. So I started making pasta when my [01:30:00] kids were little because I found getting them involved in the kitchen made them eat their meals better. So like participating in the act of making the food. Made them appreciate the food more. And so we do a lot of just like spaghetti linguine, just because it's pretty easy to do.[01:30:15] Bret: And the crank is turning and the part of the pasta making for me was like, they could operate the crank and I could put it through and it was very interactive. Sauces. I do a bunch probably, I mean. I, the like really simple marinara with really good tomatoes and it's like just a classic, especially if you're a really good pasta, but I like them all.[01:30:36] Bret: But I mean, I just, you know, that's probably the go to just cause it's easy. So[01:30:40] Alessio: I just said to us when I saw it come up in the research, I was like, I mean, you have to weigh in as the Italian here. Yeah, I would say so. There's one type of spaghetti you called. I like it. That's kind of like they're almost square.[01:30:51] Alessio: Those are really good. We're like you do a cherry tomato sauce with oil. You can put undo again there. Yeah, we can do a different pockets on [01:31:00] head[01:31:00] swyx: of the Italian Tech Mafia. Very, very good restaurants. I highly recommend going to Italian restaurants with him. Yeah. Okay. So my question would be, how do you keep up on the eye?[01:31:10] swyx: There's so much. going on. Do you have some special news resource that you use that no one else has?[01:31:17] Bret: No, but I most mornings I'll try to sort of like read, kind of check out what's going on on social media, just like any buzz around papers. But the thing that I don't The thing I really like, we have a small research team at Sierra and we'll do sessions on interesting papers then.[01:31:36] Bret: I think that's really nice. And, you know, usually it's someone who like really went deep on a paper and kind of does a, you know, you bring your lunch and just kind of do a readout. 
And I found that to be the most rewarding, just because, you know, I love research, but sometimes some simple concepts are surrounded by a lot of ornate language, and you're like, let's get a few more Greek letters in there to make it [01:32:00] seem like we did something smart, you know?[01:32:02] Bret: Sometimes just talking it through conceptually, I can grok the so-what more easily. And so that's also been interesting as well. And then just conversations, you know. I always try to, when someone says something I'm not familiar with, like, I've gotten over the feeling-dumb thing. I'm like, I don't know what that is.[01:32:20] Bret: Explain it to me. And yes, you can sometimes just find neat techniques, new papers, things like that. It's impossible to keep up with it all, to be honest with you.[01:32:29] swyx: For sure. I mean, if you're struggling, imagine the rest of us. But like, you know, you have really privileged and special conversations.[01:32:36] swyx: What research directions do you think people should pay attention to, just based on the buzz you're hearing internally, or, you know,[01:32:42] Bret: This isn't surprising to you or anyone, but I think in general, the reasoning models. But it's interesting, because two years ago, you know, the chain of thought reasoning paper was pretty important, and in general, chain of thought has always been a meaningful thing from the [01:33:00] time, I think it was a Google paper, right?[01:33:01] Bret: If I'm remembering correctly, and Google authors. Yeah. And I think that it has always been a way to get more robust results from models. What's just really interesting is the combination of distillation and reasoning is making the relative performance, and I'll say actually performance is an ambiguous word, basically the latency of these reasoning models, more reasonable. Because if you think about, say, GPT-4, which was, I think, a huge step change in intelligence, it was[01:33:33] Bret: quite slow and quite expensive for a long time, so it limited the applications. Once you got to 4o and 4o mini, you know, it opened the door to a lot of different applications, both for cost and latency. When o1 came out, it was really interesting quality-wise, but it's quite slow, quite expensive, so that limited the applications.[01:33:52] Bret: Now I just saw someone post that they distilled one of the DeepSeek models and just made it really [01:34:00] small. And, you know, it's doing these chains of thought so fast, it's achieving latency numbers, I think, sort of similar to GPT-4 back in the day. And now all of a sudden you're like, wow, this is really interesting.[01:34:11] Bret: And I just think, especially if there are lots of people listening who are applied AI people, it's basically price, performance, quality. And for a long time, the market's so young, you really had to pick which quadrant you wanted for the use case. The idea that we'll be able to get relatively sophisticated reasoning with something like o3-mini has been amazing.[01:34:34] Bret: If you haven't tried it, the speed of it makes me use it so much more than o1, just because with o1, I'd actually often craft my prompts using 4o and then put it into o1, just because it was so slow, you know, I just didn't want the turnaround time. 
So I'm just really excited about them.[01:34:50] Bret: I think we're in the early days in the same way with the rapid change from GPT three to three, five to four. And you just saw like. Every, and I think with these reasoning [01:35:00] models, just how we're using sort of inference time compute and the techniques around it, the use cases for it, it feels like we're in that kind of Cambrian explosion of ideas and possibilities.[01:35:11] Bret: So I just think it's really exciting. And and certainly if you look at some of the use cases we're talking about, like coding, these are the exact types of domains where these reasoning models. Do and should have better results. And certainly in our domain, there's just some problems that like thinking through more robustly, which we've always done, but it's just been like, these models are just coming out of the box with a lot more batteries included.[01:35:35] Bret: So I'm super excited about them.[01:35:37] Alessio: Any final call to action? Are you hiring, growing the team? More people should use Sierra, obviously.[01:35:42] Bret: We are growing the team and we're hiring software engineers, agent engineers so send me a note, Bret at Sierra dot AI, we're growing like weed. Our engineering team is exclusively in person in San Francisco, though we do have some kind of forward deployed engineers and, and other offices like [01:36:00] London, so[01:36:00] Alessio: awesome.[01:36:01] Alessio: Thank you so much for the time, Bret.[01:36:03] Bret: Thanks for having me. Get full access to Latent.Space at www.latent.space/subscribe
    --------  
    1:36:19
  • Agent Engineering with Pydantic + Graphs — with Samuel Colvin
    Did you know that adding a simple Code Interpreter took o3 from 9.2% to 32% on FrontierMath? The Latent Space crew is hosting a hack night Feb 11th in San Francisco focused on CodeGen use cases, co-hosted with E2B and Edge AGI; watch E2B’s new workshop and RSVP here!We’re happy to announce that today’s guest Samuel Colvin will be teaching his very first Pydantic AI workshop at the newly announced AI Engineer NYC Workshops day on Feb 22! 25 tickets left.If you’re a Python developer, it’s very likely that you’ve heard of Pydantic. Every month, it’s downloaded >300,000,000 times, making it one of the top 25 PyPi packages. OpenAI uses it in its SDK for structured outputs, it’s at the core of FastAPI, and if you’ve followed our AI Engineer Summit conference, Jason Liu of Instructor has given two great talks about it: “Pydantic is all you need” and “Pydantic is STILL all you need”. Now, Samuel Colvin has raised $17M from Sequoia to turn Pydantic from an open source project to a full stack AI engineer platform with Logfire, their observability platform, and PydanticAI, their new agent framework.Logfire: bringing OTEL to AIOpenTelemetry recently merged Semantic Conventions for LLM workloads which provides standard definitions to track performance like gen_ai.server.time_per_output_token. In Sam’s view at least 80% of new apps being built today have some sort of LLM usage in them, and just like web observability platform got replaced by cloud-first ones in the 2010s, Logfire wants to do the same for AI-first apps. If you’re interested in the technical details, Logfire migrated away from Clickhouse to Datafusion for their backend. We spent some time on the importance of picking open source tools you understand and that you can actually contribute to upstream, rather than the more popular ones; listen in ~43:19 for that part.Agents are the killer app for graphsPydantic AI is their attempt at taking a lot of the learnings that LangChain and the other early LLM frameworks had, and putting Python best practices into it. At an API level, it’s very similar to the other libraries: you can call LLMs, create agents, do function calling, do evals, etc.They define an “Agent” as a container with a system prompt, tools, structured result, and an LLM. Under the hood, each Agent is now a graph of function calls that can orchestrate multi-step LLM interactions. You can start simple, then move toward fully dynamic graph-based control flow if needed.“We were compelled enough by graphs once we got them right that our agent implementation [...] 
is now actually a graph under the hood.”Why Graphs?* More natural for complex or multi-step AI workflows.* Easy to visualize and debug with mermaid diagrams.* Potential for distributed runs, or “waiting days” between steps in certain flows.In parallel, you see folks like Emil Eifrem of Neo4j talk about GraphRAG as another place where graphs fit really well in the AI stack, so it might be time for more people to take them seriously.Full Video EpisodeLike and subscribe!Chapters* 00:00:00 Introductions* 00:00:24 Origins of Pydantic* 00:05:28 Pydantic's AI moment * 00:08:05 Why build a new agents framework?* 00:10:17 Overview of Pydantic AI* 00:12:33 Becoming a believer in graphs* 00:24:02 God Model vs Compound AI Systems* 00:28:13 Why not build an LLM gateway?* 00:31:39 Programmatic testing vs live evals* 00:35:51 Using OpenTelemetry for AI traces* 00:43:19 Why they don't use Clickhouse* 00:48:34 Competing in the observability space* 00:50:41 Licensing decisions for Pydantic and LogFire* 00:51:48 Building Pydantic.run* 00:55:24 Marimo and the future of Jupyter notebooks* 00:57:44 London's AI sceneShow Notes* Sam Colvin* Pydantic* Pydantic AI* Logfire* Pydantic.run* Zod* E2B* Arize* Langsmith* Marimo* Prefect* GLA (Google Generative Language API)* OpenTelemetry* Jason Liu* Sebastian Ramirez* Bogomil Balkansky* Hood Chatham* Jeremy Howard* Andrew LambTranscriptAlessio [00:00:03]: Hey, everyone. Welcome to the Latent Space podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol AI.Swyx [00:00:12]: Good morning. And today we're very excited to have Sam Colvin join us from Pydantic AI. Welcome. Sam, I heard that Pydantic is all we need. Is that true?Samuel [00:00:24]: I would say you might need Pydantic AI and Logfire as well, but it gets you a long way, that's for sure.Swyx [00:00:29]: Pydantic almost basically needs no introduction. It's almost 300 million downloads in December. And obviously, in the previous podcasts and discussions we've had with Jason Liu, he's been a big fan and promoter of Pydantic and AI.Samuel [00:00:45]: Yeah, it's weird because obviously I didn't create Pydantic originally for uses in AI, it predates LLMs. But it's like we've been lucky that it's been picked up by that community and used so widely.Swyx [00:00:58]: Actually, maybe we'll hear it. Right from you, what is Pydantic and maybe a little bit of the origin story?Samuel [00:01:04]: The best name for it, which is not quite right, is a validation library. And we get some tension around that name because it doesn't just do validation, it will do coercion by default. We now have strict mode, so you can disable that coercion. But by default, if you say you want an integer field and you get in a string of 1, 2, 3, it will convert it to 123 and a bunch of other sensible conversions. And as you can imagine, the semantics around it. Exactly when you convert and when you don't, it's complicated, but because of that, it's more than just validation. Back in 2017, when I first started it, the different thing it was doing was using type hints to define your schema. That was controversial at the time. It was genuinely disapproved of by some people. I think the success of Pydantic and libraries like FastAPI that build on top of it means that today that's no longer controversial in Python. And indeed, lots of other people have copied that route, but yeah, it's a data validation library. 
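For readers who haven't used Pydantic, here is a minimal sketch of the coercion and type-hint behavior Samuel is describing; the model and field names are purely illustrative, and the JSON Schema line at the end previews the structured-output discussion that follows.

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class Order(BaseModel):
    # Type hints define the schema: the 2017-era idea that was controversial
    # at the time and is now standard practice in Python.
    item: str
    quantity: int

# Default ("lax") mode coerces sensibly: the string "123" becomes the int 123.
order = Order.model_validate({"item": "rigatoni", "quantity": "123"})
print(order.quantity)  # 123

class StrictOrder(Order):
    # Strict mode turns that coercion off.
    model_config = ConfigDict(strict=True)

try:
    StrictOrder.model_validate({"item": "rigatoni", "quantity": "123"})
except ValidationError as exc:
    print(exc.error_count(), "validation error")  # the string is rejected

# The same model also emits JSON Schema, which is why it can act as a single
# source of truth for structured outputs and tool definitions.
print(Order.model_json_schema()["required"])  # ['item', 'quantity']
```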
It uses type hints for the for the most part and obviously does all the other stuff you want, like serialization on top of that. But yeah, that's the core.Alessio [00:02:06]: Do you have any fun stories on how JSON schemas ended up being kind of like the structure output standard for LLMs? And were you involved in any of these discussions? Because I know OpenAI was, you know, one of the early adopters. So did they reach out to you? Was there kind of like a structure output console in open source that people were talking about or was it just a random?Samuel [00:02:26]: No, very much not. So I originally. Didn't implement JSON schema inside Pydantic and then Sebastian, Sebastian Ramirez, FastAPI came along and like the first I ever heard of him was over a weekend. I got like 50 emails from him or 50 like emails as he was committing to Pydantic, adding JSON schema long pre version one. So the reason it was added was for OpenAPI, which is obviously closely akin to JSON schema. And then, yeah, I don't know why it was JSON that got picked up and used by OpenAI. It was obviously very convenient for us. That's because it meant that not only can you do the validation, but because Pydantic will generate you the JSON schema, it will it kind of can be one source of source of truth for structured outputs and tools.Swyx [00:03:09]: Before we dive in further on the on the AI side of things, something I'm mildly curious about, obviously, there's Zod in JavaScript land. Every now and then there is a new sort of in vogue validation library that that takes over for quite a few years and then maybe like some something else comes along. Is Pydantic? Is it done like the core Pydantic?Samuel [00:03:30]: I've just come off a call where we were redesigning some of the internal bits. There will be a v3 at some point, which will not break people's code half as much as v2 as in v2 was the was the massive rewrite into Rust, but also fixing all the stuff that was broken back from like version zero point something that we didn't fix in v1 because it was a side project. We have plans to move some of the basically store the data in Rust types after validation. Not completely. So we're still working to design the Pythonic version of it, in order for it to be able to convert into Python types. So then if you were doing like validation and then serialization, you would never have to go via a Python type we reckon that can give us somewhere between three and five times another three to five times speed up. That's probably the biggest thing. Also, like changing how easy it is to basically extend Pydantic and define how particular types, like for example, NumPy arrays are validated and serialized. But there's also stuff going on. And for example, Jitter, the JSON library in Rust that does the JSON parsing, has SIMD implementation at the moment only for AMD64. So we can add that. We need to go and add SIMD for other instruction sets. So there's a bunch more we can do on performance. I don't think we're going to go and revolutionize Pydantic, but it's going to continue to get faster, continue, hopefully, to allow people to do more advanced things. We might add a binary format like CBOR for serialization for when you'll just want to put the data into a database and probably load it again from Pydantic. So there are some things that will come along, but for the most part, it should just get faster and cleaner.Alessio [00:05:04]: From a focus perspective, I guess, as a founder too, how did you think about the AI interest rising? 
And then how do you kind of prioritize, okay, this is worth going into more, and we'll talk about Pydantic AI and all of that. What was maybe your early experience with LLMs, and when did you figure out, okay, this is something we should take seriously and focus more resources on it?Samuel [00:05:28]: I'll answer that, but I'll answer what I think is a kind of parallel question, which is Pydantic's weird, because Pydantic existed, obviously, before I was starting a company. I was working on it in my spare time, and then beginning of 22, I started working on the rewrite in Rust. And I worked on it full-time for a year and a half, and then once we started the company, people came and joined. And it was a weird project, because it would never get signed off inside a startup. Like, we're going to go off and three engineers are going to work full-on for a year in Python and Rust, writing like 30,000 lines of Rust just to release a free, open-source Python library. The result of that has been excellent for us as a company, right? As in, it's made us remain entirely relevant. And it's like, Pydantic is not just used in the SDKs of all of the AI libraries, but I can't say which one, but one of the big foundational model companies, when they upgraded from Pydantic v1 to v2, their number one internal metric of performance, time to first token, went down by 20%. So you think about all of the actual AI going on inside, and yet at least 20% of the CPU, or at least the latency inside requests, was actually Pydantic, which shows like how widely it's used. So we've benefited from doing that work, although it would never have made financial sense in most companies. In answer to your question about like, how do we prioritize AI, I mean, the honest truth is we've spent a lot of the last year and a half building good general purpose observability inside LogFire and making Pydantic good for general purpose use cases. And the AI has kind of come to us. Not that we want to get away from it, but like the appetite, both in Pydantic and in LogFire, to go and build with AI is enormous because it kind of makes sense, right? Like if you're starting a new greenfield project in Python today, what's the chance that you're using GenAI? 80%, let's say, globally. Obviously it's like a hundred percent in California, but even worldwide, it's probably 80%. Yeah. And so everyone needs that stuff. And there's so much yet to be figured out, so much space to do things better in the ecosystem. To go and implement a database that's better than Postgres is a Sisyphean task, whereas building tools that are better for GenAI than some of the stuff that's about now is not very difficult, putting the actual models themselves to one side.Alessio [00:07:40]: And then at the same time, you released Pydantic AI recently, which is, you know, an agent framework. And early on, I would say everybody, like Langchain, gave Pydantic kind of first-class support; a lot of these frameworks were trying to use you to be better. What was the decision behind we should do our own framework? Were there any design decisions that you disagreed with, any workloads that you think people didn't support well?Samuel [00:08:05]: It wasn't so much like design and workflow, although I think there were some, some things we've done differently. Yeah.
I think looking in general at the ecosystem of agent frameworks, the engineering quality is far below that of the rest of the Python ecosystem. There's a bunch of stuff that we have learned how to do over the last 20 years of building Python libraries and writing Python code that seems to be abandoned by people when they build agent frameworks. Now I can kind of respect that, particularly in the very first agent frameworks, like Langchain, where they were literally figuring out how to go and do this stuff. It's completely understandable that you would like basically skip some stuff.Samuel [00:08:42]: I'm shocked by the like quality of some of the agent frameworks that have come out recently from like well-respected names, which it just seems to be opportunism and I have little time for that, but like the early ones, like I think they were just figuring out how to do stuff and just as lots of people have learned from Pydantic, we were able to learn a bit from them. I think from like the gap we saw and the thing we were frustrated by was the production readiness. And that means things like type checking, even if type checking makes it hard. Like Pydantic AI, I will put my hand up now and say it has a lot of generics and you need to, it's probably easier to use it if you've written a bit of Rust and you really understand generics, but like, and that is, we're not claiming that that makes it the easiest thing to use in all cases, we think it makes it good for production applications in big systems where type checking is a no-brainer in Python. But there are also a bunch of stuff we've learned from maintaining Pydantic over the years that we've gone and done. So every single example in Pydantic AI's documentation is run on Python. As part of tests and every single print output within an example is checked during tests. So it will always be up to date. And then a bunch of things that, like I say, are standard best practice within the rest of the Python ecosystem, but I'm not followed surprisingly by some AI libraries like coverage, linting, type checking, et cetera, et cetera, where I think these are no-brainers, but like weirdly they're not followed by some of the other libraries.Alessio [00:10:04]: And can you just give an overview of the framework itself? I think there's kind of like the. LLM calling frameworks, there are the multi-agent frameworks, there's the workflow frameworks, like what does Pydantic AI do?Samuel [00:10:17]: I glaze over a bit when I hear all of the different sorts of frameworks, but I like, and I will tell you when I built Pydantic, when I built Logfire and when I built Pydantic AI, my methodology is not to go and like research and review all of the other things. I kind of work out what I want and I go and build it and then feedback comes and we adjust. So the fundamental building block of Pydantic AI is agents. The exact definition of agents and how you want to define them. is obviously ambiguous and our things are probably sort of agent-lit, not that we would want to go and rename them to agent-lit, but like the point is you probably build them together to build something and most people will call an agent. So an agent in our case has, you know, things like a prompt, like system prompt and some tools and a structured return type if you want it, that covers the vast majority of cases. There are situations where you want to go further and the most complex workflows where you want graphs and I resisted graphs for quite a while. 
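Before the graph discussion continues, here is a minimal sketch of that basic agent shape, a system prompt, a tool, and a structured result type, roughly as the Pydantic AI docs presented it around this time; the exact names (result_type, tool_plain, result.data) are assumptions taken from those docs and may have shifted since:

```python
from pydantic import BaseModel
from pydantic_ai import Agent

class Answer(BaseModel):
    city: str
    confidence: float

# A hypothetical agent: model name, system prompt, and a structured return type.
agent = Agent(
    "openai:gpt-4o",
    system_prompt="You are a geography assistant.",
    result_type=Answer,
)

@agent.tool_plain
def country_capital(country: str) -> str:
    """Look up a capital city (stub data, just for the sketch)."""
    return {"france": "Paris", "japan": "Tokyo"}.get(country.lower(), "unknown")

result = agent.run_sync("What is the capital of France?")
print(result.data)  # a validated Answer instance
```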
I was sort of the opinion you didn't need them and you could use standard like Python flow control to do all of that stuff. I had a few arguments with people, but I basically came around to, yeah, I can totally see why graphs are useful. But then we have the problem that by default, they're not type safe, because if you have a like add-edge method where you give the names of two different edges, there's no type checking, right? Even if you go and do some... not all the graph libraries are AI specific. So there's a graph library that does like basic runtime type checking, ironically using Pydantic to try and make up for the fact that fundamentally graphs are not type safe. Well, I like Pydantic, but that's not a real solution, to have to go and run the code to see if it's safe. There's a reason that static type checking is so powerful. And so we kind of, from a lot of iteration, eventually came up with a system of using, normally, data classes to define nodes, where you return the next node you want to call and where we're able to go and introspect the return type of a node to basically build the graph. And so the graph is, yeah, inherently type safe. And once we got that right, I'm incredibly excited about graphs. I think there's like masses of use cases for them, both in gen AI and other development, but also software is all going to have to interact with gen AI, right? It's going to be like the web. There's no longer a web department in a company; it's just that all the developers are building for the web, building with databases. The same is going to be true for gen AI.Alessio [00:12:33]: Yeah. I see on your docs, you call an agent a container that contains a system prompt, function tools, a structured result type, a dependency type, a model, and then model settings. Are the graphs, in your mind, different agents? Are they different prompts for the same agent? What are like the structures in your mind?Samuel [00:12:52]: So we were compelled enough by graphs once we got them right, that we actually merged the PR this morning. That means our agent implementation, without changing its API at all, is now actually a graph under the hood, as it is built using our graph library. So graphs are basically a lower level tool that allow you to build these complex workflows. Our agents are technically one of the many graphs you could go and build. And we just happened to build that one for you because it's a very common, commonplace one. But obviously there are cases where you need more complex workflows where the current agent assumptions don't work. And that's where you can then go and use graphs to build more complex things.Swyx [00:13:29]: You said you were cynical about graphs. What changed your mind specifically?Samuel [00:13:33]: I guess people kept giving me examples of things that they wanted to use graphs for. And my 'yeah, but you could do that in standard flow control in Python' became a less and less compelling argument to me, because I've maintained those systems that end up with like spaghetti code. And I could see the appeal of this like structured way of defining the workflow of my code. And it's really neat that like just from your code, just from your type hints, you can get out a mermaid diagram that defines exactly what can go and happen.Swyx [00:14:00]: Right. Yeah. You do have a very neat implementation of sort of inferring the graph from type hints, I guess, is what I would call it. Yeah.
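To see what "inferring the graph from type hints" means in practice, here is a toy, self-contained sketch of the pattern (deliberately not pydantic-graph's real API): each node's run method declares its successor in its return annotation, so the edges can be read back by introspection, and running the graph is just calling a node and following what it returns:

```python
from __future__ import annotations
from dataclasses import dataclass
import typing

@dataclass
class End:
    value: str

@dataclass
class Review:
    draft: str
    def run(self) -> End:
        return End(value=self.draft.upper())

@dataclass
class Draft:
    topic: str
    def run(self) -> Review:
        return Review(draft=f"notes on {self.topic}")

def successors(node_cls: type) -> list[str]:
    """Recover a node's outgoing edges straight from its return annotation."""
    ret = typing.get_type_hints(node_cls.run)["return"]
    return [t.__name__ for t in (typing.get_args(ret) or (ret,))]

print(successors(Draft), successors(Review))  # ['Review'] ['End']

# Running the graph: call a node, get the next node back, repeat until End.
node = Draft(topic="graphs")
while not isinstance(node, End):
    node = node.run()
print(node.value)  # NOTES ON GRAPHS
```

Because the loop hands control back after every node, the same shape also makes it natural to step a graph one node at a time, or to pick it up again later from a stored node, which is the resumability idea discussed next.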
I think the question always is... I have gone back and forth. I used to work at Temporal, where we would actually spend a lot of time complaining about graph-based workflow solutions like AWS Step Functions. And we would actually say that we were better because you could use normal control flow that you already knew and worked with. Yours, I guess, is like a little bit of a nice compromise. Like it looks like normal Pythonic code. But you just have to keep in mind what the type hints actually mean. And that's what we do with the quote unquote magic that the graph construction does.Samuel [00:14:42]: Yeah, exactly. And if you look at the internal logic of actually running a graph, it's incredibly simple. It's basically call a node, get a node back, call that node, get a node back, call that node. If you get an End, you're done. We will add in soon support for, well, basically storage so that you can store the state between each node that's run. And then the idea is you can then distribute the graph and run it across computers. And also, I mean, the other weird, the other bit that's really valuable is across time. Because it's all very well if you look at like lots of the graph examples that like Claude will give you. If it gives you an example, it gives you this lovely enormous mermaid chart of like the workflow, for example, managing returns if you're an e-commerce company. But what you realize is some of those lines are literally one function calls another function. And some of those lines are wait six days for the customer to print their like piece of paper and put it in the post. And if you're writing like your demo project or your like proof of concept, that's fine, because you can just say, and now we call this function. But when you're building in real life, that doesn't work. And now how do we manage that concept to basically be able to start somewhere else in our code? Well, this graph implementation makes it incredibly easy, because you just pass the node that is the start point for carrying on the graph and it continues to run. So it's things like that where I was like, yeah, I can just imagine how things I've done in the past would be fundamentally easier to understand if we had done them with graphs.Swyx [00:16:07]: You say imagine, but like right now, can Pydantic AI actually resume, you know, six days later, like you said, or is this just like a theoretical thing we can get to someday?Samuel [00:16:16]: I think it's basically Q&A. So there's an AI that's asking the user a question and effectively you then call the CLI again to continue the conversation. And it basically instantiates the node and calls the graph with that node again. Now, we don't have the logic yet for effectively storing state in the database between individual nodes; that we're going to add soon. But like the rest of it is basically there.Swyx [00:16:37]: It does make me think that not only are you competing with Langchain now and obviously Instructor, but now you're going into sort of the more like orchestrated things like Airflow, Prefect, Dagster, those guys.Samuel [00:16:52]: Yeah, I mean, we're good friends with the Prefect guys and Temporal have the same investors as us. And I'm sure that my investor Bogomil would not be too happy if I was like, oh, yeah, by the way, as well as trying to take on Datadog, we're also going off and trying to take on Temporal and everyone else doing that. Obviously, we're not doing all of the infrastructure of deploying that, yet at least.
We're, you know, we're just building a Python library. And like what's crazy about our graph implementation is, sure, there's a bit of magic in like introspecting the return type, you know, extracting things from unions, stuff like that. But like the actual calls, as I say, is literally call a function and get back a thing and call that. It's like incredibly simple and therefore easy to maintain. The question is, how useful is it? Well, I don't know yet. I think we have to go and find out. We have a whole. We've had a slew of people joining our Slack over the last few days and saying, tell me how good Pydantic AI is. How good is Pydantic AI versus Langchain? And I refuse to answer. That's your job to go and find that out. Not mine. We built a thing. I'm compelled by it, but I'm obviously biased. The ecosystem will work out what the useful tools are.Swyx [00:17:52]: Bogomol was my board member when I was at Temporal. And I think I think just generally also having been a workflow engine investor and participant in this space, it's a big space. Like everyone needs different functions. I think the one thing that I would say like yours, you know, as a library, you don't have that much control of it over the infrastructure. I do like the idea that each new agents or whatever or unit of work, whatever you call that should spin up in this sort of isolated boundaries. Whereas yours, I think around everything runs in the same process. But you ideally want to sort of spin out its own little container of things.Samuel [00:18:30]: I agree with you a hundred percent. And we will. It would work now. Right. As in theory, you're just like as long as you can serialize the calls to the next node, you just have to all of the different containers basically have to have the same the same code. I mean, I'm super excited about Cloudflare workers running Python and being able to install dependencies. And if Cloudflare could only give me my invitation to the private beta of that, we would be exploring that right now because I'm super excited about that as a like compute level for some of this stuff where exactly what you're saying, basically. You can run everything as an individual. Like worker function and distribute it. And it's resilient to failure, et cetera, et cetera.Swyx [00:19:08]: And it spins up like a thousand instances simultaneously. You know, you want it to be sort of truly serverless at once. Actually, I know we have some Cloudflare friends who are listening, so hopefully they'll get in front of the line. Especially.Samuel [00:19:19]: I was in Cloudflare's office last week shouting at them about other things that frustrate me. I have a love-hate relationship with Cloudflare. Their tech is awesome. But because I use it the whole time, I then get frustrated. So, yeah, I'm sure I will. I will. I will get there soon.Swyx [00:19:32]: There's a side tangent on Cloudflare. Is Python supported at full? I actually wasn't fully aware of what the status of that thing is.Samuel [00:19:39]: Yeah. So Pyodide, which is Python running inside the browser in scripting, is supported now by Cloudflare. They basically, they're having some struggles working out how to manage, ironically, dependencies that have binaries, in particular, Pydantic. Because these workers where you can have thousands of them on a given metal machine, you don't want to have a difference. You basically want to be able to have a share. Shared memory for all the different Pydantic installations, effectively. That's the thing they work out. 
They're working out. But Hood, who's my friend, who is the primary maintainer of Pyodide, works for Cloudflare. And that's basically what he's doing, is working out how to get Python running on Cloudflare's network.Swyx [00:20:19]: I mean, the nice thing is that your binary is really written in Rust, right? Yeah. Which also compiles the WebAssembly. Yeah. So maybe there's a way that you'd build... You have just a different build of Pydantic and that ships with whatever your distro for Cloudflare workers is.Samuel [00:20:36]: Yes, that's exactly what... So Pyodide has builds for Pydantic Core and for things like NumPy and basically all of the popular binary libraries. Yeah. It's just basic. And you're doing exactly that, right? You're using Rust to compile the WebAssembly and then you're calling that shared library from Python. And it's unbelievably complicated, but it works. Okay.Swyx [00:20:57]: Staying on graphs a little bit more, and then I wanted to go to some of the other features that you have in Pydantic AI. I see in your docs, there are sort of four levels of agents. There's single agents, there's agent delegation, programmatic agent handoff. That seems to be what OpenAI swarms would be like. And then the last one, graph-based control flow. Would you say that those are sort of the mental hierarchy of how these things go?Samuel [00:21:21]: Yeah, roughly. Okay.Swyx [00:21:22]: You had some expression around OpenAI swarms. Well.Samuel [00:21:25]: And indeed, OpenAI have got in touch with me and basically, maybe I'm not supposed to say this, but basically said that Pydantic AI looks like what swarms would become if it was production ready. So, yeah. I mean, like, yeah, which makes sense. Awesome. Yeah. I mean, in fact, it was specifically saying, how can we give people the same feeling that they were getting from swarms that led us to go and implement graphs? Because my, like, just call the next agent with Python code was not a satisfactory answer to people. So it was like, okay, we've got to go and have a better answer for that. It's not like, let us to get to graphs. Yeah.Swyx [00:21:56]: I mean, it's a minimal viable graph in some sense. What are the shapes of graphs that people should know? So the way that I would phrase this is I think Anthropic did a very good public service and also kind of surprisingly influential blog post, I would say, when they wrote Building Effective Agents. We actually have the authors coming to speak at my conference in New York, which I think you're giving a workshop at. Yeah.Samuel [00:22:24]: I'm trying to work it out. But yes, I think so.Swyx [00:22:26]: Tell me if you're not. yeah, I mean, like, that was the first, I think, authoritative view of, like, what kinds of graphs exist in agents and let's give each of them a name so that everyone is on the same page. So I'm just kind of curious if you have community names or top five patterns of graphs.Samuel [00:22:44]: I don't have top five patterns of graphs. I would love to see what people are building with them. But like, it's been it's only been a couple of weeks. And of course, there's a point is that. Because they're relatively unopinionated about what you can go and do with them. They don't suit them. Like, you can go and do lots of lots of things with them, but they don't have the structure to go and have like specific names as much as perhaps like some other systems do. 
I think what our agents are, which have a name and I can't remember what it is, but this basically system of like, decide what tool to call, go back to the center, decide what tool to call, go back to the center and then exit. One form of graph, which, as I say, like our agents are effectively one implementation of a graph, which is why under the hood they are now using graphs. And it'll be interesting to see over the next few years whether we end up with these like predefined graph names or graph structures or whether it's just like, yep, I built a graph or whether graphs just turn out not to match people's mental image of what they want and die away. We'll see.Swyx [00:23:38]: I think there is always appeal. Every developer eventually gets graph religion and goes, oh, yeah, everything's a graph. And then they probably over rotate and go go too far into graphs. And then they have to learn a whole bunch of DSLs. And then they're like, actually, I didn't need that. I need this. And they scale back a little bit.Samuel [00:23:55]: I'm at the beginning of that process. I'm currently a graph maximalist, although I haven't actually put any into production yet. But yeah.Swyx [00:24:02]: This has a lot of philosophical connections with other work coming out of UC Berkeley on compounding AI systems. I don't know if you know of or care. This is the Gartner world of things where they need some kind of industry terminology to sell it to enterprises. I don't know if you know about any of that.Samuel [00:24:24]: I haven't. I probably should. I should probably do it because I should probably get better at selling to enterprises. But no, no, I don't. Not right now.Swyx [00:24:29]: This is really the argument is that instead of putting everything in one model, you have more control and more maybe observability to if you break everything out into composing little models and changing them together. And obviously, then you need an orchestration framework to do that. Yeah.Samuel [00:24:47]: And it makes complete sense. And one of the things we've seen with agents is they work well when they work well. But when they. Even if you have the observability through log five that you can see what was going on, if you don't have a nice hook point to say, hang on, this is all gone wrong. You have a relatively blunt instrument of basically erroring when you exceed some kind of limit. But like what you need to be able to do is effectively iterate through these runs so that you can have your own control flow where you're like, OK, we've gone too far. And that's where one of the neat things about our graph implementation is you can basically call next in a loop rather than just running the full graph. And therefore, you have this opportunity to to break out of it. But yeah, basically, it's the same point, which is like if you have two bigger unit of work to some extent, whether or not it involves gen AI. But obviously, it's particularly problematic in gen AI. You only find out afterwards when you've spent quite a lot of time and or money when it's gone off and done done the wrong thing.Swyx [00:25:39]: Oh, drop on this. We're not going to resolve this here, but I'll drop this and then we can move on to the next thing. This is the common way that we we developers talk about this. And then the machine learning researchers look at us. And laugh and say, that's cute. And then they just train a bigger model and they wipe us out in the next training run. 
So I think there's a certain amount of we are fighting the bitter lesson here. We're fighting AGI. And, you know, when AGI arrives, this will all go away. Obviously, on Latent Space, we don't really discuss that because I think AGI is kind of this hand-wavy concept that isn't super relevant. But I think we have to respect that. For example, you could do chain of thought with graphs and you could manually orchestrate a nice little graph that does like: reflect, think about whether you need more inference-time compute, you know, that's the hot term now, and then think again and, you know, scale that up. Or you could train Strawberry and DeepSeek R1. Right.Samuel [00:26:32]: I saw someone saying recently, oh, they were really optimistic about agents because models are getting faster exponentially. And I like took a certain amount of self-control not to point out that it wasn't exponential. But my main point was: if models are getting faster as quickly as you say they are, then we don't need agents and we don't really need any of these abstraction layers. We can just give our model, you know, access to the Internet, cross our fingers and hope for the best. Agents, agent frameworks, graphs, all of this stuff is basically making up for the fact that right now the models are not that clever. In the same way that if you're running a customer service business and you have loads of people sitting answering telephones, the less well trained they are, the less that you trust them, the more that you need to give them a script to go through. Whereas, you know, if you're running a bank and you have lots of customer service people who you don't trust that much, then you tell them exactly what to say. If you're doing high net worth banking, you just employ people who you think are going to be charming to other rich people and set them off to go and have coffee with people. Right. And the same is true of models. The more intelligent they are, the less we need to tell them, to structure what they go and do and constrain the routes that they take.Swyx [00:27:42]: Yeah. Yeah. Agree with that. So I'm happy to move on. So the other parts of Pydantic AI that are worth commenting on, and this is like my last rant, I promise. So obviously, every framework needs to do its sort of model adapter layer, which is, oh, you can easily swap from OpenAI to Claude to Grok. You also have, which I didn't know about, Google GLA, which I didn't really know about until I saw this in your docs, which is the generative language API. I assume that's AI Studio? Yes.Samuel [00:28:13]: Google don't have good names for it. So Vertex is very clear. That seems to be the API that like some of the things use, although it returns 503 about 20% of the time. So... Vertex? No. Vertex, fine. But the... Oh, oh. GLA. Yeah. Yeah.Swyx [00:28:28]: I agree with that.Samuel [00:28:29]: So we have, again, another example of where, well, I think we go the extra mile in terms of engineering: on every commit, at least every commit to main, we run tests against the live models. Not lots of tests, but like a handful of them. Oh, okay. And we had a point last week where, yeah, GLA is a little bit better now, but GLA was failing every single run. One of their tests would fail. And we, I think we might even have commented out that one at the moment. So like all of the models fail more often than you might expect, but like that one seems to be particularly likely to fail.
But Vertex is the same API, but much more reliable.Swyx [00:29:01]: My rant here is that, you know, versions of this appear in Langchain and every single framework has to have its own little thing, a version of that. I would put to you, and then, you know, this can be agree to disagree, this is not needed in Pydantic AI. I would much rather you adopt a layer like LiteLLM or, what's the other one in JavaScript, Portkey. And that's their job. They focus on that one thing and they normalize APIs for you. All new models are automatically added and you don't have to duplicate this inside of your framework. So for example, if I wanted to use DeepSeek, I'm out of luck because Pydantic AI doesn't have DeepSeek yet.Samuel [00:29:38]: Yeah, it does.Swyx [00:29:39]: Oh, it does. Okay. I'm sorry. But you know what I mean? Should this live in your code or should it live in a layer that's kind of your API gateway, that's a defined piece of infrastructure that people have?Samuel [00:29:49]: And I think if a company who are well known, who are respected by everyone, had come along and done this at the right time, maybe we should have done it a year and a half ago and said, we're going to be the universal AI layer. That would have been a credible thing to do. I've heard varying reports of LiteLLM, is the truth. And it didn't seem to have exactly the type safety that we needed. Also, as I understand it, and again, I haven't looked into it in great detail, part of their business model is proxying the request through their own system to do the generalization. That would be an enormous put-off to an awful lot of people. Honestly, the truth is I don't think it is that much work unifying the models. I get where you're coming from. I kind of see your point. I think the truth is that everyone is centralizing around OpenAI. OpenAI's API is the one to do. So DeepSeek supports that. Grok supports that. Ollama also does it. I mean, if there is that library right now, it's more or less the OpenAI SDK. And it's very high quality. It's well type checked. It uses Pydantic. So I'm biased. But I mean, I think it's pretty well respected anyway.Swyx [00:30:57]: There's different ways to do this. Because also, it's not just about normalizing the APIs. You have to do secret management and all that stuff.Samuel [00:31:05]: Yeah. And there's also Vertex and Bedrock, which to one extent or another, effectively, they host multiple models, but they don't unify the API. But they do unify the auth, as I understand it. Although we're halfway through doing Bedrock, so I don't know about it that well. But they're kind of weird hybrids because they support multiple models. But like I say, the auth is centralized.Swyx [00:31:28]: Yeah, I'm surprised they don't unify the API. That seems like something that I would do. You know, we can discuss all this all day. There's a lot of APIs. I agree.Samuel [00:31:36]: It would be nice if there was a universal one that we didn't have to go and build.Alessio [00:31:39]: And I guess the other side of, you know, routing models and picking models is like evals. How do you actually figure out which one you should be using? I know you have one. First of all, you have very good support for mocking in unit tests, which is something that a lot of other frameworks don't do. So, you know, my favorite Ruby library is VCR because it just, you know, it just lets me store the HTTP requests and replay them. That part I'll kind of skip.
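For context on the mocking support Alessio brings up here, a sketch of the kind of no-network test this enables; the TestModel import path and the result.data attribute are assumptions taken from the Pydantic AI testing docs of the time and may have changed:

```python
from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel  # assumed import path

# TestModel fabricates a plausible response locally, so no real LLM is called.
agent = Agent(TestModel(), system_prompt="Answer briefly.")

def test_agent_runs_without_network():
    result = agent.run_sync("What is 2 + 2?")
    assert result.data is not None
```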
I think you are busy like this test model. We're like just through Python. You try and figure out what the model might respond without actually calling the model. And then you have the function model where people can kind of customize outputs. Any other fun stories maybe from there? Or is it just what you see is what you get, so to speak?Samuel [00:32:18]: On those two, I think what you see is what you get. On the evals, I think watch this space. I think it's something that like, again, I was somewhat cynical about for some time. Still have my cynicism about some of the well, it's unfortunate that so many different things are called evals. It would be nice if we could agree. What they are and what they're not. But look, I think it's a really important space. I think it's something that we're going to be working on soon, both in Pydantic AI and in LogFire to try and support better because it's like it's an unsolved problem.Alessio [00:32:45]: Yeah, you do say in your doc that anyone who claims to know for sure exactly how your eval should be defined can safely be ignored.Samuel [00:32:52]: We'll delete that sentence when we tell people how to do their evals.Alessio [00:32:56]: Exactly. I was like, we need we need a snapshot of this today. And so let's talk about eval. So there's kind of like the vibe. Yeah. So you have evals, which is what you do when you're building. Right. Because you cannot really like test it that many times to get statistical significance. And then there's the production eval. So you also have LogFire, which is kind of like your observability product, which I tried before. It's very nice. What are some of the learnings you've had from building an observability tool for LEMPs? And yeah, as people think about evals, even like what are the right things to measure? What are like the right number of samples that you need to actually start making decisions?Samuel [00:33:33]: I'm not the best person to answer that is the truth. So I'm not going to come in here and tell you that I think I know the answer on the exact number. I mean, we can do some back of the envelope statistics calculations to work out that like having 30 probably gets you most of the statistical value of having 200 for, you know, by definition, 15% of the work. But the exact like how many examples do you need? For example, that's a much harder question to answer because it's, you know, it's deep within the how models operate in terms of LogFire. One of the reasons we built LogFire the way we have and we allow you to write SQL directly against your data and we're trying to build the like powerful fundamentals of observability is precisely because we know we don't know the answers. And so allowing people to go and innovate on how they're going to consume that stuff and how they're going to process it is we think that's valuable. Because even if we come along and offer you an evals framework on top of LogFire, it won't be right in all regards. And we want people to be able to go and innovate and being able to write their own SQL connected to the API. And effectively query the data like it's a database with SQL allows people to innovate on that stuff. And that's what allows us to do it as well. I mean, we do a bunch of like testing what's possible by basically writing SQL directly against LogFire as any user could. I think the other the other really interesting bit that's going on in observability is OpenTelemetry is centralizing around semantic attributes for GenAI. So it's a relatively new project. 
A lot of it's still being added at the moment. But basically the idea is that they unify how both SDKs and/or agent frameworks send observability data to any OpenTelemetry endpoint. And so, again, having that unification allows us to go and like basically compare different libraries, compare different models much better. That stuff's in a very like early stage of development. One of the things we're going to be working on pretty soon is basically, I suspect, Pydantic AI will be the first agent framework that implements those semantic attributes properly. Because, again, we control it and we can say this is important for observability, whereas most of the other agent frameworks are not maintained by people who are trying to do observability. With the exception of Langchain, where they have the observability platform, but they chose not to go down the OpenTelemetry route. So they're like plowing their own furrow. And, you know, they're even further away from standardization.Alessio [00:35:51]: Can you maybe just give a quick overview of how OTEL ties into the AI workflows? There's kind of like the question of: is, you know, a trace or a span an LLM call? Is it the agent? Is it kind of like the broader thing you're tracking? How should people think about it?Samuel [00:36:06]: Yeah, so they have a PR that I think may have now been merged from someone at IBM talking about remote agents and trying to support this concept of remote agents within GenAI. I'm not particularly compelled by that because I don't think that like that's actually by any means the common use case. But like, I suppose it's fine for it to be there. The majority of the stuff in OTEL is basically defining how you would instrument a given call to an LLM. So basically the actual LLM call, what data you would send to your telemetry provider, how you would structure that. Apart from this slightly odd stuff on remote agents, most of the like agent-level consideration is not yet implemented, is not yet decided, effectively. And so there's a bit of ambiguity. Obviously, what's good about OTEL is you can in the end send whatever attributes you like. But yeah, there's quite a lot of churn in that space and exactly how we store the data. I think that one of the most interesting things, though, is that if you think about observability traditionally, sure, everyone would say our observability data is very important, we must keep it safe. But actually, companies work very hard to basically not have anything that sensitive in their observability data. So if you're a doctor in a hospital and you search for a drug for an STI, the SQL might be sent to the observability provider, but none of the parameters would. It wouldn't have the patient number or their name or the drug. With GenAI, that distinction doesn't exist because it's all just messed up in the text. If you have that same patient asking an LLM what drug they should take or how to stop smoking, you can't extract the PII and not send it to the observability platform. So the sensitivity of the data that's going to end up in observability platforms is going to be like basically a different order of magnitude to what you would normally send to Datadog. Of course, you can make a mistake and send someone's password or their card number to Datadog. But that would be seen as a mistake. Whereas in GenAI, a lot of data is going to be sent.
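As a rough sketch of what "instrumenting a given call to an LLM" looks like with these conventions, here is a minimal example using the OpenTelemetry Python SDK; the gen_ai.* attribute names follow the draft conventions discussed here and have been churning, so treat them as approximate:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

# Minimal SDK setup so spans are printed to the console instead of dropped.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("genai-demo")

# One span per LLM call, annotated with draft gen_ai semantic attributes.
with tracer.start_as_current_span("chat gpt-4o") as span:
    span.set_attribute("gen_ai.system", "openai")
    span.set_attribute("gen_ai.request.model", "gpt-4o")
    # ... the actual model call would happen here ...
    span.set_attribute("gen_ai.usage.prompt_tokens", 42)       # token counts from the response
    span.set_attribute("gen_ai.usage.completion_tokens", 17)
```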
And I think that's why companies like Langsmith are trying hard to offer observability on-prem, because there's a bunch of companies who are happy for Datadog to be cloud hosted, but want self-hosting for this observability stuff with GenAI.Alessio [00:38:09]: And are you doing any of that today? Because I know in each of the spans you have like the number of tokens, you have the context, you're just storing everything. And then you're going to offer kind of like a self-hosting for the platform, basically. Yeah. Yeah.Samuel [00:38:23]: So we have scrubbing roughly equivalent to what the other observability platforms have. So if we, you know, if we see password as the key, we won't send the value. But like, like I said, that doesn't really work in GenAI. So we're accepting we're going to have to store a lot of data and then we'll offer self-hosting for those people who can afford it and who need it.Alessio [00:38:42]: And then this is, I think, the first time that most of the workload's performance depends on a third party. You know, like if you're looking at Datadog data, usually it's your app that is driving the latency and like the memory usage and all of that. Here you're going to have spans that maybe take a long time to perform because the GLA API is not working or because OpenAI is kind of like overwhelmed. Do you do anything there, since like the provider is almost like the same across customers? You know, like, are you trying to surface these things for people and say, hey, this was like a very slow span, but actually all customers using OpenAI right now are seeing the same thing, so maybe don't worry about it, or?Samuel [00:39:20]: Not yet. We do a few things that people don't generally do in OTel. So we send information at the beginning of a trace, as well as, sorry, at the beginning of a span, as well as when it finishes. By default, OTel only sends you data when the span finishes. So if you think about a request which might take like 20 seconds, even if some of the intermediate spans finished earlier, you can't basically place them on the page until you get the top level span. And so if you're using standard OTel, you can't show anything until those requests are finished. When those requests are taking a few hundred milliseconds, it doesn't really matter. But when you're doing Gen AI calls, or when you're like running a batch job that might take 30 minutes, that like latency of not being able to see the span is like crippling to understanding your application. And so we do a bunch of slightly complex stuff to basically send data about a span as it starts, which is closely related. Yeah.Alessio [00:40:09]: Any thoughts on all the other people trying to build on top of OpenTelemetry in different languages, too? There's like the OpenLLMetry project, which doesn't really roll off the tongue. But how do you see the future of these kinds of tools? Is everybody going to have to build? Why does everybody want to build their own open source observability thing to then sell?Samuel [00:40:29]: I mean, we are not going off and trying to instrument the likes of the OpenAI SDK with the new semantic attributes, because at some point that's going to happen and it's going to live inside OTEL and we might help with it. But we're a tiny team. We don't have time to go and do all of that work. So OpenLLMetry, like interesting project.
But I suspect eventually most of those semantic like that instrumentation of the big of the SDKs will live, like I say, inside the main OpenTelemetry report. I suppose. What happens to the agent frameworks? What data you basically need at the framework level to get the context is kind of unclear. I don't think we know the answer yet. But I mean, I was on the, I guess this is kind of semi-public, because I was on the call with the OpenTelemetry call last week talking about GenAI. And there was someone from Arize talking about the challenges they have trying to get OpenTelemetry data out of Langchain, where it's not like natively implemented. And obviously they're having quite a tough time. And I was realizing, hadn't really realized this before, but how lucky we are to primarily be talking about our own agent framework, where we have the control rather than trying to go and instrument other people's.Swyx [00:41:36]: Sorry, I actually didn't know about this semantic conventions thing. It looks like, yeah, it's merged into main OTel. What should people know about this? I had never heard of it before.Samuel [00:41:45]: Yeah, I think it looks like a great start. I think there's some unknowns around how you send the messages that go back and forth, which is kind of the most important part. It's the most important thing of all. And that is moved out of attributes and into OTel events. OTel events in turn are moving from being on a span to being their own top-level API where you send data. So there's a bunch of churn still going on. I'm impressed by how fast the OTel community is moving on this project. I guess they, like everyone else, get that this is important, and it's something that people are crying out to get instrumentation off. So I'm kind of pleasantly surprised at how fast they're moving, but it makes sense.Swyx [00:42:25]: I'm just kind of browsing through the specification. I can already see that this basically bakes in whatever the previous paradigm was. So now they have genai.usage.prompt tokens and genai.usage.completion tokens. And obviously now we have reasoning tokens as well. And then only one form of sampling, which is top-p. You're basically baking in or sort of reifying things that you think are important today, but it's not a super foolproof way of doing this for the future. Yeah.Samuel [00:42:54]: I mean, that's what's neat about OTel is you can always go and send another attribute and that's fine. It's just there are a bunch that are agreed on. But I would say, you know, to come back to your previous point about whether or not we should be relying on one centralized abstraction layer, this stuff is moving so fast that if you start relying on someone else's standard, you risk basically falling behind because you're relying on someone else to keep things up to date.Swyx [00:43:14]: Or you fall behind because you've got other things going on.Samuel [00:43:17]: Yeah, yeah. That's fair. That's fair.Swyx [00:43:19]: Any other observations just about building LogFire, actually? Let's just talk about this. So you announced LogFire. I was kind of only familiar with LogFire because of your Series A announcement. I actually thought you were making a separate company. I remember some amount of confusion with you when that came out. So to be clear, it's Pydantic LogFire and the company is one company that has kind of two products, an open source thing and an observability thing, correct? Yeah. I was just kind of curious, like any learnings building LogFire? 
So classic question is, do you use ClickHouse? Is this like the standard persistence layer? Any learnings doing that?Samuel [00:43:54]: We don't use ClickHouse. We started building our database with ClickHouse, moved off ClickHouse onto Timescale, which is a Postgres extension to do analytical databases. Wow. And then moved off Timescale onto DataFusion. And we're basically now building, it's DataFusion, but it's kind of our own database. Bogomil is not entirely happy that we went through three databases before we chose one. I'll say that. But like, we've got to the right one in the end. I think we could have realized that Timescale wasn't right. I think ClickHouse. They both taught us a lot and we're in a great place now. But like, yeah, it's been a real journey on the database in particular.Swyx [00:44:28]: Okay. So, you know, as a database nerd, I have to like double click on this, right? So ClickHouse is supposed to be the ideal backend for anything like this. And then moving from ClickHouse to Timescale is another counterintuitive move that I didn't expect because, you know, Timescale is like an extension on top of Postgres. Not super meant for like high volume logging. But like, yeah, tell us those decisions.Samuel [00:44:50]: So at the time, ClickHouse did not have good support for JSON. I was speaking to someone yesterday and said ClickHouse doesn't have good support for JSON and got roundly stepped on because apparently it does now. So they've obviously gone and built their proper JSON support. But like back when we were trying to use it, I guess a year ago or a bit more than a year ago, everything happened to be a map and maps are a pain to try and do like looking up JSON type data. And obviously all these attributes, everything you're talking about there in terms of the GenAI stuff. You can choose to make them top level columns if you want. But the simplest thing is just to put them all into a big JSON pile. And that was a problem with ClickHouse. Also, ClickHouse had some really ugly edge cases like by default, or at least until I complained about it a lot, ClickHouse thought that two nanoseconds was longer than one second because they compared intervals just by the number, not the unit. And I complained about that a lot. And then they caused it to raise an error and just say you have to have the same unit. Then I complained a bit more. And I think as I understand it now, they have some. They convert between units. But like stuff like that, when all you're looking at is when a lot of what you're doing is comparing the duration of spans was really painful. Also things like you can't subtract two date times to get an interval. You have to use the date sub function. But like the fundamental thing is because we want our end users to write SQL, the like quality of the SQL, how easy it is to write, matters way more to us than if you're building like a platform on top where your developers are going to write the SQL. And once it's written and it's working, you don't mind too much. So I think that's like one of the fundamental differences. The other problem that I have with the ClickHouse and Impact Timescale is that like the ultimate architecture, the like snowflake architecture of binary data in object store queried with some kind of cache from nearby. They both have it, but it's closed sourced and you only get it if you go and use their hosted versions. 
And so even if we had got through all the problems with Timescale or ClickHouse, we would end up like, you know, they would want to be taking their 80% margin. And then we would be wanting to take that would basically leave us less space for margin. Whereas data fusion. Properly open source, all of that same tooling is open source. And for us as a team of people with a lot of Rust expertise, data fusion, which is implemented in Rust, we can literally dive into it and go and change it. So, for example, I found that there were some slowdowns in data fusion's string comparison kernel for doing like string contains. And it's just Rust code. And I could go and rewrite the string comparison kernel to be faster. Or, for example, data fusion, when we started using it, didn't have JSON support. Obviously, as I've said, it's something we can do. It's something we needed. I was able to go and implement that in a weekend using our JSON parser that we built for Pydantic Core. So it's the fact that like data fusion is like for us the perfect mixture of a toolbox to build a database with, not a database. And we can go and implement stuff on top of it in a way that like if you were trying to do that in Postgres or in ClickHouse. I mean, ClickHouse would be easier because it's C++, relatively modern C++. But like as a team of people who are not C++ experts, that's much scarier than data fusion for us.Swyx [00:47:47]: Yeah, that's a beautiful rant.Alessio [00:47:49]: That's funny. Most people don't think they have agency on these projects. They're kind of like, oh, I should use this or I should use that. They're not really like, what should I pick so that I contribute the most back to it? You know, so but I think you obviously have an open source first mindset. So that makes a lot of sense.Samuel [00:48:05]: I think if we were probably better as a startup, a better startup and faster moving and just like headlong determined to get in front of customers as fast as possible, we should have just started with ClickHouse. I hope that long term we're in a better place for having worked with data fusion. We like we're quite engaged now with the data fusion community. Andrew Lam, who maintains data fusion, is an advisor to us. We're in a really good place now. But yeah, it's definitely slowed us down relative to just like building on ClickHouse and moving as fast as we can.Swyx [00:48:34]: OK, we're about to zoom out and do Pydantic run and all the other stuff. But, you know, my last question on LogFire is really, you know, at some point you run out sort of community goodwill just because like, oh, I use Pydantic. I love Pydantic. I'm going to use LogFire. OK, then you start entering the territory of the Datadogs, the Sentrys and the honeycombs. Yeah. So where are you going to really spike here? What differentiator here?Samuel [00:48:59]: I wasn't writing code in 2001, but I'm assuming that there were people talking about like web observability and then web observability stopped being a thing, not because the web stopped being a thing, but because all observability had to do web. If you were talking to people in 2010 or 2012, they would have talked about cloud observability. Now that's not a term because all observability is cloud first. The same is going to happen to gen AI. And so whether or not you're trying to compete with Datadog or with Arise and Langsmith, you've got to do first class. You've got to do general purpose observability with first class support for AI. 
And as far as I know, we're the only people really trying to do that. I mean, I think Datadog is starting in that direction. And to be honest, I think Datadog is a much like scarier company to compete with than the AI specific observability platforms. Because in my opinion, and I've also heard this from lots of customers, AI specific observability where you don't see everything else going on in your app is not actually that useful. Our hope is that we can build the first general purpose observability platform with first class support for AI. And that we have this open source heritage of putting developer experience first that other companies haven't done. For all I'm a fan of Datadog and what they've done. If you search Datadog logging Python. And you just try as a like a non-observability expert to get something up and running with Datadog and Python. It's not trivial, right? That's something Sentry have done amazingly well. But like there's enormous space in most of observability to do DX better.Alessio [00:50:27]: Since you mentioned Sentry, I'm curious how you thought about licensing and all of that. Obviously, your MIT license, you don't have any rolling license like Sentry has where you can only use an open source, like the one year old version of it. Was that a hard decision?Samuel [00:50:41]: So to be clear, LogFire is co-sourced. So Pydantic and Pydantic AI are MIT licensed and like properly open source. And then LogFire for now is completely closed source. And in fact, the struggles that Sentry have had with licensing and the like weird pushback the community gives when they take something that's closed source and make it source available just meant that we just avoided that whole subject matter. I think the other way to look at it is like in terms of either headcount or revenue or dollars in the bank. The amount of open source we do as a company is we've got to be open source. We're up there with the most prolific open source companies, like I say, per head. And so we didn't feel like we were morally obligated to make LogFire open source. We have Pydantic. Pydantic is a foundational library in Python. That and now Pydantic AI are our contribution to open source. And then LogFire is like openly for profit, right? As in we're not claiming otherwise. We're not sort of trying to walk a line if it's open source. But really, we want to make it hard to deploy. So you probably want to pay us. We're trying to be straight. That it's to pay for. We could change that at some point in the future, but it's not an immediate plan.Alessio [00:51:48]: All right. So the first one I saw this new I don't know if it's like a product you're building the Pydantic that run, which is a Python browser sandbox. What was the inspiration behind that? We talk a lot about code interpreter for lamps. I'm an investor in a company called E2B, which is a code sandbox as a service for remote execution. Yeah. What's the Pydantic that run story?Samuel [00:52:09]: So Pydantic that run is again completely open source. I have no interest in making it into a product. We just needed a sandbox to be able to demo LogFire in particular, but also Pydantic AI. So it doesn't have it yet, but I'm going to add basically a proxy to OpenAI and the other models so that you can run Pydantic AI in the browser. See how it works. Tweak the prompt, et cetera, et cetera. And we'll have some kind of limit per day of what you can spend on it or like what the spend is. 
The other thing we wanted to be able to do was to be able to when you log into LogFire. We have quite a lot of drop off of like a lot of people sign up, find it interesting and then don't go and create a project. And my intuition is that they're like, oh, OK, cool. But now I have to go and open up my development environment, create a new project, do something with the right token. I can't be bothered. And then they drop off and they forget to come back. And so we wanted a really nice way of being able to click here and you can run it in the browser and see what it does. As I think happens to all of us, I sort of started seeing if I could do it a week and a half ago. Got something to run. And then ended up, you know, improving it. And suddenly I spent a week on it. But I think it's useful. Yeah.Alessio [00:53:15]: I remember maybe a couple, two, three years ago, there were a couple of companies trying to build in the browser terminals exactly for this. It's like, you know, you go on GitHub, you see a project that is interesting, but now you got to like clone it and run it on your machine. Sometimes it can be sketchy. This is cool, especially since you already make all the docs runnable in your docs. Like you said, you kind of test them. It sounds like you might just have.Samuel [00:53:39]: So, yeah. The thing is that on every example in Pydantic AI, there's a button that basically says run, which takes you into Pydantic.run, has that code there. And depending on how hard we want to push, we can also have it like hooked up to LogFire automatically. So there's a like, hey, just come and join the project. And you can see what that looks like in LogFire.Swyx [00:53:58]: That's super cool.Alessio [00:53:59]: So I think that's one of the biggest personally for me, one of the biggest drop offs from open source projects. It's kind of like do this. And then as long as something as soon as something doesn't work, I just drop off.Swyx [00:54:09]: So it takes some discipline. You know, like there's been very many versions of this that I've been through in my career where you had to extract this code and run it. And it always falls out of date. Often we would have these this concept of transclusion where we have a separate code examples repo that we want to be that and that we pulled into our docs. And it never never really works. It takes a lot of discipline. So kudos to you on this.Samuel [00:54:31]: And it was it was years of maintaining Pydantic and people complaining, hey, that example is out of date now. But eventually we went and built a PyTest example. Which is another the hardest to search for open source project we ever built. Because obviously, as you can imagine, if you search PyTest examples, you get examples of how to use PyTest. But the PyTest examples will basically go through both your code inside your doc strings to look for Python code and through markdown in your docs and extract that code and then run it for you and run linting over it and soon run type checking over it. So and that's how we keep our examples up to date. But now now we have these like hundreds of examples. All of which are runnable and self-contained. Or if they if they refer to the previous example, it's already structured that they have to be able to import the code from the previous example. So why don't we give someone a nice place to just be able to actually run that using OpenAI and see what the output is. Lovely.Alessio [00:55:24]: All right. So that's kind of Pydantic. 
And in the notes here, I just like going through people's X accounts, not Twitter. So for four years, you've been saying we need a plain text successor to Jupyter notebooks. Yeah. I think people maybe have gone the other way, which maybe gets even more opinionated, like with Hex and, like, all these kind of notebook companies.Samuel [00:55:46]: Well, yes. So in reply to that, someone replied and said Marimo is that. And sure enough, Marimo is really impressive. And I've subsequently spoken to the Marimo guys and got to angel invest in their round, I think it's a seed round. So, like, Marimo is very cool. It's doing that. And Marimo notebooks also run in the browser, again using Pyodide. In fact, we nearly didn't build Pydantic.run because we were just going to use Marimo. But my concern was that people would think LogFire was only to be used in notebooks, and I wanted something that, like, ironically felt more basic, felt more like a terminal, so that no one thought it was just for notebooks. Yeah.Swyx [00:56:22]: There's a lot of notebook haters out there.Samuel [00:56:24]: And indeed, I have very strong opinions about, you know, proper, like, Jupyter notebooks. This idea that you have to run the cells in the right order. I mean, a whole bunch of things. It's basically, like, worse than Excel, or similarly bad to Excel. Oh, so you are a notebook hater that invested in a notebook. I have this rant called notebook, which was like my attempt to build an alternative, that is mostly just a rant about the 10 reasons why notebooks are just as bad as Excel. But Marimo et al, the new ones that are text-based, at least solve a whole bunch of those problems.Swyx [00:56:58]: Agree with that. Yes. I was kind of wishing for something like a better notebook, and then I saw Marimo. I was like, oh yeah, these guys are ahead of me on this. Yeah. I don't know if I would do the sort of annotation-based thing. Like, you know, a lot of people love the, oh, annotate this function and it just adds magic. I think similarly to what Jeremy Howard does with his stuff. It seems a little bit too magical still. But hey, it's a big improvement on notebooks. Yeah.Samuel [00:57:23]: Yeah. Great.Alessio [00:57:24]: Just on the LLM usage: the .ipynb file is just not good to put into LLMs. So just that alone, I think, should be okay.Swyx [00:57:36]: It's just not good to put in LLMs.Alessio [00:57:38]: It's really not. They freak out.Samuel [00:57:41]: It's not good to put in Git either. I mean, I freak out.Swyx [00:57:44]: Okay. Well, we will kill .ipynb at some point. Yeah. Any other takes? I was going to ask you to just, like, broaden out about the London scene. You know, what's it like building out there, you know, over the pond?Samuel [00:57:56]: I'm an evening person. And the good thing is that I can get up late and then work late, because I'm speaking to people in the U.S. a lot of the time. So I got invited just earlier today to some drinks reception.Samuel [00:58:09]: So I'm feeling positive about the U.K. right now on AI. But I think, look, everywhere that isn't the U.S. and China knows that we're, like, way behind on AI. I think it's good that the U.K. is beginning to say, this is an opportunity, not just a risk. I keep being told you should be at more events, you should be, you know, hanging out with AI people more. My instinct is like, I'd rather sit at my computer and write code.
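To make the .ipynb complaint concrete, here is roughly what a single one-line cell becomes once it is wrapped in notebook JSON, versus the plain-text style that Marimo-like tools keep. The metadata fields shown are a simplified assumption; real notebooks carry more of them, plus embedded outputs.

    # Roughly what one line of code looks like inside a .ipynb file: JSON with
    # cell metadata and execution counts, which is noisy in git diffs and
    # wasteful to paste into an LLM context window.
    ipynb_cell = {
        "cell_type": "code",
        "execution_count": 3,
        "metadata": {},
        "outputs": [],
        "source": ["print('hello')\n"],
    }

    # The same content in a plain-text, Marimo-style file is just Python:
    print("hello")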
I think that, like, is probably a more effective way of getting people's attention. I'm like, I don't know. I mean, a bit of me thinks I should be sitting on Twitter, not in San Francisco chatting to people. I think it's probably a bit of a mixture, and I could probably do with being in the States a bit more. I think I'm going to be over there a bit more this year. But like, there's definitely the risk, if you're somewhere where everyone wants to chat to you about code, that you don't write any code. And that's a failure mode.Swyx [00:58:58]: I would say, yeah, definitely for sure. There's a scene and, you know, one way to really fail at this is to just be involved in that scene and have that eat up your time. But be at the right events, and the ones that I'm running are good events, hopefully.Swyx [00:59:16]: What I say is, use those things to produce high quality content that travels in a different medium than you normally would be able to. Because there's some selectivity, because there's a focused community on that thing, they will discover your work more. It will be highly produced, you know; that's the pitch over there on why, at least, I do conferences. And then in terms of talking to people, I always think about this three strikes rule. So after a while it gets repetitive, but maybe in the first 10, 20 conversations you have with people, if the same stuff is coming up, that is an indication to you that people want a thing, and it helps you prioritize in a more long-form way than you can get in shallow interactions online, right? So in person, eye to eye, it's like: this is my pain at work, and you see the pain, and you're like, oh, okay, if I do this for you, you will love our tool. And like, you can't really replace that. It's customer interviews, really. Yeah.Samuel [01:00:11]: I agree entirely with that. I think you're right on a lot of that. And I think that, like, it's very easy to get distracted by what people are saying on Twitter and LinkedIn.Swyx [01:00:19]: That's another thing.Samuel [01:00:20]: It's pretty hard to correct for which of those people are actually building this stuff in production in, like, serious companies, and which of them are on day four of learning to code. Because they have equally strident opinions, and in, like, a few characters they seem equally valid. But which one's real and which one's not, or which one is from someone who really knows their stuff, is hard to know.Alessio [01:00:40]: Anything else, Sam? What do you want to get off your chest?Samuel [01:00:43]: Nothing in particular. I've really enjoyed our conversation. I would say, for anyone who has, like, looked at Pydantic AI: we know it's not complete yet. We know there's a bunch of things that are missing: embeddings, storage, MCP and tool sets and stuff like that. We're trying to be deliberate and do stuff well, and that involves not being feature complete yet. So keep coming back and looking in a few months, because we're pretty determined to get there. We know that this stuff, whether or not you think AI is going to be the next Excel, the next internet or the next industrial revolution, is going to affect all of us enormously. And so as a company, we get that making Pydantic AI the best agent framework is existential for us.Alessio [01:01:22]: You're also the first Series A company I see that has no open roles for now.
Every founder that comes on our podcast, the call to action is like, please come work with us.Samuel [01:01:31]: We are not hiring right now. I would love, bluntly, for LogFire to have a bit more commercial traction and a bit more revenue before I hire some more people. It's quite nice having a few years of runway, not a few months of runway. So I don't have any great appetite to go and, like, destroy that runway overnight by hiring another 10 people, even if the whole team is, like, rushed off their feet, kind of doing, as you said, three to four startups at the same time.Alessio [01:01:58]: Awesome, man. Thank you for joining us.Samuel [01:01:59]: Thank you very much. Get full access to Latent.Space at www.latent.space/subscribe