
Interconnects

Nathan Lambert

Available Episodes

5 of 117
  • Coding as the epicenter of AI progress and the path to general agents
Coding, due to its breadth of use-cases, is arguably the last tractable, general domain of continued progress for frontier models that most people can interface with. This is a bold claim, so let's consider some of the other crucial capabilities covered in the discourse of frontier models:

* Chat and the quality of prose written by models has leveled off, other than finetuning to user measures such as sycophancy.
* Mathematics has incredible results, but very few people directly gain from better theoretical mathematics.
* The AIs' abilities to do novel science are too unproven to be arguable as a target of hillclimbing.

Still, coding is a domain where the models are already incredibly useful, and they continue to consistently stack on meaningful improvements. Working daily with AI over the last few years, across side projects and as an AI researcher, it has been easy to take these coding abilities for granted because some forms of them have been around for so long. We punt a bug into ChatGPT and it can solve it, or autocomplete can tab our way through entire boilerplate. These use-cases sound benign, and their descriptions haven't changed much even as the capabilities have climbed dramatically. Punting a niche problem in 1000+ lines of code to GPT-5-Pro or Gemini Deep Think feels like a very fair strategy. They really can sometimes solve problems that a teammate or I were stuck on for hours to days.

We're progressing through this summarized list of capabilities:

* Function completion: ~2021, original GitHub Copilot (Codex)
* Scripting: ~2022, ChatGPT
* Building small projects: ~2025, CLI agents
* Building complex production codebases: ~2027 (estimate, which will vary by the codebase)

Coding is maybe the only domain of AI use where I've felt this slow, gradual improvement. Chat quality has been "good enough" since GPT-4; search showed up and has been remarkable since OpenAI's o3. Through all of these more exciting moments, AIs' coding abilities have just continued to gradually improve.

Now, many of us are starting to learn a new way of working with AI through these new command-line code agents. This is the largest increase in AI coding abilities in the last few years. The problem is that the increase isn't in the same domain where most people are used to working with AI, so adoption of the progress is far slower. New applications are rapidly building user bases, and existing distribution networks barely apply. The best way to work with them — and I'll share more examples of what I've already built later in this post — is to construct mini projects, whether a new bespoke website or a script. These are fantastic tools for entrepreneurs and researchers who need a way to quickly flesh out an idea. Things that would've taken me days to weeks can now be attempted in hours. Within this, the amount of real "looking at the code" that needs to be done is definitely going down. Coding, as an activity done through agents, is seeing its barriers to entry fall away through the same form factor that is giving the act of coding re-found joy.

I think a lot of people miss these agents because the way to use them is so different from the marketing of incredible evaluation breakthroughs the models are reaching. The gap between "superhuman coding" announcements and using an agent for mini projects is obviously big. The best way to use the agents is still mundane and requires careful scoping of context.
For example, yesterday, on September 17, 2025, OpenAI announced that GPT-5, as part of a model system, got a higher score than any human (and Google's Gemini Deep Think) at the ICPC World Finals, "the premier collegiate programming competition where top university teams from around the world solve complex algorithmic problems." Here's what an OpenAI researcher said they did:

We competed with an ensemble of general-purpose reasoning models; we did not train any model specifically for the ICPC. We had both GPT-5 and an experimental reasoning model generating solutions, and the experimental reasoning model selecting which solutions to submit. GPT-5 answered 11 correctly, and the last (and most difficult problem) was solved by the experimental reasoning model.

These competitions often get highlighted because they're "finite time," so the system must respond in the same fixed time as a human does, but the amount of compute used by GPT-5 or another model here is likely far higher than any user has access to. This is mostly an indication that further ability, which some people call raw intelligence, can be extracted from the models, but most of that is limited by scaffolding and product when used by the general population.

The real story is that these models are delivering increasing value to a growing pool of people. For followers of AI, coding with AI models is the easiest way to feel progress. Now that models are so good at chat, it takes very specialized tasks to test the general knowledge of models, or many of the gains are in getting the right answer faster than GPT-5-Thinking's meandering path.

I'm not an expert software engineer, and yet the huge differences between models, and the improvements the individual models and systems are making, have been incredibly obvious. I've said many times how Claude Code (or now Codex) is far better than Cursor Agent, which is in turn far better than GitHub Copilot. GitHub Copilot feels borderline drunk at the wheel. Cursor often feels a little distracted while still being smart, but Claude Code and Codex seem on topic and able to test the best of a model's intelligence on the problem at hand. Yes, even the best agents often aren't good enough in complex codebases, but they remove the need to go back and forth countless times in a chat window to see if a model can reach the end of the puzzle for you. These CLI agents can run tests, fix git problems, run local tools, whatever. The scope is constantly growing.

As for the nuanced take on Claude Code vs. Codex CLI right now, the answer is that the best option is expensive. The best has been Claude Code forced to Claude Opus 4.1, but Codex is not far behind and comes in at a much cheaper entry point ($20/month) — Opus requires a $100+/month plan. Codex also has nice features like web search, but it hasn't been a major differentiator yet in my use. The new workflow is to switch to the other agent when one cannot solve the current problem, and let it see the repository with fresh eyes, much like pasting a question to another chatbot. The agents are just one tab away, just like the competitors for chat.

In the comparison of Claude Code, Cursor, and Copilot above, the crucial component is that all of these agents can be tested with the same Claude 4 Sonnet model. The gaps are just as wide as I stated, highlighting how so many of the gains in coding agents are just in product implementations.
A second version of this is slightly embarrassing for me: I hadn't updated my OpenAI Codex installation when trying the new GPT-5-Codex model, and updating it resulted in an immediate, massive jump in performance. It's a new phenomenon to have a domain at the cutting edge of AI abilities where the software scaffolding around a model is felt so strongly. Product and prompts matter more than ever, and this sensation will expand to more domains.

The why of these performance differences — even when using the same model — is worth dwelling on. It's unlikely that the Claude team is that much better at general software engineering and product design — rather, Anthropic has extensive in-house experience in extracting the most from models. The current shift in models has been about taking a set of models designed for question answering and other single-stream text tasks and getting them to break down problems. In my taxonomy on next-generation reasoning models, I called this ability "abstraction." The need to slightly shift the model toward this task explains OpenAI's recent specialized model for it, GPT-5-Codex. GPT-5 was primarily a release about balancing OpenAI's books with a user base approaching 1B active users in the chat format; GPT-5-Codex is a honed tool for a different job. The evaluation scores are slightly better than the general reasoning model's, but the main gains are in how its behavior differs on coding tasks:

GPT‑5-Codex adapts how much time it spends thinking more dynamically based on the complexity of the task. The model combines two essential skills for a coding agent: pairing with developers in interactive sessions, and persistent, independent execution on longer tasks. That means Codex will feel snappier on small, well-defined requests or while you are chatting with it, and will work for longer on complex tasks like big refactors. During testing, we've seen GPT‑5-Codex work independently for more than 7 hours at a time on large, complex tasks, iterating on its implementation, fixing test failures, and ultimately delivering a successful implementation.

And they included a somewhat confusing plot to showcase this dynamic. I've certainly felt these changes when I updated the Codex software and the Codex model. This represents another key problem I presented in my taxonomy — calibration, i.e. not overthinking. Having specialized models and specialized products for a use case could make people think the labs are narrowing in to make progress, but in OpenAI's case it is rather that their hands are tied financially to support the main ChatGPT application. Claude has already fully committed to code. This is due to the size that the space could expand into.

These "coding" agents are definitely going to be seen as doing far more than writing code. Yes, their primary ability is going to be writing the code itself and executing it, but what that enables is an entirely new way of working with your computer. In my post Contra Dwarkesh on Continual Learning, I presented a view where agents are going to be given all your digital working context in order to be a research or editorial assistant available 24/7. I've begun putting this to use for Interconnects, where I give the agents all of my articles, metadata, interviews, and details, so I can ask them for relevant references and context for future posts.
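As a sketch of the shape of this setup (the folder layout, the `Title:` convention, and the `build_index` helper are my own inventions for illustration, not a prescribed format), the corpus handed to the agent can be as simple as a directory of posts plus a small metadata index:

```python
import json
from pathlib import Path

def build_index(posts_dir: str, out_file: str = "index.json") -> None:
    """Write one metadata index an agent can scan cheaply before
    grepping full posts. Assumes one markdown file per post that
    starts with a 'Title:' line (an invented convention for this sketch)."""
    index = []
    for path in sorted(Path(posts_dir).glob("*.md")):
        text = path.read_text(encoding="utf-8")
        first_line = text.splitlines()[0] if text else ""
        title = first_line.removeprefix("Title:").strip() or path.stem
        index.append({"file": path.name, "title": title,
                      "words": len(text.split())})
    Path(out_file).write_text(json.dumps(index, indent=2))
```

The point is less the code than the design choice: a flat, greppable corpus with a tiny index plays to what CLI agents are already good at, which is running search tools over files.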
This is very underbaked and early as a project for searching efficiently over my 400K tokens of writing, but I prompted it a few times to find interesting references for this post, and it got me something useful! This quote from my Ross Taylor interview was spot on for the vibes of using coding agents in July:

My main worry with Claude Code is that... people confuse agents making you more productive versus preventing you from exerting mental effort. So sometimes I'll have a day with Claude Code where I feel like I use very little mental effort—and it feels amazing—but I'm pretty sure I've done less work... Where it becomes really bad is when the file size becomes too long. Then the agent tends to struggle and get into these weird line search doom loops.

This sentiment is still definitely true for production codebases that are extremely complex, but the doom loop likelihood is dropping in my tests. At the same time, the joy and mental ease still apply.

Some examples of what I've built with a mix of Claude Code or OpenAI's Codex CLI recently include:

* A raw HTML site for my RLHF book for comparing the responses of SFT vs. RLHF trained models from the same lineage (and improvements to the RLHF book itself).
* A repository with all of the posts and content from Interconnects so I can use coding agents as editorial assistants while writing.
* Improvements to the ATOM Project website.
* Stripping my personal website out of Webflow's systems (which was a mistake to sign up for during graduate school), including CMS entries and other detailed pages.
* Other small scripts and tools in my day job training models.

It's not just me building extensively with these. There are multiple open-source projects committed to tracking the public contributions of these models — two are PRArena and Agents in the Wild. PRArena's dashboard shows over a million PRs getting merged from the Codex web agent, dwarfing many of the competitors. This is the power that OpenAI can wield with distribution, even if the web app version of Codex is far from the zeitgeist that is CLI agents today. This comes with a notable asterisk in methodology that can explain many of the gaps in similar dashboards:

Some agents like Codex iterate privately and create ready PRs directly, resulting in very few drafts but high merge rates. Others like Copilot and Codegen create draft PRs first, encouraging public iteration before marking them ready for review. The statistics below focus on Ready PRs only to fairly compare agents across different workflows, measuring each agent's ability to produce mergeable code regardless of whether they iterate publicly (with drafts) or privately.

The other dashboard, Agents in the Wild, shows that OpenAI's coding agent is only one order of magnitude behind humans and other automations in PRs merged. Putting this in perspective relative to Gemini or Claude: Claude Code is far more downloaded than OpenAI's CLI agent Codex, but it doesn't name its PRs the same clever way by default, with the agent name in the branch. Claude Code has over 20X the downloads of Codex in the last week on NPM.

Despite the challenges of measurement, it's clear that coding agents are taking off. The Codex PRs above actually represent the web agent, which has the default branch-name behavior, not the CLI agent. This shows the might of OpenAI's distribution, and it is impressive how many of the PRs are actually merged (over 80%) when thousands of people are trying a new tool for the first time.
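To make that methodology concrete, here is a minimal sketch of the ready-PRs-only comparison (the `PullRequest` shape and field names are hypothetical, not PRArena's actual schema):

```python
from dataclasses import dataclass

@dataclass
class PullRequest:
    agent: str      # e.g. "codex", "copilot", "codegen"
    is_ready: bool  # marked ready for review (still-draft PRs excluded)
    merged: bool

def merge_rate_ready_only(prs: list[PullRequest], agent: str) -> float:
    """Merge rate computed over ready PRs only, so agents that iterate
    publicly via drafts are compared fairly with agents that iterate
    privately and open ready PRs directly."""
    ready = [pr for pr in prs if pr.agent == agent and pr.is_ready]
    return sum(pr.merged for pr in ready) / len(ready) if ready else 0.0
```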
The primary difference between the web agent and the CLI agent is a reduction in interactivity. The CLI agents propose a plan and ask for feedback, or let you monitor and interrupt. Codex on the web wraps similar behavior to the CLI agents in one system that runs all the way until it can open a PR.

Over time, coding is only going to get more asynchronous, and OpenAI is poised to capture this transition if it happens soon. Based on all the above evidence of coding models getting more capable, the move to this new UX for software will happen faster than people expect. The transition to fully autonomous coding will happen soon for the types of work where coding models already work near flawlessly — scripts, websites, data analysis, etc. Later, complex production codebases will work best at lower levels of the stack — IDEs, CLI agents, and other things that are both interactive and best for absorbing context.

Within a few years, the two trends will converge: autonomous agents will be functional, and the most complex codebases will be improvable with AI. Then everything can return to the chatbot window — you only need to open your IDE when you want to understand what's going on. For most people, not having to look at the code will be a welcome change.

Progress in coding feels slower than the "emergent" jumps between past model generations, which makes it easier to keep track of. This is due to how big the range of behaviors encompassed by "coding" is, but it makes coding a fantastic area for learning how AI models evolve and iterate. This playbook will be used many times over by frontier labs in the coming years as AI models are taught to solve more challenging tasks.

There's a quiet revolution happening, and in order to truly understand it, you need to partake. Go build something.
    --------  
    16:18
  • On China's open source AI trajectory
Hello everyone! I'm coming back online after two weeks of vacation. Thankfully it coincided with some of the slowest weeks of the year in the AI space. I'm excited to get back to writing and (soon) share projects that'll wrap up in the last months of the year. It seemed like a good time to remind people of the full set of housekeeping for Interconnects.

* Many people love the audio version of the essays (read by me, not AI). You can get them in your podcast player here. Paid subscribers can add private podcast feeds under "manage your subscription," where voiceover is available for paywalled posts.
* The Interconnects Discord for paid subscribers continues to get better, and is potentially the leading paid perk amid the fragmentation of Twitter etc.
* We're going to be rolling out more perks for group subscriptions and experimental products this fall. Stay tuned, or get in touch if group discounts are super exciting for your company.

For the time being, I'm planning trips and meetups across a few conferences in October. I'll be speaking at The Curve (Oct. 3-5, Berkeley), COLM (Oct. 7-10, Montreal, interest form), and the PyTorch Conference (Oct. 21-24, SF) on open models, Olmo, and the ATOM Project, so stay tuned for meetups and community opportunities. On to the post!

China is maneuvering to double down on its open AI ecosystem. Depending on how the U.S. and its allies change culture and mobilize investment, this could make the dominance of Chinese AI models this summer, from Qwen, Kimi, Z.ai, and DeepSeek, look like foreshadowing rather than the maximum gap in open models between the U.S. and China.

Until the DeepSeek moment, AI was likely a fringe issue to the PRC government. The central government sets guidelines, rules, budgets, and focus areas that are distributed and enforced across the decentralized government power structures. AI wasn't a political focus, and the strategy of open-source was likely set by companies looking to close the gap with leading American competitors and achieve maximum market share in the minimum time.

I hear all the time that most companies in the U.S. want to start with open models for IT and philosophical reasons, even when spinning up access to a new API model is almost effortless, and it's likely this bias is even higher internationally, where spending on technology services is historically lower. Most American startups are starting with Chinese models. I've been saying this for a while, but a more official reference comes from a recent quote from a16z partner Martin Casado, another vocal advocate of investment in open models in America. He was quoted in The Economist with regards to his venture portfolio companies:

"I'd say 80% chance [they are] using a Chinese open-source model."

The crucial question for the next few years in the geopolitical evolution of AI is whether China will double down on this open-source strategy or change course. The difficulty with monitoring this position is that it could look like nothing is happening while China maintains its outputs, even when the processes for creating them are far different. Holding a position is still a decision.

It's feasible in the next decade that AI applications and open models are approached with the same vigor with which China built public infrastructure over the last few decades (yes, I'm reading Dan Wang's new book Breakneck).
It could become a new area in which local officials compete to prove their worth to the nation — I'm not sure even true China experts could make confident predictions here. A large source of uncertainty is whether the sort of top-down PRC edicts that succeeded in the past with physical infrastructure can result in effective AI models and digital systems.

At the same time as the obvious pro-AI messaging, Chinese officials have warned of "disorderly competition" in the AI space, an indirect signal that could keep model providers releasing their models openly. Open models reduce duplicative costs of training, help the entire ecosystem monitor best practices, and force business models that aren't reliant on simple race-to-the-bottom inference markets. Open model submarkets are emerging for every corner of the AI ecosystem, such as video generation or robotic action models (see our coverage of open models, Artifacts Logs), with a dramatic evolution from research ideas to mature, stable models in the last 12-18 months.

China improving the open model ecosystem looks like the forced adoption of Chinese AI chips, further specialization of companies' open models to evolving niches, and expanded influence on fundamental AI research shared internationally. All of these directions show early signs of occurring.

If the PRC government wanted to exert certain types of control on the AI ecosystem — it could. This Doug Guthrie excerpt from Apple in China tells the story from the perspective of international companies. Guthrie was a major player in advising on culture changes in Cupertino to better adapt Apple's strategy to the Chinese market:

"When you stake your life, your identity, on and around certain ideas, you sort of fight for them," Guthrie says. "Xi Jinping kind of broke my heart… I was sitting there, in China, in my dream job, and I'm watching Xinjiang's internment camps. I'm watching China tearing up a fifty-year agreement over Hong Kong."

Apple, meanwhile, had become too intertwined with China. Guthrie had been hired to help understand the country and to navigate it. And Apple had followed through—very successfully. But it had burned so many boats, as the saying goes, that Guthrie felt its fate was married to China's and there was no way out. "The cost of doing business in China today is a high one, and it is paid by any and every company that comes looking to tap into its markets or leverage its workforce," he later wrote in a blog. "Quite simply, you don't get to do business in China today without doing exactly what the Chinese government wants you to do. Period. No one is immune. No one."

China famously cracked down on its largest technology companies in late 2020, stripping key figures of power and dramatic amounts of market value off the books. AI is not immune to this.

The primary read here is that the PRC leadership will decide on the role it wants to have in the open-source AI ecosystem. The safe assumption has been that openness would continue, because the government picked up a high-impact national strategy, already seeded with international influence, when it first started focusing on the issue. To formalize these intentions, the Chinese government recently enacted an "AI+" plan that reads very similarly to the recent White House AI Action Plan when it comes to open models. The AI+ plan was first proposed in March 2024 and was approved in its full text on July 31, 2025.
The AI+ plan, when enacted by local officials, lays out goals for the AI industry in how many open models to have at different tiers of performance, plus some funding mechanisms for nurturing them. This is right in line with other comments from party officials. Chinese Premier Li Qiang, second-ranking member of the Politburo Standing Committee, made comments in March directly supporting open-source models. From the Wall Street Journal:

Li pledged that China would boost support for applications of large-scale AI models and AI hardware, such as smartphones, robots, and smart cars. China's top economic planning body also said Wednesday that the country aimed to develop a system of open-source models while continuing to invest in computing power and data for AI.

An excerpt from Beijing's city plan as part of the overall AI+ initiative, translated by GPT-5 Pro, has interesting, specific goals:

By end-2025: implement 5 benchmark application projects at a world-leading level; organize 10 demonstration application projects that lead the nation; and promote a batch of commercializable results. Strive to form 3–5 advanced, usable, and self-controllable base large-model products, 100 excellent industry large-model products, and 1,000 industry success cases. Take the lead in building an AI-native city, making Beijing a globally influential AI innovation source and application high ground.

The goal of this is to:

Encourage open-source, high-parameter, 'autonomous and controllable' base foundation models, and support building cloud hosting platforms for models and datasets to facilitate developer sharing and collaboration.

Beyond the minor translation bumpiness, the intentions are clear. The AI+ plan makes multiple mentions of both open-source models and an open ecosystem around them in which the models can be adopted widely. The ecosystem of models can make the impact of any one individual model greater than it would be alone.

With its centralized power, the Chinese government has more direct levers to enact change than the White House, but this comes with the same trade-offs all initiatives face when comparing the U.S.'s and China's potential. I won't review all of the differences in the approaches here.

Where the Chinese government enacts top-level edicts that'll be harder to follow from the West, there are numerous anecdotes and interactions that highlight in plain terms the mood of the AI ecosystem in China. I've routinely been impressed by the level of direct engagement I have received from leading Chinese AI companies and news outlets. Interconnects' readership has grown substantially in China.

Chinese companies are very sensitive to how their open contributions are viewed — highlighting great pride in both their work and approach. The latest case was via our China open model rankings, which got direct engagement from multiple Chinese AI labs and was highlighted by a prominent AI news outlet in China, 机器之心/Synced. They described Interconnects as a "high-quality content platform deeply focused on frontier AI research." (This Synced post was translated and discussed in the latest ChinaAI Newsletter.)

When intellectuals, influencers, and analysts I follow talk directly to technical members of the AI workforce in China, they sound like what we would expect — people who want to build a great technology. Jasmine Sun had a great writeup on her trip with some anecdotes on AI in China.
She asked, "Do you guys worry about AI safety?"

"We don't think about risks at all." …

Continuing from Jasmine:

This was the first of several conversations that gave us a distinct impression of the Chinese tech community. Spirits are high, and decoupling policies like export controls only fuel their patriotic drive.

At the same time, America still represents a covetable life, despite the current political tumult:

To be clear, our researcher friend made clear that working at a top US AI lab was still the most desirable option.

In so many ways, trying to precisely map China's next steps in AI is extremely challenging. Can they convert their lead in energy infrastructure to more total AI compute? Can they build their own AI chips? Will they take the frontier of performance with their talented population and a different approach? All of this is up for debate. The intrigue here is exemplified by the abundant interest in sparse news stories on how DeepSeek is training some AI model with Huawei chips. In many ways, these new chips working would be a bigger story than the original DeepSeek model, but all signs point to these being expected experiments with domestic chips, while China's leading AI models are likely to be trained on Nvidia and other Western chips for the foreseeable future. I do not expect DeepSeek R2 to be trained on Huawei's hardware.

China's hardware investment will take a lot longer to play out than open model strategies, but if China pulls it off — along with its other investments, such as self-driving cars and robots — its practical lead in AI could come for more areas. Open models could be China's beachhead in a bigger technological resurgence with AI.

Without major changes to Western investment in open models, we're approaching a status quo in 2026 and beyond where:

* Chinese open models would continue to increase their lead in performance (and adoption) over American counterparts. This will manifest in many ways. One example is how startups in Silicon Valley built on stronger Chinese models will be offering services that compete with entrenched, handicapped Fortune 500 companies wary of adopting these models in their services. This could make some subareas of AI disruption feel particularly intense.
* The Chinese open ecosystem's density of knowledge and sharing would translate into increased scientific and academic impact. China's share of papers at leading AI conferences is already rapidly on the rise, and having an ecosystem built around substantially better models than their Western counterparts could make this growing body of research more impactful, too. Better base models allow more interesting RL and agentic research today, and the list of areas reliant on high-performance models is likely to only grow longer with time.
* A proliferation of strong open models would make it difficult to restrict the presence or availability of many forms of AI. We do not have the government tools, incentives, nor culture to successfully prevent digital goods from China (or elsewhere) entering the U.S. economy.
Many forms of AI governance and regulation in the United States and the rest of the world may need to be reconsidered, as many jurisdictions have looked to control and understand the development of "frontier AI." Regulation needs to be approached for a world enmeshed in powerful AI models, rather than by trying to control access to, or the releases of, a few.

These realities all paint a clear picture that bends the association of open models from "soft power" to just "power." Continuously releasing strong open AI models could allow Chinese companies to shape the technology interfaces, services, and reality around the world. Where 2024 was about research on open models, and 2025 the professionalization of them, 2026 could be where we begin to see clear impacts of their power through endless distribution.
    --------  
    13:37
  • Ranking the Chinese Open Model Builders
The Chinese AI ecosystem has taken the AI world by storm this summer with an unrelenting pace of stellar open model releases. The flagship releases that got the most Western media coverage are the likes of Qwen 3, Kimi K2, or Zhipu GLM 4.5, but there is a long tail of providers close behind in both quality and cadence of releases.

In this post we rank the top 19 Chinese labs by the quality and quantity of their contributions to the open AI ecosystem — this is not a list of raw ability, but of outputs — all the way from the top of DeepSeek to the emerging open research labs. For more detailed coverage of all the specific models, we recommend studying our Artifacts Log series, which chronicles all of the major open model releases every month. We plan to revisit this ranking and make note of major new players, so make sure to subscribe.

At the frontier

These companies rival Western counterparts with the quality and frequency of their models.

DeepSeek
deepseek.com | 🤗 deepseek-ai | X @DeepSeek_AI

DeepSeek needs little introduction. Their V3 and R1 models, and their impact, are still likely the biggest AI stories of 2025 — open, Chinese models at the frontier of performance with permissive licenses and the exposed model chains of thought that enamored users around the world.

With all the attention following the breakthrough releases, a bit more has been said about DeepSeek in terms of operations, ideology, and business model relative to the other labs. They are very innovative technically and have not devoted extensive resources to their consumer chatbot or API hosting (as judged by higher-than-industry-standard performance degradation).

Over the last 18 months, DeepSeek was known for making "about one major release a month." Since the updated releases of V3-0324 and R1-0528, many close observers have been surprised by their lack of contributions. This has let other players in the ecosystem close the gap, but in terms of impact and actual commercial usage, DeepSeek is still king.

An important aspect of DeepSeek's strategy is their focus on improving their core models at the frontier of performance. To complement this, they run experiments using their current generation to make fundamental research innovations, such as theorem proving or math models, which ultimately get used for the next iteration of models. This is similar to how Western labs operate: first, you test a new idea as an experiment internally, then you fold it into the "main product" that most of your users see. DeepSeekMath, for example, used DeepSeek-Coder-Base-v1.5 7B and introduced the now-famous reinforcement learning algorithm Group Relative Policy Optimization (GRPO), which is one of the main drivers of R1. The exception to this (at least today) is Janus, their omni-modal series, which has not been used in their main line.
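For readers who haven't seen it, the core trick of GRPO is to drop PPO's learned value function and instead normalize rewards within a group of responses sampled for the same prompt. A minimal sketch of that advantage computation (illustrative only, not DeepSeek's implementation; the full objective adds PPO-style clipping and a KL penalty):

```python
import numpy as np

def grpo_advantages(group_rewards: list[float], eps: float = 1e-6) -> np.ndarray:
    """Group-relative advantages: each of the G completions sampled for
    one prompt is scored against the group mean, normalized by the group
    standard deviation, so no learned value model is needed."""
    r = np.asarray(group_rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# e.g. four completions for one prompt, scored by a verifier or reward model
print(grpo_advantages([1.0, 0.0, 0.5, 1.0]))  # above-average answers get positive advantage
```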
Qwen
qwenlm.ai | 🤗 Qwen | X @Alibaba_Qwen

Tongyi Qianwen, the primary AI lab within Alibaba's cloud division, is far and away most known for their open language model series. They have been releasing many models across a range of sizes (quite similar to Llama 1 through 3) for years. Recently, their models from Qwen 2.5 and Qwen 3 have seen accelerating market share in AI research and startup development.

Qwen is closer to American Big Tech companies than to other Chinese AI labs in terms of releases: they are covering the entire stack, from VLMs to embedding models, coding models, image and video generation, and so on. They also cater to all possible customers (or rather, every part of the open community) by releasing capable models of all sizes. Small dense models are important for academia to run experiments and for small/medium businesses to power their applications, so it comes as no surprise that Qwen-based models are exploding in popularity.

On top of model releases for everyone, they have also focused on supporting the (Western) community, releasing MLX and GGUF versions of their models for local usage and a CLI for their coding models, which includes a generous number of free requests. Unlike some American companies, the core team seems to have stayed relatively small in terms of headcount, in line with other Chinese AI labs: Qwen3 has 177 contributors, whereas Llama 3 has three times as many, and Gemini 2.5 lists over 3,000 people as part of the model.

Close competitors

These companies have recently arrived at the frontier of performance, and we will see if they have the capability to consistently release great models at a pace matching Qwen or DeepSeek.

Moonshot AI (Kimi)
moonshot.cn | 🤗 moonshotai | X @Kimi_Moonshot

Moonshot AI is one of the so-called "AI tigers," a group of hot Chinese AI startups so dubbed by Chinese media and investors. The group consists of Baichuan, Zhipu AI, Moonshot AI, MiniMax, StepFun, and 01.AI — most of which have attracted investment from tech funds and tech giants. For example, Alibaba is seen as a big winner in the AI space by having their own models and by being a lead investor in Moonshot, sort of like how big tech companies in the U.S. are investing in fundraising rounds for newer AI labs.

While their first models, K1 and K1.5, were closed and available on their API, they started releasing open models after the R1 release, with experimental models using the Muon optimizer. Similar to DeepSeek, they focus on a single model line, with small experiments eventually feeding back into the main model. K2 is their "moonshot run," a.k.a. yolo run, and it quickly became a hit similar to R1 (see our report from the release). Further reading on Kimi can be found on ChinaTalk.

Zhipu / Z.ai
z.ai | 🤗 zai-org | X @Zai_org

Zhipu, known in the West as Z.ai, is a startup spinoff of Tsinghua University with considerable investments from Chinese companies and VCs. Currently, they are even considering an IPO, which would make them the first AI tiger to do so. In terms of models, they are mostly known for their recent releases of GLM-4.5 and GLM-4.5V, which are all very capable for their sizes (both are fairly large mixture-of-experts models). However, they are not just releasing LLMs, but also image and video generation models, setting them apart from pure-LLM companies and labs.

Noteworthy

These companies are transitioning to open releases, have open models with inferior capabilities, or have slightly different foci than the text-centric labs pushing the frontiers of intelligence.

StepFun
stepfun.ai | 🤗 stepfun-ai | X @StepFun_ai

StepFun first started as a closed model provider but pivoted to open model releases after DeepSeek R1 shook up the industry. They are mostly focusing on multi-modal model releases, with Step3 being their flagship VLM.
They also have image, audio, and video generation models.

Tencent (Hunyuan)
hunyuan.tencent.com | 🤗 Tencent | X @TencentHunyuan

Hunyuan is mostly known for HunyuanVideo and Hunyuan3D. While they have released three series of different LLMs, their releases come with very strict licenses, which is unusual for Chinese companies and dampens excitement when combined with performance levels that can be found elsewhere.

RedNote (Xiaohongshu)
xiaohongshu.com | 🤗 rednote-hilab

The Chinese version of Instagram, RedNote, recently joined the ranks of Chinese companies releasing open models. Their capable character recognition / OCR model in particular surprised many (see our coverage). Similar to Xiaomi and Baidu, it remains to be seen what their overall open strategy will be in the near and distant future, and they have not competed in the large, frontier model space.

MiniMax
minimaxi.com | 🤗 MiniMaxAI | X @MiniMax__AI

MiniMax is another of the AI tigers and also started as a closed company. After the release of R1, they changed their strategy and released the weights of MiniMax-Text-01, following up with reasoning models building upon it. The unique selling point of these models is the 1M context window achieved with hybrid attention. These text models are not the only thing they are focusing on — they also have image and video generation models, but those remain closed and only available on their API. They are also promoting their consumer platform heavily as they eye an IPO.

OpenGVLab / InternLM
internlm.intern-ai.org.cn | 🤗 InternLM | X @opengvlab

InternLM and OpenGVLab have deep ties to the Shanghai AI Laboratory, with InternLM focusing on language models while OpenGVLab releases vision models. While they release a range of models such as S1 or InternLM-Math, the orgs are mostly known for the strong InternVL series. While the first versions mostly used their own InternLM pretrained models, later releases (such as InternVL3) rely on Qwen as the language backend.

Skywork
skywork.ai | 🤗 Skywork | X @Skywork_AI

The Singaporean Skywork first started out as an online karaoke company (yes, really) before pivoting to AI and becoming a competitor to Manus, with a platform focusing on agents for work-related tasks such as slide generation. Their LLM journey started with releasing their own pretrained dense and MoE models. However, they stopped pre-training their own models and instead started to fine-tune existing ones: their OR1 reasoning model builds on top of DeepSeek-R1-Distill-Qwen-32B, and R1V3 uses InternVL3 (which itself uses Qwen2.5 as its LLM backend). Aside from LLMs, they have a wide range of other models, from world models to image and video generation models and reward models. Similar to their LLMs, they mostly build on top of other models. Unlike many labs, Skywork has released some datasets with their models, such as preference and reasoning training data.

On the rise

These companies are either just getting their toes wet with open models or operating as more academic research organizations than labs pushing the performance of models.

ByteDance Seed
seed.bytedance.com | 🤗 ByteDance-Seed

Seed is the R&D arm of ByteDance and eerily similar to Meta's FAIR division: diverse models with interesting research, with their papers garnering a ton of attention in the community.
However, it remains to be seen whether they shoot for a Llama-style model release or continue to release research artifacts. Here are some recent papers:

* Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference
* Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving
* Seed-X: Building Strong Multilingual Translation LLM with 7B Parameters
* Seedance 1.0: Exploring the Boundaries of Video Generation Models
* SeedEdit 3.0: Fast and High-Quality Generative Image Editing
* Seed1.5‑VL Technical Report
* Mogao: An Omni Foundation Model for Interleaved Multi‑Modal Generation
* Seed1.5‑Thinking: Advancing Superb Reasoning Models with Reinforcement Learning
* VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks
* Seed LiveInterpret 2.0: End‑to‑end Simultaneous Speech‑to‑speech Translation with Your Voice

OpenBMB
openbmb.ai | 🤗 openbmb | X @OpenBMB

OpenBMB is an open-source community (comparable to BigScience) from the Tsinghua University NLP Lab (the very same university Zhipu was spun off from), with support from the Beijing Academy of Artificial Intelligence (BAAI) and ModelBest. They are mostly focusing on small multi-modal models for the edge, such as MiniCPM-V-4. However, the license is rather restrictive, which is surprising given the community-driven origins of the group. Aside from model releases, they also release frameworks and specialized kernels to make sure their models run on low-end hardware.

Xiaomi (MiMo)
mi.com | 🤗 XiaomiMiMo

Xiaomi has started releasing a bunch of small, capable models, ranging from LLMs to VLMs and audio models. Xiaomi updating the models quickly after the initial launches and releasing multiple variants shows that this is not a one-off foray into open models. However, it remains to be seen whether those are mostly research artifacts or whether they are serious about pushing the frontier or competing for adoption.

Baidu (ERNIE)
yiyan.baidu.com | 🤗 baidu | X @Baidu_Inc

Baidu, one of the original names in the Chinese AI space, has only released the weights of ERNIE 4.5. It remains to be seen whether they continue to release weights of newer releases as well.

Honorable Mentions

The rest of the labs that we are watching.

Multimodal Art Projection
m-a-p.ai | 🤗 m-a-p

An open research community releasing all kinds of models (including a truly open 7B language model with data, etc.). They are now mostly known for the music generation model YuE.

Alibaba International Digital Commerce Group
aidc-ai.com | 🤗 AIDC-AI

Another R&D arm of Alibaba, mostly releasing niche models building upon Qwen.

Beijing Academy of Artificial Intelligence (BAAI)
baai.ac.cn | 🤗 BAAI | X @BAAIBeijing

As an academic research organization, the Beijing Academy of Artificial Intelligence has a high diversity of projects. They are mostly known for BGE, a set of capable embedding models.

inclusionAI
🤗 inclusionAI | X @InclusionAI666

The open-weight arm of the Ant Group (an affiliate of Alibaba handling mobile payments and some financial industries), responsible for Ling Lite, a series of LLMs.

Pangu (Huawei)
huaweicloud.com | X @HuaweiCloud1

Huawei is working on AI accelerators to threaten the market share of Nvidia GPUs, which are often targeted by regulations from both the US and China. Their model releases are mostly to show what's possible with their cards, but they have not been without drama, including accusations of upcycling Qwen models without stating it. We would expect them to continue to release more models in the near future.
    --------  
    12:41
  • Contra Dwarkesh on Continual Learning
Dwarkesh Patel's now well-read post on why he is extending his AI timelines focuses on the idea of continual learning. If you ask me, what we have already is AGI, so the core question is: is continual learning a bottleneck on AI progress?

In this post, I argue that continual learning as he describes it actually doesn't matter for the trajectory of AI progress that we are on. Continual learning will eventually be solved, but in a way where a new type of AI emerges from it, rather than by continuing to refine what it means to host ever more powerful LLM-based systems. Continual learning is the ultimate algorithmic nerd snipe for AI researchers, when in reality all we need to do is keep scaling systems and we'll get something indistinguishable from how humans do it, for free.

To start, here's the core of the Dwarkesh piece as a refresher on what he means by continual learning:

Sometimes people say that even if all AI progress totally stopped, the systems of today would still be far more economically transformative than the internet. I disagree. I think the LLMs of today are magical. But the reason that the Fortune 500 aren't using them to transform their workflows isn't because the management is too stodgy. Rather, I think it's genuinely hard to get normal humanlike labor out of LLMs. And this has to do with some fundamental capabilities these models lack.

I like to think I'm "AI forward" here at the Dwarkesh Podcast. I've probably spent over a hundred hours trying to build little LLM tools for my post production setup. And the experience of trying to get them to be useful has extended my timelines. I'll try to get the LLMs to rewrite autogenerated transcripts for readability the way a human would. Or I'll try to get them to identify clips from the transcript to tweet out. Sometimes I'll try to get them to co-write an essay with me, passage by passage. These are simple, self-contained, short-horizon, language-in/language-out tasks - the kinds of assignments that should be dead center in the LLMs' repertoire. And they're 5/10 at them. Don't get me wrong, that's impressive.

But the fundamental problem is that LLMs don't get better over time the way a human would. The lack of continual learning is a huge huge problem. The LLM baseline at many tasks might be higher than an average human's. But there's no way to give a model high level feedback. You're stuck with the abilities you get out of the box. You can keep messing around with the system prompt. In practice this just doesn't produce anything even close to the kind of learning and improvement that human employees experience.

The core issue I have with this argument is the dream of making the LLMs we're building today look more like humans. In many ways I'm surprised that Dwarkesh and other very AGI-focused AI researchers or commentators believe this — it's the same root argument that AI critics use when they say AI models don't reason. The goal of making AI more human constrains technological progress to a potentially impossible degree. Human intelligence has long been the inspiration for AI, but we have long since surpassed it as the mirror we look to for inspiration. Now the industry is all in on the expensive path to make the best language models it possibly can. We're no longer trying to build the bird; we're trying to transition the Wright Brothers' invention into the 737 in the shortest time frame possible.

To put it succinctly, my argument very much rhymes with some of my past writing:

Do language models reason like humans? No.
Do language models reason? Yes.

Will language model systems continually learn like humans? No.

Will language model systems continually learn? Of course.

Dwarkesh writes, "Rather, I think it's genuinely hard to get normal humanlike labor out of LLMs." This is because we're still early in the buildout of the technology. Human labor takes an immense amount of context and quick thinking, both of which we're starting to unlock with our language models. On top of this, human labor may not be what we want to create — we want to augment it. Using LLMs as drop-in replacements for humans is not a requirement for AGI, nor is what Dwarkesh describes a fundamental limitation on AI progress. Francois Chollet cleverly poked at this weakness in his recent conversation with Dwarkesh at an ARC-AGI event:

Well, how do you define the difference between the ability to adapt to a new task and learning on the fly? It's, it sounds like the same thing to me.

Language models can already pick up subtle context extremely fast. ChatGPT's memory feature has gotten far better for me. When we're using the far more powerful models we can expect in the next 18 months, this'll already start to appear magical. Language models are extremely apt at inferring context even without us giving it to them. Soon we'll be unlocking that subtle connection engine by providing immense, explicit context. I don't know of anyone who has actually thoroughly digitized all the relevant context of their job and formatted it in a way that is easily readable by an LLM. GPT-5 Pro estimates that all of the writing on Interconnects would be only 500K tokens. That would fit into an existing LLM with no extra system, but I've never tried it.

The problem Dwarkesh is facing is that we're still using LLMs primarily in a single-generation manner, which got far better with the introduction of reasoning models, but the economically useful way to use current tools in more complex intellectual domains will require a deep-research-style approach over all of your recent work interactions. No one is giving language models that kind of context. None of the tools we use are set up properly to accumulate this type of context.

I expect this to change rapidly. ChatGPT, Claude, and the like are all adding memory features across chats and countless connectors to other pieces of information in your professional life. These memory features will be omnimodal and essential to extracting the type of value Dwarkesh wants. Without them, I agree language models in their current form are hopeless at solving continual learning. This is what I would expect the rumored $2,000/month ChatGPT-level subscriptions to work with. Each of these bespoke tasks needs to absorb a ton of context and reasoning tokens in order to make a directionally right output. If someone built the Claude Code equivalent for my Substack, with every post tagged by topic and performance metrics, I bet the AI could easily make useful suggestions on how to format my content.

Continual learning as Dwarkesh presents it is a systems problem rather than a learning problem. I expect better context management over my information ecosystem to exist in 2026, but more work will be needed for AI companies to learn how best to reference it and unlock in-context learning that feels like rapid adaptation. Call that 2027.
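To make the systems framing concrete, here is a toy sketch of the loop I have in mind: log interactions, retrieve what's relevant, and prepend it to the next call. Everything here (the `ContextStore`, the `llm` callable) is hypothetical scaffolding for illustration, not any product's actual API:

```python
from datetime import datetime, timezone
from typing import Callable

class ContextStore:
    """Naive memory: log every interaction, retrieve by keyword overlap.
    Real systems would use embeddings and much smarter ranking."""
    def __init__(self):
        self.entries: list[tuple[datetime, str]] = []

    def log(self, text: str) -> None:
        self.entries.append((datetime.now(timezone.utc), text))

    def retrieve(self, query: str, k: int = 5) -> list[str]:
        q = set(query.lower().split())
        scored = sorted(
            self.entries,
            key=lambda e: len(q & set(e[1].lower().split())),
            reverse=True,
        )
        return [text for _, text in scored[:k]]

def answer(llm: Callable[[str], str], store: ContextStore, query: str) -> str:
    # "Continual learning" here is just retrieval plus a long context window:
    # prior interactions are prepended, and the exchange is logged for next time.
    memory = "\n".join(store.retrieve(query))
    reply = llm(f"Relevant prior context:\n{memory}\n\nTask: {query}")
    store.log(f"Q: {query}\nA: {reply}")
    return reply
```

Swap the keyword overlap for embeddings, and the flat list for the connectors ChatGPT and Claude are shipping, and "continual learning" starts to look like engineering rather than a new learning algorithm.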
The models that have been released in 2025 will make this far more tractable in the near future. Reasoning models have made in-context learning far more powerful, resulting in rapid progress on held-out and complex domains such as ARC-AGI. These models have also come with massive improvements in context length. Claude and Gemini have 1M+ token context lengths and GPT-5's is at 400K, and they're all growing steadily. What is important about the context length numbers is that evaluations are showing these to be meaningful improvements that the models can leverage intelligently.

With these reasoning models and smart retrieval of context, the systems we are building will look indistinguishable from continual learning. This will definitely be multiple LLMs working together, and it will operate very differently from the first versions of ChatGPT we were given (and often still use today). The path to continual learning is more context and more horsepower. This is directly in line with the direction AI investment is going. This doesn't feel like a bottleneck; rather, it is another product problem that we are going to solve.

This sort of continual learning may not enable the type of raw intelligence and autonomy that many vocal leaders in AI describe as "superintelligence." Training models to be smarter on even more complex tasks — e.g., novel biological research — requires mastering agentic behaviors that need to be learned from scratch, as discussed in my post on "What comes next with RL." There's no internet-scale pretraining data for such agentic tasks. My point is that not all jobs that require continual learning will require the frontiers of intelligence. I'm excited to write blog posts with the bliss of my ChatGPT 6 co-editor.

This technology coming soon will not be without its challenges. My first reaction to the continual learning post was more in line with "society isn't ready for this" rather than commentary on its feasibility. I'll repeat my warning:

For a long time I've written that AI models have a higher risk potential in terms of social outcomes because the modalities they interact with us in are far more personal… As AI is going to be so powerful as a standalone entity, breaking some of the symbiotic links will be good for adding friction that makes the technology easier to steer towards good outcomes. In short, be wary of wishing for end-to-end (reinforcement) learning when you're part of the environment. It's a destiny to dystopia.

What we have today is a form of AGI, and it'll soon get much better with better context and memory. The industrialization of language models is giving us incredible improvements across a wide swath of use-cases. These will blow past many basic primitives of intelligence in humans that have motivated AI for decades. First came models that reason; next will come systems that continually learn. This is exactly what most AI companies are actually building — regardless of what their superintelligence messaging is.

Comments are open on this post, please continue the debate!
    --------  
    10:04
  • GPT-5 and the arc of progress
If you want a video version of this, check out the last 20 minutes of the livestream reaction I did with Will Brown of Prime Intellect and Swyx of Smol AI & Latent Space.

GPT-5 was set up to fail on some of the narratives it was expected to satisfy. The two central themes it had to decide between were the AGI (or superintelligence) narrative that Sam Altman & co. have been using to fundraise, and the fact that ChatGPT is one of the fastest-growing consumer technologies of all time. To fulfill both, GPT-5 needed to be AGI while also being cheap enough to serve as the most-used AI system in the world. Business and technological realities made it inevitable that GPT-5's primary impact would be to solidify OpenAI's market position, even if it raises a lot of eyebrows about the long-term trajectory of AI.

The reactions online capture this as well. The OpenAI live streams have historically catered to AI insiders, but the product speaks to an entirely different audience. The people discussing this release on Twitter will be disappointed in a first reaction, but 99% of people using ChatGPT are going to be so happy about the upgrade. Confusingly enough, this includes many of the critics. GPT-5 is a good AI system. It's right in line with best-in-class across pretty much every evaluation, while being cheap enough to serve the whole world.

OpenAI is largely fixing its product offering with an announcement that was hyped to be one of the biggest AI news cycles of the year. AI news being loud is defined by narratives being different more so than technology being better. OpenAI releasing an open model again will likely be pinpointed as just as important a day for the arc of AI as the GPT-5 release. In many ways GPT-5 was set up to fail, and that is very off-putting for those expecting maximum AI progress in the near term.

I'm not going to dwell on it, but oh boy, that was a messy release. GPT-5 being announced and rolled out like this is very odd. Countless plots were mislabeled, live demos had bugs, and the early rollout is doing some weird stuff. This reinforces how OpenAI was torn about the release and backed into a corner with their messaging. They knew they needed to improve the experience given strong competition in the industry, but releasing GPT-5 needed to make a splash after how long they've waited (and already parked the GPT-4.5 name).

The core question we track in this post is: what does it mean for the next 6-18 months of AI progress if GPT-5 is just as good as all the best models out there, e.g., Claude Sonnet for coding or o3 for search, funneled into one, super cheap package? If AGI were the real goal, the main factor in progress would be raw performance. GPT-5 shows that AI is on a somewhat more traditional technological path, where there isn't one key factor; it is a mix of performance, price, product, and everything in between.

GPT-5's performance

There are a few places where we can see that GPT-5 represents a solid step on the performance trend line, but nothing like a step change. First, on LMArena, GPT-5 is fantastic, sweeping the board to #1 in all categories.
The last model to claim #1 in pretty much every category was Gemini 2.5 Pro — and that was the biggest step change in Elo since GPT-4 Turbo skyrocketed past the first Claude. Second, GPT-5 is the top model on the ArtificialAnalysis composite benchmark.

These two, LMArena and ArtificialAnalysis, represent two coarse evaluations — community vibes and raw benchmarks. Both can be gamed, but they are still correlated with real-world use. You can also see in OpenAI's shared results how much the smaller versions improve on the likes of GPT-4.1 mini and o4-mini.

In many ways, the march of progress on evals has felt slower for a while because model releases are so frequent and each individual step is smaller. Lots of small steps make for big change. The overall trend line is still very positive, and multiple companies are filling in the shape of it. My post on "what comes next" from earlier this summer all but called for this type of release, where the numbers aren't shocking but the real-world use cases are great, becoming more common.

This is a different path for the industry and will take a different form of messaging than we're used to. More releases are going to look like Anthropic's Claude 4, where the benchmark gains are minor and the real-world gains are a big step. There are plenty more implications for policy, evaluation, and transparency that come with this. It is going to take much more nuance to understand whether the pace of progress is continuing, especially as critics of AI are going to seize on flatlining evaluations to say that AI is no longer working.

To say it succinctly: abilities will develop more slowly than products.

The product overhang is being extended with each release. We're still building untapped value with AI models and systems faster than we're capturing it.

Another way to see this incremental push in models and systems is through OpenAI's update to the famous METR plot, which measures the human time-to-complete of tasks that AI systems can solve 50% of the time. GPT-5 is leading, but also just in line with the trend.

All of this is to say, comprehensively, that AI progress is alive and well, as long as you don't subscribe to the exponential takeoff in ability. Those arguments are very strained by this GPT-5 release. Yes, AI progress on intelligence and "raw ability" is certainly going to continue at a solid pace for a long time, but how will this translate into recursive self-improvement?

GPT-5's details

If you're reading closely, you may have noticed that this post uses the word system instead of model. All of the leading chat systems have been adding more components, like safety checkers, on top of the core model, but this is the first to use different architectures and weights for the primary generation of content across similar queries. GPT-5 is the first of what is to come, mostly to better balance cost and give better user experiences. From the system card:

GPT‑5 is a unified system with a smart and fast model that answers most questions, a deeper reasoning model for harder problems, and a real-time router that quickly decides which model to use based on conversation type, complexity, tool needs, and explicit intent (for example, if you say "think hard about this" in the prompt).
The router is continuously trained on real signals, including when users switch models, preference rates for responses, and measured correctness, improving over time.

Along with this, they shipped many product improvements, such as a 400K-token context window in the API with great performance, reduced hallucinations, and new personalities. As a power user, my primary worry is the router. I suspect that for now I'll default to GPT-5 Thinking, sometimes upgrade to Pro mode, and downgrade to standard GPT-5 only for benign queries (depending on its search behavior — if it is search-heavy like o3 without thinking, it should still work well). Thankfully, the thinking mode has a "get an early answer" button, so I don't see any reason to start elsewhere. If I need an answer fast, I'll get one. If not, I want the best responses possible. (A toy sketch of the routing pattern is included below.)

As for prices, here's a comparison, with all figures per 1M tokens; a short cost calculation below makes the differences concrete. GPT-5's top-level model is cheaper than Claude Sonnet and far better at coding than any previous OpenAI model — one of the core details of this release. Matching Gemini Pro's pricing, given Google's infrastructure advantage, is a substantial accomplishment.

* OpenAI — GPT-5 (API sizes)
  * GPT-5: input $1.25, output $10.00. (OpenAI)
  * GPT-5 mini: input $0.25, output $2.00. (OpenAI)
  * GPT-5 nano: input $0.05, output $0.40. (OpenAI)
* OpenAI — o3 (reasoning)
  * o3: input $2.00, output $8.00. (OpenAI Platform)
  * o3-mini: input $1.10, output $4.40 (cached input $0.55). (OpenAI Platform)
* Anthropic — Claude 4 family
  * Claude Sonnet 4: input $3.00, output $15.00. (Anthropic)
  * Claude Opus 4.1: input $15.00, output $75.00. (Anthropic)
* Google — Gemini 2.5
  * Gemini 2.5 Pro: input $1.25 (≤200k prompt) / $2.50 (>200k); output $10.00 (≤200k) / $15.00 (>200k). (Google AI for Developers)
  * Gemini 2.5 Flash: input $0.30 (text/image/video) or $1.00 (audio); output $2.50 (includes thinking tokens). (Google AI for Developers)
  * Gemini 2.5 Flash-Lite: input $0.10 (text/image/video) or $0.30 (audio); output $0.40. (Google AI for Developers)

Cheaper thinking models that work well in applications are far more useful than raw scaling (as GPT-4.5 has shown us).

GPT-5's impact

It seems like most people in all walks of life are going to love this model — from AI researchers all the way to people who are learning of ChatGPT for the first time today. This is very much in line with my expectations for how AI will proceed: a long, steady march of progress. The fact that the models are getting way cheaper rather than way more expensive signals that we cannot just brute-force scale our way to much stronger systems. Scaling helps, but it is now one of many considerations, and all the laboratories are showing us that much bigger models have diminishing returns in value to customers. At the same time, models being cheaper could be just what we need for Jevons paradox to kick in and provide another boost in AI adoption.

Many people will claim that the GPT-5 release was a flop and that the AI bubble will pop. This is downstream of the industry generally making totally unrealistic promises. As someone whose core through-line when covering frontier models is tracking the pace of progress, I translate this release as: AI capabilities on benchmarks will proceed a bit more slowly, but we aren't hitting any clear walls in performance. The AI performance hills we're climbing as an industry do put up more resistance now that the obvious low-hanging fruit is gone, but we have the tools to overcome it consistently for the next 6 to 18 months.
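Returning to the router described in the system card quote above: OpenAI hasn't published how it works internally, so treat the following as a toy sketch of the pattern only. The hand-written rules and model names are stand-ins I made up; the real router is a trained model, not heuristics.

```python
# Toy illustration of the routing pattern from the system card: route on
# explicit intent, apparent complexity, and tool needs. All model names and
# rules here are invented; OpenAI's actual router is trained on real signals
# (model switches, preference rates, measured correctness), not hand-written.

def route(prompt: str, needs_tools: bool = False) -> str:
    text = prompt.lower()
    if "think hard" in text:           # explicit intent in the prompt
        return "reasoning-model"
    if needs_tools:                    # tool-heavy requests
        return "reasoning-model"
    if len(prompt.split()) > 300:      # crude proxy for complexity
        return "reasoning-model"
    return "fast-model"                # default: cheap and low-latency

print(route("What's the capital of France?"))        # -> fast-model
print(route("Think hard about this proof sketch."))  # -> reasoning-model
```

The design question it illustrates is real, though: routing trades answer quality against cost and latency, which is why a trained router that keeps updating on user signals beats static rules.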
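And to make the price list above concrete, a few lines of arithmetic. The per-1M-token prices come from the list; the workload (2M input tokens, 500K output tokens) is an invented example:

```python
# Cost of a hypothetical workload at the per-1M-token prices listed above.
# The workload numbers are invented for illustration.
PRICES = {  # (input $/1M tokens, output $/1M tokens)
    "gpt-5": (1.25, 10.00),
    "claude-sonnet-4": (3.00, 15.00),
    "gemini-2.5-pro": (1.25, 10.00),  # <=200k-token prompts
}

input_tokens, output_tokens = 2_000_000, 500_000  # example usage

for model, (p_in, p_out) in PRICES.items():
    cost = input_tokens / 1e6 * p_in + output_tokens / 1e6 * p_out
    print(f"{model}: ${cost:.2f}")
# gpt-5: $7.50, claude-sonnet-4: $13.50, gemini-2.5-pro: $7.50
```

Note that thinking tokens bill as output (the Gemini 2.5 Flash line above says so explicitly), so output rates dominate the cost of reasoning-heavy workloads.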
For companies that have been fundraising on promises of AGI, such as Anthropic and OpenAI, closing the next rounds could be harder. Of course, this depends on whether that messaging was a key part of the fundraising. This fundraising inspires capital expenditures across the industry, e.g. TSMC developing the next node for NVIDIA to build new chips, and so on. The AGI narrative and the fundraising it has enabled have been good for the U.S. in terms of building out valuable, raw infrastructure. This could be the beginning of the money train slowing down, but that's very different from a derailment and a stock market crash.

As raw infrastructure spend slows, there will be even more pressure to deliver valuable products to users. A key trend for 2025 has been many of those appearing — Deep Research and Claude Code being the paradigms that everyone has copied. GPT-5 makes these applications better and makes it easier and cheaper for the next viral AI products to hit the market.

I'm still excited for what is to come. But first, I'm going to sign off and go play with GPT-5. It's a good day to build something for the fun of it. As I use it more, I'll have more to say.

Extra GPT-5 links

For more specifics on the model from people who got early access, I recommend Tyler Cowen, Every.to, or Simon Willison (or Swyx soon, on Latent.Space).

* Livestream link: https://openai.com/gpt-5/
* Research blog post: https://openai.com/index/introducing-gpt-5/
* Developer blog post: https://openai.com/index/introducing-gpt-5-for-developers
* Enterprise blog post: https://openai.com/index/gpt-5-new-era-of-work
* GPT-5 landing page: https://openai.com/gpt-5/
* System card: https://openai.com/index/gpt-5-system-card/
* Coding examples: https://openai.github.io/gpt-5-coding-examples/
* What would you say if you could talk to a future OpenAI model: https://progress.openai.com/

Finally, I'll plug again the video I did with Will Brown and Swyx. Send me the most interesting things you find on GPT-5!

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.interconnects.ai/subscribe
    --------  
    10:41

About Interconnects

Audio essays about the latest developments in AI and interviews with leading scientists in the field. Breaking the hype, understanding what's under the hood, and telling stories. www.interconnects.ai