
Deep Papers

Arize AI

Available Episodes (5 of 46)
  • AI Benchmark Deep Dive: Gemini 2.5 and Humanity's Last Exam
    This week we talk about modern AI benchmarks, taking a close look at Google's recent Gemini 2.5 release and its performance on key evaluations, notably Humanity's Last Exam (HLE). In the session we covered Gemini 2.5's architecture, its advancements in reasoning and multimodality, and its impressive context window. We also talked about how benchmarks like HLE and ARC-AGI-2 help us understand the current state and future direction of AI.
    Read it on the blog: https://arize.com/blog/ai-benchmark-deep-dive-gemini-humanitys-last-exam/
    Sign up to watch the next live recording: https://arize.com/resource/community-papers-reading/
    Learn more about AI observability and evaluation, join the Arize AI Slack community, or get the latest on LinkedIn and X.
    --------  
    26:11
  • Model Context Protocol (MCP)
    We cover Anthropic’s groundbreaking Model Context Protocol (MCP). Though it was released in November 2024, we've been seeing a lot of hype around it lately and thought it was well worth digging into. Learn how this open standard is revolutionizing AI by enabling seamless integration between LLMs and external data sources, fundamentally transforming them into capable, context-aware agents. We explore the key benefits of MCP, including enhanced context retention across interactions, improved interoperability for agentic workflows, and the development of more capable AI agents that can execute complex tasks in real-world environments.
    Learn more about AI observability and evaluation, join the Arize AI Slack community, or get the latest on LinkedIn and X.
    --------  
    15:03
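
To make the protocol a bit more concrete than the episode summary above, here is a minimal sketch of an MCP tool server. It assumes the official MCP Python SDK (`pip install mcp`) and its FastMCP helper; the server name and the `add` tool are hypothetical examples, not anything discussed in the episode.

    # Minimal MCP server sketch (assumes the official `mcp` Python SDK).
    from mcp.server.fastmcp import FastMCP

    # A named server that an MCP-compatible host can connect to.
    mcp = FastMCP("demo-server")

    @mcp.tool()
    def add(a: int, b: int) -> int:
        """Add two integers and return the result."""
        return a + b

    if __name__ == "__main__":
        # Runs over stdio by default, so an LLM host can discover the tool
        # (tools/list) and invoke it (tools/call) via MCP's JSON-RPC messages.
        mcp.run()

The same decorator pattern extends to MCP resources and prompts, which is what lets a host treat external data sources as first-class context for the model.
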
  • AI Roundup: DeepSeek’s Big Moves, Claude 3.7, and the Latest Breakthroughs
    This week, we're mixing things up a little bit. Instead of diving deep into a single research paper, we cover the biggest AI developments from the past few weeks. We break down key announcements, including:
    DeepSeek’s Big Launch Week: A look at FlashMLA (DeepSeek’s new approach to efficient inference) and DeepEP (their expert-parallel communication library).
    Claude 3.7 & Claude Code: What’s new with Anthropic’s latest model, and what Claude Code brings to the AI coding assistant space.
    Stay ahead of the curve with this fast-paced recap of the most important AI updates. We'll be back next time with our regularly scheduled programming.
    Learn more about AI observability and evaluation, join the Arize AI Slack community, or get the latest on LinkedIn and X.
    --------  
    30:23
  • How DeepSeek is Pushing the Boundaries of AI Development
    This week, we dive into DeepSeek. SallyAnn DeLucia, Product Manager at Arize, and Nick Luzio, a Solutions Engineer, break down key insights on a model that has been dominating headlines for its significant breakthrough in inference speed over other models. What’s next for AI (and open source)? From training strategies to real-world performance, here’s what you need to know.
    Read a summary: https://arize.com/blog/how-deepseek-is-pushing-the-boundaries-of-ai-development/
    Learn more about AI observability and evaluation, join the Arize AI Slack community, or get the latest on LinkedIn and X.
    --------  
    29:54
  • Multiagent Finetuning: A Conversation with Researcher Yilun Du
    We talk to Google DeepMind Senior Research Scientist (and incoming Assistant Professor at Harvard), Yilun Du, about his latest paper "Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains." This paper introduces a multiagent finetuning framework that enhances the performance and diversity of language models by employing a society of agents with distinct roles, improving feedback mechanisms and overall output quality.
    The method enables autonomous self-improvement through iterative finetuning, achieving significant performance gains across various reasoning tasks. It's versatile, applicable to both open-source and proprietary LLMs, and can integrate with human-feedback-based methods like RLHF or DPO, paving the way for future advancements in language model development.
    Read an overview on the blog
    Watch the full discussion
    Learn more about AI observability and evaluation, join the Arize AI Slack community, or get the latest on LinkedIn and X.
    --------  
    30:03
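
Since the episode summary above stays at a high level, here is a rough, hypothetical sketch of the kind of loop it describes: several copies of a base model take on distinct roles, each agent records only the data it generated, and each is finetuned on its own outputs before the next round. All names and helpers are illustrative assumptions, not the paper's actual implementation (which, among other things, also filters outputs before finetuning).

    # Rough sketch of a multiagent finetuning round; illustrative only,
    # not the paper's exact algorithm. All names here are hypothetical.
    from dataclasses import dataclass, field
    from typing import Callable, List

    @dataclass
    class Agent:
        role: str                          # e.g. "generator" or "critic"
        generate: Callable[[str], str]     # produces an answer or a critique
        data: List[str] = field(default_factory=list)  # this agent's own outputs

    def finetuning_round(agents: List[Agent], questions: List[str],
                         finetune: Callable[[Agent, List[str]], None]) -> None:
        generators = [a for a in agents if a.role == "generator"]
        critics = [a for a in agents if a.role == "critic"]
        for q in questions:
            answers = [g.generate(q) for g in generators]
            feedback = [c.generate(q + "\n" + "\n".join(answers)) for c in critics]
            # Each agent records only what it produced, which is what keeps
            # the population of agents diverse across rounds.
            for g, out in zip(generators, answers):
                g.data.append(out)
            for c, out in zip(critics, feedback):
                c.data.append(out)
        for agent in agents:
            finetune(agent, agent.data)    # update each agent on its own data
            agent.data.clear()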


About Deep Papers

Deep Papers is a podcast series featuring deep dives on today’s most important AI papers and research. Hosted by Arize AI founders and engineers, each episode profiles the people and techniques behind cutting-edge breakthroughs in machine learning. 