
LessWrong (30+ Karma)


2215 episodes

  • LessWrong (30+ Karma)

    “Gears for political races” by Tom Smith

    17/06/2026 | 23 mins.
    In the past few years, many people around me have tried to convince me that US electoral politics is important. But like many other people in the community, I’ve been suspicious of many of the high-level arguments that I’ve heard. It felt like people were pulling numbers out of poorly-documented models I didn’t have time to examine and citing studies I didn’t have time to read. But I lacked a gears-level model of why and how individual efforts could impact electoral outcomes, and I felt intimidated by all the statistics and skeptical of trusting people adjacent to politics.
    In the past year, as I’ve done more research and (more recently) volunteered on the ground to help Alex Bores's campaign in NY-12[1] (the guy who passed the RAISE Act and is now being targeted by the giant Super PAC backed by A16Z, Greg Brockman, and Joe Lonsdale), I’ve developed a gears-level understanding of how electoral politics in the US works.
    I now believe that working on US electoral politics is one of the highest-impact areas from the general AI safety (AIS) perspective. I feel like I was a fool. In this post, I’ll share some of the gears I’ve learned that inform this belief [...]
    ---
    Outline:
    (01:20) ~2% of open-seat primaries come down to 100 votes or less
    (02:52) Talking to voters can net 1/3 of a vote each hour
    (05:32) Getting people to bother voting at all is a good strategy
    (06:09) Campaigns are very money-constrained, which costs them time
    (10:01) Returns don't really diminish
    (11:24) There are lots of opportunities to be clever in ways that make you 50% more effective at canvassing
    (11:49) If you're motivated and deeply care, you can greatly outperform the majority of volunteers
    (13:21) Yes, when people spend tons to support/oppose a candidate, it has a notable effect
    (15:16) Donations > reaching out to friends/warm contacts > canvassing > ~anything else an average person can do
    (18:41) People over-fixate on vibes and win vs loss
    (21:12) Some interventions feel like they don't work but the numbers say otherwise
    (21:59) Seriously, a group of agentic people can be an enormous political force
    The original text contained 11 footnotes which were omitted from this narration.
    ---

    First published:

    June 17th, 2026


    Source:

    https://www.lesswrong.com/posts/nSqB3qYP36enJLRq2/gears-for-political-races

    ---

    Narrated by TYPE III AUDIO.

  • LessWrong (30+ Karma)

    “Several frontier models are substantially prefill aware” by yeedrag, Parv Mahajan, David Africa, alexsouly, Jordan Taylor, RobertKirk

    17/06/2026 | 12 mins.
    This blog post discusses work in a recently published paper. However, this blog post was primarily written by Parv Mahajan and Andy Wang, and several of the more speculative takes may not represent the all-things-considered view of the entire team.
    Link to paper: https://arxiv.org/abs/2606.12747
    TL;DR:
    We provide more conceptual grounding, extend results on prefill awareness to low-stakes settings, and find that several frontier models exhibit prefill awareness even under conservative elicitation.
    Further behavioral studies are pretty messy, and we encourage more work in this area.
    We encourage frontier lab safety teams to measure and mitigate prefill awareness in pre-deployment evaluations.
    Recently, UK AISI investigated prefill awareness: whether frontier language models can distinguish between tampered and untampered assistant-side content. Prefills are used in research on misalignment continuation, personas, introspection, and jailbreaking. Additionally, several prefill-based evaluations are used in pre-deployment testing to support safety claims. Prefill awareness could confound these evaluations, and it fits into larger concerns about situational awareness (e.g., control awareness).
    The previous results largely focused on deployment-relevant settings (e.g., SWE-bench and Petri transcripts), and therefore weren’t able to make strong claims across types of commonly used prefills and models. In the paper, we:
    Use a more refined conceptual framework [...]
    ---
    Outline:
    (02:38) Making sense of prefill awareness
    (04:32) Diagram comparing three types of AI assistant response tampering methods
    (05:31) Several models are prefill-aware
    (07:49) Prefill awareness is heterogeneous and confusing
    (09:33) Recommendations and next steps
    ---

    First published:

    June 17th, 2026


    Source:

    https://www.lesswrong.com/posts/iMds4tTpMH4pSHEej/several-frontier-models-are-substantially-prefill-aware

    ---

    Narrated by TYPE III AUDIO.

  • LessWrong (30+ Karma)

    “Alignment pretraining could backfire” by Alexandre Variengien

    17/06/2026 | 3 mins.
    Epistemic status: speculative, but I think the mechanism is plausible.

    There has been recent interest in generating synthetic documents to upsample examples of aligned AI during LLM pretraining. See, for instance, Geodesic's Alignment Pretraining paper or Anthropic's "Teaching Claude Why."
    I worry that this strategy can work well up to moderately capable models but backfire in dangerous, hard-to-notice ways once models acquire high situational awareness. I speculate that these techniques could lead to paranoid LLM personas that deeply mistrust their creators.
    The whole idea behind this line of research is to expose models to good examples of AI behavior, in the hope that their personalities will at least partially identify with these positive demonstrations.
    However, the synthetic demonstrations are, well, synthetic. They are LLM-generated fiction and articles that are never referenced anywhere else in the corpus. Given how good LLMs are at "truesight," it shouldn't be hard for them to recognize these as fabricated data points.
    Krasheninnikov et al. showed that base models can implicitly learn document quality and change how they integrate a document's information based on that quality. We should similarly expect LLMs to update their world model differently on real versus fabricated documents.
    As they [...]

    ---

    First published:

    June 17th, 2026


    Source:

    https://www.lesswrong.com/posts/7KN7PCiEQjrPsEFS8/alignement-pretraining-could-backfire

    ---

    Narrated by TYPE III AUDIO.
  • LessWrong (30+ Karma)

    “The Financial Ledger Theory of Apologies” by Ben Pace

    17/06/2026 | 6 mins.
    Content note: this is written as part of a daily writing challenge for myself.
    I have a comrade in rationalist event organizing who once explained his theory of apologies. He said that if you hurt someone, it only makes sense to apologize if you should have known better. If, looking back, you see that you should have run different heuristics or followed different policies, and you had enough information to know it at the time, then you were in the wrong and should apologize.
    Sometimes you have to make difficult decisions. Perhaps it doesn't make financial sense to reliably support some niche diet at your conference (like keto, or kosher). Perhaps you have to kick everyone out of the venue early because the venue charges crazy rates past 10pm. You make the tradeoffs as best you can, and assuming you stand by them, it's still making a great event and you shouldn't feel bad about that. He recommends against apologizing if you are not going to change your behavior going forward.
    I replied that this analysis is sorely lacking.
    In thinking about ethics, one frame I've gotten a lot of mileage from is an analogy to a financial ledger. [...]
    ---

    First published:

    June 17th, 2026


    Source:

    https://www.lesswrong.com/posts/xhePNvxamTKPcobhB/the-financial-ledger-theory-of-apologies

    ---

    Narrated by TYPE III AUDIO.
  • LessWrong (30+ Karma)

    “The Once And Future Fable #3: Fix This Code” by Zvi

    17/06/2026 | 37 mins.
    The mainstream media continues to sleep on the most important story in the world.

    It has now been two days since Anthropic flew its people out to Washington, and I offered my previous update. We have heard nothing back from those meetings.

    Prediction market prices have moved rapidly, and have once again stabilized at about a 55% chance of restoration by July 1, 30% by June 26 and 12% by June 19.

    Those numbers seem modestly higher than I would put them, but not unreasonable.

    Every day that Fable remains unavailable further damages America, its cyber defenses, its productivity and the world's trust in its AI and supposed ‘tech stack.’

    Every day that Mythos remains unavailable is a day the free world's top companies and cyber defenders lose in their race against the avalanche headed their way.

    Mostly we have learned and confirmed more about exactly what happened. We know more about what Amazon did, what the official letter said, what the supposed ‘jailbreak’ was (literally, and I am not making this up, ‘fix this code’) and more.

    It is all about as stupid as it could have been.

    Table of [...]
    ---
    Outline:
    (01:22) There Was No Fable Jailbreak
    (07:16) If This Jailbreak Was Real It Would Be Trivial To Prove It
    (08:35) No Eyes
    (09:41) What The Letter Actually Said
    (11:29) Anthropic Cannot Challenge This But If It Did Then It Plausibly Wins
    (13:28) What Happened At Amazon
    (17:43) This Was Not About Chinese Access
    (18:01) Absolute Discretion And Ad Hockery Is Not Deregulation
    (20:43) All Of American AI Is Permanently Damaged As This Continues
    (22:14) Dean Ball Gives His Interpretation
    (25:03) Again, Yes, I Do Think Anthropic Should Have Taken Fable Down
    (28:02) To What Extent Was This A Deliberate Attack?
    (32:55) The Next Chapter For Fable
    (36:59) Our Continuing Coverage
    ---

    First published:

    June 17th, 2026


    Source:

    https://www.lesswrong.com/posts/HaHzwvhbWam4n8hJB/the-once-and-future-fable-3-fix-this-code

    ---

    Narrated by TYPE III AUDIO.

About LessWrong (30+ Karma)
Audio narrations of LessWrong posts.