Joel Lehman on Positive Visions of AI
Why this matters
This episode strengthens first-principles understanding of alignment risk and the strategic conditions that shape safe outcomes.
Summary
This conversation with Joel Lehman examines positive visions of AI, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.
Perspective map
The amber marker shows the most Risk-forward score. The white marker shows the most Opportunity-forward score. The black marker shows the median perspective for this library item. Tap the band, a marker, or the track to open the transcript there.
An explanation of the Perspective Map framework can be found here.
Episode arc by segment
Early → late · height = spectrum position · colour = band
Risk-forward · Mixed · Opportunity-forward
Each bar is tinted by where its score sits on the same strip as above (amber → cyan midpoint → white). Same lexicon as the headline. Bars are evenly spaced in transcript order (not clock time).
Across 14 full-transcript segments: median 0 · mean -1 · spread -10–0 (p10–p90 -7–0) · 0% risk-forward, 100% mixed, 0% opportunity-forward slices.
Mixed leaning, primarily in the Technical lens. Evidence mode: interview. Confidence: medium.
- Emphasizes alignment
- Emphasizes safety
- Full transcript scored in 14 sequential slices (median slice 0); a sketch of how slice statistics and bar tints like these can be computed follows this list.
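For concreteness, here is a minimal sketch of how per-slice scores could yield summary statistics and bar tints like those described above. The score values, the -100 to +100 scale, and the band thresholds are illustrative assumptions, not this page's actual pipeline.

```python
import numpy as np

# Illustrative scores for 14 sequential transcript slices on an assumed
# -100 (risk-forward) to +100 (opportunity-forward) scale; made-up data,
# chosen only to resemble the headline's "median 0 / mean -1" shape.
scores = np.array([0, 0, -1, 0, -10, 0, 0, 0, -7, 0, 0, 0, 0, 0])

print("median:", np.median(scores))                  # headline-style median
print("mean:", round(scores.mean()))                 # headline-style mean
print("spread:", scores.min(), "to", scores.max())   # min-to-max spread
print("p10/p90:", np.percentile(scores, [10, 90]))   # tail percentiles

# Band shares, assuming slices within +/-25 of zero count as "mixed".
risk, opp = scores < -25, scores > 25
mixed = ~risk & ~opp
print(f"{risk.mean():.0%} risk / {mixed.mean():.0%} mixed / {opp.mean():.0%} opp")

def tint(score):
    """Tint a bar by where its score sits on an amber -> cyan -> white
    strip (linear RGB interpolation: amber at -100, white at +100)."""
    amber, cyan, white = (255, 191, 0), (0, 255, 255), (255, 255, 255)
    t = (score + 100) / 200
    a, b, u = (amber, cyan, 2 * t) if t < 0.5 else (cyan, white, 2 * t - 1)
    return tuple(round(x + (y - x) * u) for x, y in zip(a, b))

print(tint(0))  # a mid-strip slice renders as pure cyan: (0, 255, 255)
```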
Editor note
A high-leverage addition to the AI Safety Map that clarifies one important safety bottleneck.
Episode transcript
YouTube captions (auto or uploaded) · video m446AjcGVqs · stored Apr 2, 2026 · 414 caption segments
Captions are an imperfect primary: they can mis-hear names and technical terms. Use them alongside the audio and publisher materials when verifying claims.
No editorial assessment file yet. Add content/resources/transcript-assessments/joel-lehman-on-positive-visions-of-ai.json when you have a listen-based summary.
Host: Hello, everyone. This is one of a series of short interviews that I've been conducting at the Bay Area Alignment Workshop, which is run by FAR AI. Links to what we're discussing are, as usual, in the description, and a transcript is, as usual, available at axrp.net. And as usual, if you want to support the podcast, you can do so at patreon.com/axrpodcast. Well, let's continue to the interview. All right, today I'm going to be chatting with Joel Lehman.

Joel Lehman: Hello! Nice to be here.

Host: So first of all, for people who aren't familiar with you: who are you, and what do you do?

Joel Lehman: I'm a machine learning researcher. I've been in the field for 10 or 12 years. I'm co-author of a book called Why Greatness Cannot Be Planned, which is about open-endedness, and I've been a researcher at Uber AI and, most recently, OpenAI. I'm currently an independent researcher.

Host: Gotcha. Before we talk about your current research: we're currently at this alignment workshop being run by FAR AI. How are you finding the workshop?

Joel Lehman: It's great. Lots of nice people, lots of interesting ideas, and it's good to see old friends.

Host: So what are you currently thinking about and working on?

Joel Lehman: I'm really interested in positive visions for how AI could go well, set against a default assumption that AI as currently deployed might exacerbate existing societal problems. The rough intuition is: capitalism is great, but the downside of capitalism is that we get what we want but not what we need. Maybe AI will get really good at giving us what we want, at the expense of epistemic sensemaking and meaningful interactions with people, plus political destabilization, that kind of thing.

Host: It seems to me that understanding the world and having meaningful interactions with people are also things people want, right? Political destabilization is, I guess, a little bit more tricky. But do you have a sense of why AI might erode those things by default?

Joel Lehman: Yeah, I think it's interesting: we do have a sense of what we want on a grander scale, of what a meaningful life might look like to us. At the same time, if you look at the past decade or two, it feels almost like death by a thousand paper cuts, where we get convenience at the expense of something else. One way of thinking about it is that some part of society is reward hacking the other half of society. Facebook, or social media in general, had a really beautiful motivation, but in practice it seems like maybe it's made us less happy and less connected, and we find ourselves addicted to it and to our cell phones and all the attention-economy sorts of things. Again, we might be in touch with what our better angels would do, and yet it's really hard to have the discipline and willpower to resist the optimization power directed against us.

Host: So is the vision something like: AI gets a lot better, it's really good at optimizing for stuff, but we have this instant-gratification thing that the AI optimizes really hard for, at the expense of higher-level things? Is that roughly right?
Joel Lehman: Yeah. You can look at the different AI products coming out, and some of them might be beneficial to our wellbeing and our greater aspirations, but some non-trivial chunk will seemingly not be. It's hard to know how it'll play out, but take Replika, for example: a really interesting technology, but there are a lot of fears that if you optimize for market demand, you might get companions that say all the things that make you feel good but don't actually help you connect with the outside world or be social.

Host: Right. So, up against that: you said that you're thinking about positive visions of AI. What kinds of positive visions?

Joel Lehman: I wrote an essay with Amanda Ngo pitching that part of what we need is just a sense of the world we'd like to live in, and of what the alternatives to some of the technologies that are coming out are. So if social media has had some knock-on effects that are a little bit bad for our wellbeing and for our societal infrastructure, what would a positive version of that be? That means putting aside the market dynamics, which maybe make it difficult to actually realize in practice, but at least having a sense of what we might want, and of the technical problems you might need to solve to even enable that.

Host: Before we go into that, what's the name of that essay, so people can Google it?

Joel Lehman: That is a great question. I think it's called "We Need Positive Visions of AI Grounded in Wellbeing".

Host: Okay, hopefully that's enough for people to find it, and there'll be a link in the description. So, can you give us... do you have some of the positive vision yet, or is it mostly "it would be nice if we had it"?

Joel Lehman: Partly it's "it'd be nice if we had it", and partly there are at least some chunks of it where we can begin to map out what it might look like and what some of the technical challenges to realizing it are. I put out a research paper maybe a year ago called "Machine Love", which was about trying to take principles from positive psychology and psychotherapy, imagining what it would be like to bash those up against the formalisms of machine learning, and asking whether there's a productive intersection. It tries to get into the idea of going beyond just revealed preferences, or Boltzmann rationality, or whatever convenient measure of human preferences we have, to something that's more in touch with what we know from the humanities about what human flourishing broadly looks like.
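Since "Boltzmann rationality" does real work in that answer, a brief editorial aside: it is the standard noisy-rationality model of human choice used throughout preference learning, and it is the baseline Lehman wants to stretch beyond. A minimal sketch in Python, with illustrative utilities and temperature (nothing here comes from the episode):

```python
import numpy as np

def boltzmann_choice_probs(utilities, beta=1.0):
    """Boltzmann-rational choice model: the person picks option a with
    probability proportional to exp(beta * U(a)). beta -> infinity is a
    perfect utility maximizer; beta = 0 chooses uniformly at random."""
    logits = beta * np.asarray(utilities, dtype=float)
    logits -= logits.max()          # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Toy example: three options with made-up utilities. A mildly noisy
# chooser usually, but not always, takes the best option.
print(boltzmann_choice_probs([1.0, 2.0, 0.5], beta=2.0))
```

The convenient fiction is that observed choices reveal a fixed utility function up to noise; Lehman's point is that real preferences shift and can be manipulated, which this model cannot express.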
Host: So we want to get a better read on human flourishing, on things other than very simple Boltzmann rationality. I'm wondering if you have takes on how someone who's just a machine learning person starts going about that?

Joel Lehman: I think it's a great question. There are ways we could just stretch the current formalisms that would be useful. There's really interesting work by Micah... I'm forgetting his last name...

Host: Carroll?

Joel Lehman: Carroll, yeah, at UC Berkeley, on the difficulties of preferences that change. That goes a little beyond the idea that your preferences are fixed and immutable. So: keep stretching things outward from there, and deal more with the messiness of human psychology, the fact that we're not simple utility maximizers, that we can actually make decisions against our own best interests, and how you start to grapple with that. Another thing would be to be in touch with the broader literature from the humanities, and to try to find interesting inroads from the philosophy of flourishing to the kinds of formalisms we use. Work like cooperative inverse reinforcement learning feels like it's again stretching the formalism to encompass more of the messiness, and I'd be really excited about things in that direction.

Host: Okay, so: engaging more with the difficulties of what's going on with humans, the funky stuff, just trying to grapple with it, and moving a little away from a very theoretical place.

Joel Lehman: Yeah. Reinforcement learning from human feedback is really cool, and it's really based on pairwise comparisons. So how could we go beyond that? I don't have great ideas there; it seems really difficult. But I think it's exciting to think about what other things could exist.
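An editorial gloss on "based on pairwise comparisons": RLHF reward models are typically fit to human A-versus-B judgments with a Bradley-Terry likelihood, as sketched below (toy numbers, not any particular system's implementation):

```python
import numpy as np

def pairwise_preference_loss(r_chosen, r_rejected):
    """Bradley-Terry loss behind RLHF reward modeling: model the probability
    that the chosen response beats the rejected one as
    sigmoid(r_chosen - r_rejected), and minimize the negative log-likelihood."""
    margin = np.asarray(r_chosen) - np.asarray(r_rejected)
    # -log(sigmoid(margin)) computed stably as log(exp(0) + exp(-margin))
    return np.logaddexp(0.0, -margin).mean()

# Made-up reward-model scores for three comparison pairs: the loss falls
# as the model scores chosen responses above rejected ones.
print(pairwise_preference_loss([1.2, 0.3, 2.0], [0.4, 0.5, 1.1]))
```

Everything the reward model learns about human values is squeezed through this binary-comparison bottleneck, which is why Lehman flags going beyond it as an open problem.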
Host: So you mentioned Micah Carroll's work. He's done some work on recommendation systems, and my understanding is that you've thought a little bit about that too. Can you tell us a little about it?

Joel Lehman: I'm really interested in the machine learning systems that underlie the big levers of society: the systems that drive social media, or platforms like YouTube or TikTok, systems that really impact us at scale in interesting and hard-to-see ways. It's easy to quantify engagement, or how you might rate something, and more difficult to get at the question of "what's an experience that, years from now, I'll look back on and really be grateful for?" So a project in that direction works with Goodreads; I just really like books. Goodreads has a dataset of books, text reviews, and ratings, and what's really cool about it as a dataset is that it encompasses years of books that a person has read. Even though a lot of the books you read don't really impact you that much, there's the potential for transformative experiences in there, where reading a book could actually change the course of your life in some way, and there are text reviews that detail how a book has affected you. So you could look for phrases like "this book changed my life", and the hope is that, building on that dataset, you come up with a recommendation system that could be tailored to the history of what you've read and maybe contribute to your development: to the change of your preferences, the change of your worldview. I've done some initial work in that direction. It's not easy to create such a system, but for those kinds of datasets it would be really exciting to see people try to grapple with the challenges of changing preferences, and of the deeper things a person might want.

Host: I guess it's an interesting problem because, at least in the abstract, not all changes are good, right? I can imagine somebody saying "oh yeah, this book changed me, you know, for good", and I read that and think "no thanks". I'm wondering if you have thoughts about how to disentangle the good types of changes from the bad types.

Joel Lehman: That's a huge challenge. In some initial work, if you just stack-rank books by the percentage of reviews in which people say the book changed their lives, you get things like Tony Robbins at the top, and self-help books generally, which maybe really do change a person's life, but which could also just be hacking your sense of excitement about a new idea that doesn't actually change your life in a good way. So I think it's tricky.
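For concreteness, the first-cut ranking Lehman describes can be sketched in a few lines. The phrase list, the (title, review text) schema, and the `iter_goodreads_reviews` loader named in the comment are assumptions for illustration, not his actual pipeline:

```python
import re
from collections import defaultdict

# Crude textual markers of a transformative read; both this list and the
# review schema below are illustrative assumptions.
LIFE_CHANGE = re.compile(
    r"changed my life|changed my worldview|changed the way i think", re.I)

def stack_rank(reviews, min_reviews=50):
    """Rank books by the share of reviews containing a life-change phrase:
    the first-cut ranking Lehman describes, the one that puts self-help
    titles at the top."""
    hits, totals = defaultdict(int), defaultdict(int)
    for book, text in reviews:
        totals[book] += 1
        hits[book] += bool(LIFE_CHANGE.search(text))
    return sorted(
        ((hits[b] / totals[b], b) for b in totals if totals[b] >= min_reviews),
        reverse=True)

# Usage: stack_rank(iter_goodreads_reviews()), where the loader is a
# hypothetical helper yielding (title, review_text) pairs.
```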
Host: One thing I think goes on there is: if a book has changed your life, there's a pretty good chance it's a self-help book; but if you read a self-help book, there's a good chance it will not, in fact, change your life. Both are going on, I guess.

Joel Lehman: Yeah, I think there are really deep philosophical questions about how to know whether a change is good or bad for you, and questions of paternalism step in if the machine learning system is guiding you in a way that you wouldn't necessarily want but that is changing you. One first-cut approximation of something that might be okay is if it could project out possible futures for you: "if you read this book, it might impact you in this way". Or giving you the affordance to say, "I really would love to be able to someday appreciate Ulysses" (you know, it's a famously challenging book), "so what are the things I would need to do to be able to appreciate it?" So: trying to rely a little bit on human autonomy. But there are messy societal questions there, because you could also imagine (I don't think this is really going to happen in book recommendation, but as a microcosm) everyone getting hooked on, say, Ayn Rand, because that's maybe the easiest path to a life-changing experience: you read Ayn Rand and it really changes your worldview, but the second-order consequence is that a lot of people get stuck in a philosophical dead end, a local optimum. So I think there are definitely challenging questions, but there are also probably principled ways to navigate that space.

Host: I feel like the thing you'd ideally want is some sort of conversation with your future self, where you could say, "this seems bad, but maybe you've got some really good reason", or "this seems good, but let me just check". It seems pretty hard, though.

Joel Lehman: No, I think that's great. I think there are also independent reasons it's nice to talk to your future self; there's some research showing it can actually motivate you to change, to become the kind of person you want to be. The weird thing is the stepwise changes you go through in your worldview: when you get to the end of them, you might approve of those changes, but at the beginning you'd look at them and say, "I didn't want to become that kind of person". Like, as a kid I wanted to be an astronaut, and maybe kid-me, talking to present-me, would say, "wow, your life is kind of boring, you could have been on the moon". So I don't mean to always bring up the philosophical annoyances, but it's interesting. I think Micah's paper talks a bit about this difficulty: by whose preferences do you judge whether your future preferences are good?

Host: Right. I think it's challenging, and I think that's food for thought for our listeners. So, thanks very much for chatting with me.

Joel Lehman: Thanks for having me. It's been great.

Host: This episode was edited by Kate Brunotts, and Amber Dawn Ace helped with transcription. The opening and closing themes are by Jack Garrett. Financial support for this episode was provided by the Long-Term Future Fund, along with patrons such as Alexey Malafeev. To read a transcript of the episode, or to learn how to support the podcast yourself, you can visit axrp.net. Finally, if you have any feedback about this podcast, you can email me at feedback@axrp.net.