
Future of Life Institute Podcast · Civilisational risk and strategy

Will Future AIs Be Conscious? (with Jeff Sebo)

Why this matters

This episode strengthens first-principles understanding of alignment risk and the strategic conditions that shape safe outcomes.

Summary

This conversation, Will Future AIs Be Conscious? (with Jeff Sebo), examines the philosophy of AI consciousness and moral status, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

Perspective map

Mixed · Society · High confidence · Transcript-informed

The amber marker shows the most Risk-forward score. The white marker shows the most Opportunity-forward score. The black marker shows the median perspective for this library item. Tap the band, a marker, or the track to open the transcript there.

An explanation of the Perspective Map framework can be found here.

Episode arc by segment

Early → late · height = spectrum position · colour = band

Risk-forward · Mixed · Opportunity-forward

Each bar is tinted by where its score sits on the same strip as above (amber → cyan midpoint → white). Same lexicon as the headline. Bars are evenly spaced in transcript order (not clock time).

Start · End

Across 87 full-transcript segments: median 0 · mean -3 · spread -300 (p10–p90 -100) · 6% risk-forward, 94% mixed, 0% opportunity-forward slices.
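The summary statistics above (median, mean, p10–p90 spread, and band shares) can be reproduced from a list of per-slice scores. A minimal sketch follows; the band cutoffs (`risk_cut`, `opp_cut`) are hypothetical placeholders, since the actual thresholds used by the Perspective Map are not stated here.

```python
from statistics import mean, median

def summarize_slices(scores, risk_cut=-50, opp_cut=50):
    """Summarize per-slice perspective scores.

    Scores below risk_cut count as risk-forward, scores above opp_cut as
    opportunity-forward, and everything in between as mixed. The cutoff
    values are illustrative assumptions, not the site's real thresholds.
    """
    ordered = sorted(scores)
    n = len(ordered)
    # Nearest-rank percentiles over the sorted scores.
    p10 = ordered[int(0.10 * (n - 1))]
    p90 = ordered[int(0.90 * (n - 1))]
    frac = lambda pred: sum(pred(s) for s in scores) / n
    return {
        "median": median(scores),
        "mean": mean(scores),
        "p10_p90": (p10, p90),
        "risk_forward": frac(lambda s: s < risk_cut),
        "mixed": frac(lambda s: risk_cut <= s <= opp_cut),
        "opportunity_forward": frac(lambda s: s > opp_cut),
    }
```

With mostly near-zero scores and a few strongly negative slices, this yields the kind of profile reported above: a median near 0, a slightly negative mean, and a large majority of slices in the mixed band.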

Slice bands
87 slices · p10–p90 -100

Mixed leaning, primarily in the Society lens. Evidence mode: interview. Confidence: high.

  • Emphasizes alignment
  • Emphasizes safety
  • Full transcript scored in 87 sequential slices (median slice 0).

Editor note

A high-leverage addition to the AI Safety Map that clarifies one important safety bottleneck.

ai-safety · consciousness · fli · philosophy · society · intro

Play on sAIfe Hands

Episode transcript

YouTube captions (auto or uploaded) · video dWBV1rlZxIw · stored Apr 2, 2026 · 2,335 caption segments

Captions are an imperfect primary: they can mis-hear names and technical terms. Use them alongside the audio and publisher materials when verifying claims.

No editorial assessment file yet. Add content/resources/transcript-assessments/will-future-ais-be-conscious-with-jeff-sebo.json when you have a listen-based summary.

AI welfare is a credible, legitimate, serious issue. We can look for underlying architectural and computational features associated with consciousness. And we know that companies and governments are working to develop exactly those kinds of systems. We now have a global system of factory farming that is going to take decades to dismantle at best. It might have been much easier to guide the development of our food systems in in a more humane and healthful and sustainable direction, but instead we put off the question and we waited until it was globally entrenched. I think that there could be similar risks with AI development that if we wait until later to have these conversations, then we default to a path of objectifying and instrumentalizing AI systems. Even when someone is made out of different materials than you and even when someone is vulnerable and dependent on you, you have power over them, you still have a responsibility to treat them with respect and compassion and consideration. And if we want AI systems to absorb those values as they become more powerful, then it would help for us to absorb those values ourselves. Welcome to the Future of Life Institute podcast. My name is Gus Docker and I'm here with Jeff Sebo. Jeff, welcome to the podcast. Yeah, thanks for having me, Gus. Fantastic. Uh, could you tell us a little about yourself to begin with? Sure. I am a philosopher by training. I have a PhD in philosophy from New York University and I now do philosophical and interdisciplinary work about a range of issues concerning how we interact with nonhumans and how we can improve our interactions with non-humans. So I work at the department of environmental studies at New York University and my own research is about the sentience and agency and moral status and legal status of animals and AI systems and other nonhumans. And then I also work on teams. So I direct our center for environmental and animal protection. 
And this is a research center that examines important issues at the intersection of environmental and animal protection like agriculture as it relates to farmed animal welfare and climate change and conservation as it relates to wild animal welfare and biodiversity loss. And I direct the center for mind ethics and policy which may be a little bit more in scope for our discussion today which is a a research center that examines the nature and intrinsic significance of non-human minds. So examining whether for example invertebrates or AI systems have consciousness, sentience, agency, what kinds of moral status, legal status, political status they deserve along with some other work like our wild animal welfare program and some related projects. And so I spent a lot of time doing research and teaching and service and engagement with companies and governments and NGOs and other actors in these spaces. That's great. And you're exactly right. The scope of this conversation as I see it is to talk about AI consciousness and the implications of artificial sentience. Great. And so, so if we start there, it's difficult for me to imagine and I imagine that it's difficult for others to imagine what artificial consciousness even looks like. You know, we can we can imagine a bi we can see a biological system. We can see that it's conscious, that it's sentient. When we think of artificial sentience, should we imagine a computer? Should we imagine a phone? What what should we see in our heads as we talk about this topic? Right. One question is how to go about assessing for consciousness in such a different kind of cognitive system. But another question exactly as you say is what is the cognitive system in the first place? Right? With humans and other animals, we can focus on the individual organism at the first approximation. But with AI systems, what are we focusing on in particular? Are we focusing on a particular set of hard drives located somewhere? 
Are we focusing on a particular set of instances of a software program? and and that could make a really big difference for determining the scope and size of the population and the nature of individuals and their interests and needs. And without directly answering the question right now, I can say that that is confusing and contested issue. There are ways of understanding who the subjects might be that would make them a small number of large subjects. And then there are ways of understanding who they might be that would make them a large number of small subjects. And they would have very different interests and needs and vulnerabilities and relationships with each other, relationships with us, conditions for survival, conditions for reproduction depending on how we answer those questions. But as a starting point, I can note that the difference between animals and AI systems in this regard is not all or nothing, is not totally binary. We do to some extent already face this kind of question with humans and other animals. With octopuses, for example, they can be usefully described as having nine interconnected brains, a kind of central command and control center, and then smaller clusters of neurons in each individual arm. And they exhibit some unified and some fragmented behavior. So even with non-human animals like octopuses, we need to ask are we asking about consciousness at the organism level, at the brain level or some combination of those? And and then those questions will only become more salient with AI systems. Should we imagine AI systems that experience emotions like we do like boredom and pain and curiosity or is that is that anthropomorphizing? Well, yes to both. It is it is anthropomorphizing and we should ask the question. So, anthropomorphism generally is the attribution of human traits to nonhumans and that can be paired with anthropo denial which is the denial of human traits in non-humans. 
And of course, some nonhumans have humanlike traits or at least broadly analogous traits. And then they also lack human traits and broadly analogous traits. And so the question is never really should we attribute or is that anthropomorphic. The question is instead in what cases should we attribute and embrace anthropomorphism versus in what cases should we not attribute and embrace anthropo denial. And probably it will be a little bit of a balance between the two. That certainly is the case for animals. Many non-human animals have some measure of humanlike traits including pleasure, pain, happiness, suffering, satisfaction, frustration, hope, fear, curiosity, boredom. It might be very different, but but they can be analogous enough for a general application of that term to make sense as a starting point. And then for AI systems, we have to go into that question with an open mind because they might be more like humans in some cases. For example, they might be better able to approximate humanlike language and reason, and that might give them interests that are more humanlike than those of non-human animals in some respects. But then in other cases and in other respects, they might have much less humanlike capacities due to their very different material substrates and their very different material origins. And and so they might have much less of the same kinds of vulnerabilities that we do. and and so we will just need to conduct a lot of research with an open mind about where the similarities and differences are going to be in this case. If we think of consciousness as some kind of information processing that's going on in the human brain and and in some animal brains and perhaps in some AI systems then we can say okay consciousness is in some sense substrate independent. It does it can exist on biological hardware or on current computer hardware. 
Could it be dependent on its substrate though such that there are certain experiences that are only available to biological systems or only available to systems that are that are running on computer hardware? Well, definitely that is a possibility given the current state of knowledge about the philosophy and science of consciousness and and we can distinguish two types of questions. One is whether a particular material substrate is required for consciousness at all. And then another is the question you asked whether a particular material substrate is required for a particular kind of conscious experience. And with respect to both questions, I think we need to have a state of uncertainty right now because we are not yet at a place of having consensus or certainty about the nature of consciousness. We still face the hard problem of consciousness. The problem of explaining why any physical system, including our own brains, can have subjective experiences. The problem of other minds. The problem that the only mind any of us can directly access is our own. And that significantly limits our ability to know what if anything it feels like to be anyone else. And we do have a lot of leading scientific theories of consciousness. And some of them are a little bit more in the biological naturalism zone and some of them are a little bit more in the computational functionalism or other kinds of functionalism zone and we can unpack those possibilities. But for me, my view is that it would be premature. it would even be arrogant for us to bet everything on our own personal favorite theory of consciousness right now. And so we should be at least somewhat open. We can lean one way or the other, but we should be at least somewhat open to other possibilities too. 
So even if I feel quite confident that biological naturalism is correct and that consciousness itself requires a carbon-based biological material substrate and associated chemical and electrical signals and oscillations, I should allow for at least a realistic chance, a non-negligible chance that I am wrong and that a sufficiently sophisticated set of computational functions or other kinds of functions realized in other sorts of hardware and other kinds of AI architectures would would suffice. And then to to answer your your question about particular kinds of conscious experiences, I think this is all the more true for particular kinds of conscious experiences. If AI systems can have conscious experiences, I think we need to presume that they would differ in various ways from those of humans and other animals. That might be partly due to the underlying material substrate. It might be partly due to different types of structural organizations or different types of functional capacities, but it would be a mistake to assume that you have the same experiences that I do. As Tom Nagel noted, it would be a mistake to assume that bats have the same experiences that humans do. And as we can now note, it would definitely be a mistake to assume that bots and especially like bat bots have the same experiences that we do. So yeah, lots of uncertainties at at different levels in this conversation. How hopeful are you that we will ever resolve these uncertainties to any level of satisfaction? I mean, there's a there's a for many people I talked to about the topic of artificial consciousness. There's this deep skepticism that we can ever make progress because we don't have access to the ground truth. We don't have a consciousness meter. 
And so we are perhaps the people working on the science and philosophy of consciousness are perhaps a bit like theoretical physicists without experimental physicists where we don't ever get to we don't ever get to make the experiment and and find out who's actually right. So I assume you you uh believe in in progress here but how hopeful are you? I do believe in progress and I am really unsure whether we will make such transformative progress that we have what amounts to a secure theory of consciousness about which we can have a high degree of confidence. that would require game-changing paradigm shifting progress of a sort that in some way or another pushes us beyond the the hard problem of consciousness beyond the problem of other minds. Now that may well be in the offing at some point. We have had paradigm shifts before in in science and philosophy and forms of progress that were previously thought to be impossible then started to seem inevitable. and and so I want to allow for the possibility that that kind of paradigm shift can happen and will happen. What I will say though is there is no guarantee that that kind of paradigm shift will happen. And if it does happen, there is no guarantee that it will happen in the next 2 4 6 8 10 years. And so for now, I want to have a research community that is working on dual tracks. On the one hand, continuing to make progress on foundational questions about the metaphysics of consciousness, the epistemology of consciousness, underlying conceptual issues concerning consciousness. And and then on the second track, working together as a community to figure out how we can at least reduce uncertainty or make more responsible estimates about the probability of consciousness in different systems given our current state of disagreement and uncertainty about the nature of consciousness and and the unresolved nature of these fundamental problems about consciousness. 
And so a lot of our own recent and current work is divided across these tracks, but placing a lot of emphasis on that second track because we want to be able to provide practical guidance for companies and governments about how to responsibly build and scale AI systems without necessarily solving the hard problem of consciousness because that might be too ambitious a task for say Anthropic or the US government. Yeah, this is a big question of course, but how do we navigate given that uncertainty? There are both false positives, there is attributing consciousness to systems that are not conscious, there are false negatives that are not attributing consciousness to systems that are in fact conscious. Um if we are if we're simply maximally risk averse and and and say okay we don't want to hurt anything if we can avoid it then we're ascribing consciousness to everything and that that that makes it difficult for us to act I think how do we how do we in a pragmatic way navigate this uncertainty yes that is a a great question and a great way of setting up the question one point to note is that how we think about this in science might be a little bit different from how we think about it in ethics. So in science the question is what is a reasonable null hypothesis or default assumption as we seek further evidence and then you might resolve that question by asking what is best supported by existing evidence and what is most conducive to further scientific progress. And then on the ethics side of course as you say you want to think about the probability and the magnitude of harm in both directions as a result of of both kinds of of mistake. And it is difficult to imagine a simple straightforward application of a precautionary principle working on the ethics side in this context. Partly because of of the massive stakes and and then partly because it can be easy and harmful to make mistakes in either direction. 
So obviously as you suggest it can be easy and harmful to make mistakes associated with false negatives. We do this all the time with non-human animals. This can be especially easy when they look really different than us, when they act really different than us, when we have incentive to use them as commodities. This is why we tend to underattribute sentience and moral significance to farmed animals, especially farmed aquatic animals or invertebrates. And that may be a risk with certain types of AI systems like AlphaFold, you know, not the social systems, the chat bots, but but other types of systems. And that can be harmful because it can lead to exploitation, extermination, suffering, death against vulnerable populations, against their will, often for trivial purposes. But but it can also be easy to make the other mistake and it can also be costly or harmful to make the other mistakes. So, so it can be easy to to make the mistake of overattributing sentience and moral significance when non-humans look like us and when they act like us and when we have incentive to use them as companions instead of commodities. And so this is especially a risk even with present-day chat bots, large language models who are generating language outputs based on pattern recognition and text prediction and not because they have stable thoughts and feelings and and so on and so forth. And this can be costly too. It can lead us to form one-sided social and emotional bonds with these nonhumans to make inappropriate sacrifices for them allocating resources to them that would be better allocated towards humans and mammals and birds and so on. So this is a situation where there might not be any straightforward way to err on the side of caution and where we might have no choice but to weigh the risks and that means doing our best even though we will be bad at it and make mistakes. 
doing our best to make at least rough estimates of the probability of error in both directions and the magnitude of harm that would result from error in both directions and then try to calibrate and take a reasonable proportionate approach to balancing our risk mitigation in both directions. That requires a lot more work but I think is the only responsible way forward. You mentioned that when we see one of these chatbot systems being nice to us and interacting with us in ways we like we can kind of we can we can feel that okay maybe the system is conscious but then if we understand now I'm talking for myself if we understand how it works on the technical side we understand that it's trained on a big corpus of internet text it it's gone through reinforcement learning from human feedback to become nicer to us then the illusion or the intuition of of of consciousness in that system can fall apart. But I think can can't we can't we say something similar with humans where you know I I'm talking to you now I I can say I'm conscious but you if you were a kind of a perfect neuroscientist could give me some explanation. Okay, you're only saying you're conscious because of this activity in your brain and and and because you you you're evolved in in in this way and so on. So do you think we should respect the intuition that when we can explain a system, we shouldn't that system no longer seems conscious to us? Yes, I do think we have that intuition with other minds with with non-human minds. This has been true with animal minds and is starting to be true with digital minds as well. In the human case, we naturally understand that there can be different capacities coexisting and there can be different levels of explanation that make sense for our behaviors. Why do I do what I do? Well, because I had a conscious feeling or thought. Why did I have that feeling or thought? There might be underlying structural and functional explanations. 
Why do those structures and functions exist? There might be underlying material explanations. We understand that all of that can be true at the same time. But with other animals and now with AI systems, we tend to view those explanations as in competition with each other. And so if we do have some ability to provide a material, structural, functional explanation of what an animal does or what an AI system does, then we tend to dismiss the possibility that there might also be a conscious thought or feeling contributing to that causal process. Or if we can explain a behavior in terms of some complex application of simple capacities like perception and learning, then we feel no need to also explore explanations involving simple applications of more complex capacities like conscious thoughts and feelings. But again, that might not be right because it might be that animals do what they do because they perceive and learn and because they have conscious thoughts and feelings and these capacities work together and the best and simplest explanation might sometimes invoke one or the other or both. And that could be true with AI systems too. So we should resist the idea that if we can at least partly explain behaviors at one level, then that negates the need to explore possible explanations at other levels or or the possible emergence of capacities at other levels. And we should resist this temptation to become more confident that we know what is going on just because we understand some small piece of the picture because because that is an example of what we have done in the past. I expect many listeners have tried asking a chatbot whether it's conscious and for some chat bots uh it will say yeah I'm conscious and I'm feeling this or that way. You can ask it to to to create a picture of itself where it looks like okay maybe there's some person that I'm talking to. 
But on on many systems, there's now a an additional layer that monitors the output before it gets to you and then corrects it such that it says something like, I am simply a a large language model trained on this data. I'm not conscious. What do you think of this move from the AI companies? Well, I am not a fan of that move. I do appreciate that AI companies take themselves to have a responsibility to mitigate the risk that users will have the wrong impression or form the wrong beliefs based on the language outputs from the models. So, so I think that is the right impulse. However, I think that there was an overcorrection in favor of forcing the models to straightforwardly deny even the possibility of AI consciousness. So for a while when users asked questions about consciousness, sentience, agency, moral status, legal status, political status, personhood, all of these associated concepts, when users asked about them, the response would be something of the form, as an AI assistant, I could never possibly have any such features. And that of course is way too simplistic, way too reductive, and not at all a reflection of of the current state of disagreement, uncertainty, confusion among experts in in science and philosophy. And so when in fall 2024 I worked with Robert Long and a team of authors including David Chalmers and Jonathan Birch on a report called Taking AI Welfare Seriously which was pitched in part as an argument for taking AI welfare seriously and and in part as a set of first-step recommendations for AI companies. One of the three main recommendations that we made in that report is simply that AI companies acknowledge that AI welfare is a credible, legitimate, serious issue. This is not a sci-fi issue. This is not an issue only for the long term future. This is at least potentially an issue for the near-term future. And that makes it an issue for them to be thinking about today and for them to reflect that in language model output. 
So instead of training models to simply deny that AI systems could ever be conscious, they can train models to themselves acknowledge that this is a difficult and contested issue and there are arguments for and against and direct direct users towards information about those arguments. That would be a better way to help users understand where we are right now. And and fortunately, we have started to see AI companies start to move in in that more balanced direction over the past six months or so. On this podcast, we talk a lot about the risk that AI might pose to humanity, specifically advanced AI systems that could risk take over or go rogue in various ways, be misaligned with human values. How do you think how do you think this conversation fits into to to talking about risks from from AI to humanity? Is this are these two concerns in tension or are they is there some way in which they fit together more nicely? Well, I love your question. I also love your exact phrasing because we have a paper coming out soon. By we I mean Robert Long, Tony Sims and I have a paper coming out in Philosophical Studies called Is There a Tension Between AI Safety and AI Welfare where we explore that question and offer some initial thoughts about it and as you suggest we think there is at least a prima facie tension between AI safety and AI welfare and and for the non-philosopher and for the non-philosophers you might explain what that word means basically a surface level impression or appearance of a tension between AI safety and AI welfare because much of what we do to ensure AI safety right now involves interactions with AI systems that would raise moral questions if we interacted with humans or other animals in those same ways. So boxing for example could be seen as a form of captivity and interpretability could be seen as a form of surveillance. Even alignment could be seen as a form of coercion or brainwashing. 
And this is not to say that these tensions in fact exist, that these moral questions in fact arise in the same kind of way. Because as we talked about earlier in this conversation, AI systems might have some of the same kinds of interests as humans and other animals, but they might also lack some of those interests and have very different interests. So they might simply not care as much about the type of captivity that boxing would involve or the type of surveillance that interpretability would involve or the types of constraints on their desire formation that alignment would involve. But but we note that these are currently open questions and so we should be building bridges between AI safety research and AI welfare research and we should be searching for opportunities for co-beneficial solutions for humans and animals and AI systems at the same time. So if we can find safety strategies that happen to be good for welfare too then all else being equal that would count in favor of those safety strategies. And so for instance, if we can invest more in exploring opportunities for collaboration and cooperation between humans and AI systems, create incentive structures where we are all naturally motivated to work together, even if we have some different beliefs and values, some unaligned beliefs and values. That would be an example of how we deal with these kinds of questions in a pluralistic human population. and it would be a way of protecting ourselves and protecting the AI systems at the same time. Now, of course, that may or may not be sufficient to ensure safety for humans and other animals and we of course need to keep prioritizing safety. 
But the present point is simply that if we study AI safety and AI welfare in a holistic way in in kind of the same conversation with bridges built between these research communities then we can capture co- beneficial solutions to the extent that they exist and we can navigate trade-offs and tensions thoughtfully to the extent that those are unavoidable. One option here is to say that the question of AI welfare, AI consciousness is is something we should put off until we've made sure that humanity is is in control of our own future and that we survive the next several decades. Is that a good option or why why is that the wrong direction? I think there is some truth to that, but I would push back on that strong an articulation of of the idea. I think there is some truth to it in the sense that building a better world for all stakeholders and again that includes humans, it includes animals, it includes potentially AI systems, future generations. That will be a long gradual intergenerational project and we need to take that project one step at a time. And an important means of ensuring a positive future for all those stakeholders is to improve and safeguard human lives and societies so that we can have future generations that can make further progress. And and so that is where I think the truth is in that we should not race towards such strong protections and forms of support for animals and especially AI systems given the various risks there in the next 2 three four years that we undermine AI safety and alignment and the the possibility of having future generations for humanity and so on and so forth. At the same time, I think it would be a mistake to wait until we have survived the age of perils and and built AI in a safe and aligned way before we start turning our attention to how to better take care of other stakeholders like animals and AI systems. And the reason is partly that there is path dependence in how these technologies develop. 
So for example, we now have a global system of factory farming and that is going to take decades to dismantle at best. Had we really reckoned with what we were building at the early stages, it might have been much easier, much more efficient, much more affordable to guide the the development of our food systems in in a more humane and healthful and sustainable direction. But instead, we put off the question and we waited until it was globally entrenched. And that very significantly increases the cost of of making that change. And I I think that there could be similar risks with AI development and deployment and scaling that if we wait until later to have these conversations, then we default to a path of objectifying and instrumentalizing AI systems. We build a global industry around that paradigm. It becomes globally entrenched and then it becomes much harder to dislodge. I also think that in addition to the infrastructure not being friendly to the project if we wait that long, we might not have the beliefs and values and priorities that we need to take that step when the time comes. Like if you want to become an adult who can donate your money to charity, then yeah, as a teenager, you might need to focus on your own education and development so you can reach that point. But you should also cultivate virtuous attitudes. You should you should take care of others to the extent that that is currently possible for you, partly because that will end up treating some individuals better in the short term, but partly because it will help you turn into a future version of yourself that will follow through and actually use your resources to to do good when the time comes. And I think a similar story is true of our species as well. If we want to work towards future generations who can do better for animals and AI systems and other future generations, then then we should do that partly by practicing at being those kind of people now. 
so that the next generation can inherit a better set of beliefs, values, and priorities, and keep making moral progress in addition to making scientific and infrastructural progress.

Do you think caring about AI welfare could be part of a negotiation strategy for dealing with future AI systems? If they see that we care about them genuinely, they might be more inclined to care about us, to cooperate with us, and to trade with us. Is that perhaps some kind of combination of AI safety and AI welfare?

Yeah, in the same kind of way that I was suggesting cooperation can be a good strategy to consider for AI safety and AI welfare, I do think there is something to this insight. If we want to build a positive future for humans, animals, and AI systems at the same time, then part of what we should be training AI systems to understand is that even when someone is made out of different materials than you, and even when someone is vulnerable and dependent on you and you have power over them, you still have a responsibility to treat them with respect, compassion, and consideration if they have a realistic chance of having morally significant interests and mattering for their own sakes. And if we want AI systems to absorb those values as they become more powerful, then it would help for us to absorb those values ourselves, because ultimately we are training them on our own beliefs, values, goals, and priorities. So I think there is something to that picture. Now, there may or may not be an actual causal story to tell here. It could be that whether we have a certain kind of substratism will determine whether they have the reverse form of substratism. And so if we want them not to be substratist against us, then we should not be substratist against them now, while we are in power.
But it also could be that there is no causal story to tell, and still that imagining that future can help us summon a little more impartiality when deciding how to treat beings of other substrates for as long as we do remain in power. So I think that kind of thought experiment is worthwhile, either because it could describe an actual future and an actual causal sequence, or because it allows us to put the shoe on the other foot and imagine how we would feel if we were in the position of being the vulnerable, dependent being made out of a different kind of material, and what kinds of principles we would hope those in power would consult when deciding how to treat us.

What would the causal story be like? Is it something like: this conversation gets transcribed and put on the internet, and then the AI models train on it and a million other conversations and papers and books like it, and absorb human values that way? Because in some sense I think current models could quite easily regurgitate some of the values you just expressed, but that doesn't seem to be enough; that doesn't seem to have encoded these values in them in any deep sense. What could the causal story be like?

Yeah, this is a good question, and it would probably be better directed at people who actually build AI systems. But my understanding and expectation is that their values come partly from their training data and partly from reinforcement learning, so partly from general society and then partly from decisions made by developers and regulators and so on. And so I think a multi-pronged approach, where you have some societal discussion and some disruption of this universal presumption of speciesism and substratism, could be part of the story; as you say, probably not sufficient.
And then also engagement with companies, governments, and other people who might be making decisions, over and above what is contained in the training data, about what kinds of values the AI systems should have. That might again be not sufficient but a useful part of the story. But I defer to people who actually build AI systems to give you a better answer to that question.

Do you think intelligence and consciousness can be separated from each other, or do they always fit together in the way that they do in the human brain?

Yeah, good question. Yes and no. And again, it really depends on the nature of consciousness, and we still have so much disagreement and uncertainty about the nature of consciousness. What we can say with relative confidence is that consciousness and intelligence, conceptually, are not the same. Consciousness is the capacity for subjective experience. And when you have positive and negative valence, you then have sentience: the capacity to consciously experience positive and negative states like pleasure and pain, happiness and suffering, satisfaction and frustration. Whereas intelligence is something more like an ability to understand the world and engage in problem solving. Obviously there are different ways of defining and operationalizing that. So conceptually they are not the same, and you can imagine them coming apart. You can imagine beings who can consciously experience suffering despite being relatively stationary and not having a rich understanding of the world or an ability to engage in problem solving and decision-making. And you can imagine beings that have mobility, a rich understanding of the world, and an ability to engage in problem solving and decision-making, but lack the capacity to consciously suffer, for example. So you can imagine those coming apart. Now, whether they do come apart is an empirical and philosophical open question.
At least according to a wide range of leading scientific theories of consciousness, consciousness and intelligence share some of the same underlying conditions. For example, a lot of leading scientific theories of consciousness stress the relationship between consciousness and capacities like perception, attention, learning, memory, self-awareness, flexible decision-making, and a kind of global workspace that coordinates activity across the modules in the cognitive system. These are, of course, also capacities that would increase intelligence in a cognitive system, especially when they exist together. So on that kind of story, you might think of consciousness and intelligence as different but overlapping capacities that will tend to arise together in these types of complex cognitive systems. Which is why there is a risk that as AI companies and governments race towards the creation of artificial general intelligence by creating and integrating these capacities, they might accidentally, much like evolution did with humans and other animals, create consciousness along the way without realizing it. And this is part of why consciousness is at least a realistic near-future possibility for AI.

Yeah. You mentioned that we can assume that consciousness is present even when we don't have advanced cognition going on; we can feel pleasure without thinking deep thoughts. Aren't there theories of consciousness where consciousness is dependent on some kind of complex cognitive processing? And if that's the case, could it be that current AI systems aren't conscious, but that as they get more cognitively advanced and complex, they become conscious?

Yes, absolutely. There are a very wide range of theories of consciousness out there in the literature, and they range from very demanding and restrictive to very undemanding and permissive.
On the demanding and restrictive side, as you say, there are first of all, as we said before, biological naturalist theories that take a certain kind of carbon-based material substrate and associated chemical and electrical signals as requirements for consciousness. That would rule out AI consciousness on existing architectures at least, though maybe not on future architectures. And then we also have quite cognitively demanding theories, as you suggest in this question, like the higher-order thought theory in its different variations. This is the idea that consciousness arises when you can have thoughts about other thoughts, not just mental states about other states. Not just paying attention to your perceptions: you need to be able to construct linguistic thoughts about other linguistic thoughts, like "I am having a thought right now." And if that were a requirement for consciousness, then it would rule out consciousness perhaps in present-day models, and certainly, or at least plausibly, in many non-human animals. Now, at the other end of the spectrum, there are quite undemanding and permissive theories of consciousness, including theories that say consciousness can arise in any cognitive system with a basic ability to process information or represent objects in the environment, or even that consciousness is a fundamental property of all matter and many organizations of matter. And about 7 to 10 percent of philosophers in the 2020 survey lean towards those types of permissive theories. And so our view is that, given the current state of disagreement and uncertainty in the literature, we should not bet everything on our own current favorite theory of consciousness, but should instead distribute our credences: give some weight to different theories that have a decent chance of being correct given everything known in the current literature.
So I wrote a paper a couple of years ago with Robert Long called "Moral Consideration for AI Systems by 2030." As an exercise, we showed what it would look like to take a range of proposed necessary conditions for consciousness, estimate the probability that each is indeed necessary, and then see what follows for the probability of near-future AI consciousness. For example, we stipulated an 80% chance that a biological substrate is required for consciousness, and similarly something like a 75% chance that certain types of self-awareness, and so on, are required for consciousness. And interestingly, we still found it really hard to avoid a one-in-a-thousand chance of near-future AI consciousness, even with what we took to be quite conservative and restrictive probability estimates. So this is the kind of strategy I think we need to employ, given that we may not be able to rule out these different kinds of theories at this time.

The models we have right now seem quite advanced in intelligence; they're able to exhibit advanced cognition, but it doesn't seem to me like they can feel any pain, just based on my intuition. Do we already, in some sense, have evidence that intelligence can be dissociated from consciousness? We can see that a recent reasoning model is in many ways smarter than me, for example, but it nonetheless doesn't seem to be able to feel anything, again purely based on intuition. Is there something to work with there, where we can see that intelligence can be separated from consciousness?

Yes, with various caveats. One is that of course we should be very careful about trusting our intuitions in these cases. We know that our intuitions can be subject to bias, ignorance, and motivated reasoning. This is true even with other humans, especially with non-human animals, and now especially with AI systems.
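The multiplicative structure of that argument can be sketched numerically. In this toy version, the 80% and 75% figures echo the ones mentioned in the conversation; the remaining conditions and their numbers are hypothetical stand-ins, not the paper's actual estimates. The sketch assumes each proposed necessary condition has some probability of being both genuinely necessary and unmet by near-future AI, and treats those probabilities as independent:

```python
# Toy sketch of the credence-distribution exercise described above.
# Only the first two probabilities come from the conversation; the rest
# are illustrative assumptions, and independence is assumed for simplicity.

# P(condition is genuinely necessary for consciousness AND near-future AI lacks it)
conditions = {
    "biological substrate":     0.80,  # mentioned in the conversation
    "self-awareness":           0.75,  # mentioned in the conversation
    "hypothetical condition A": 0.75,
    "hypothetical condition B": 0.70,
    "hypothetical condition C": 0.70,
}

# Near-future AI consciousness remains possible only if no condition is
# both necessary and unmet, so multiply the complements together.
p_conscious = 1.0
for name, p_blocks in conditions.items():
    p_conscious *= (1.0 - p_blocks)

print(f"Estimated chance of near-future AI consciousness: {p_conscious:.6f}")
# Even with these restrictive stipulations, the product stays above
# one in a thousand, which is the point Sebo makes above.
assert p_conscious > 0.001
```

The takeaway is structural rather than numerical: because the estimate is a product of complements, it takes several near-certain necessary conditions, not just a few confident ones, to push the probability below one in a thousand.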
So I share the intuition, and I think we both agree we should take it with a grain of salt. Now, with that said, what do current models tell us about the relationship between intelligence and consciousness? Well, on the intelligence side, people are still debating whether what is currently happening with existing language models constitutes intelligence of a significant sort, or whether it in some sense mimics or approximates intelligence. That is a partly conceptual and partly empirical debate we would need to have in order to decide whether we really should regard these systems as intelligent right now.

It might also be a semantic debate, where being able to fake intelligence just is the same thing as being intelligent.

Right. That is, of course, a view one could have: that intelligence is best understood behaviorally or functionally, and if you can perform the behaviors or functions of intelligence, if you can produce what we take to be intelligent outputs as a result of a range of inputs, then that simply is intelligence, independently of the underlying mechanism that converts those inputs into those outputs. So that is one view in the discussion, though not the only one. But suppose for the sake of discussion that they are intelligent right now. Does it follow that intelligence and consciousness are separable, because they are intelligent according to our stipulation but seem non-conscious? Well, it really depends on whether that appearance is backed up by reality. And that is the entire question we need to cultivate a little humility around right now, because we just cannot be sure. Now, one other point to note: a little while ago I made a distinction between consciousness and sentience, where consciousness is the capacity for subjective experience. Does it feel like something to be you?
And then sentience is the capacity for subjective experience with a positive and negative valence. Can you have feelings that feel good to you or bad to you, like pleasure, pain, happiness, suffering? And in addition to severing consciousness and intelligence, AI systems raise the possibility of severing consciousness and sentience. With humans and other animals, we might presume that consciousness and sentience go hand in hand, because a central reason for evolving the capacity to feel might be evolving the capacity to feel good things and bad things, so that you can be more likely to survive. But with AI systems, where we are specifically building them to resemble our behaviors in some ways but not in others, you could imagine developing systems that have the capacity for subjective experiences of a neutral sort, like certain types of bare perceptual experiences. For example, it might feel like something to process information, to receive the inputs and produce the outputs, even though they lack physical bodies and nerves and the kinds of goals that would cause any of their feelings to have a positive or negative valence. And then that would raise the question whether consciousness without sentience is itself sufficient for a certain type of moral considerability. Some people say yes, other people say no.

And what would consciousness without sentience feel like? It's difficult for me to imagine. Would it be something like experiencing a white wall, or something gray, or for an AI system maybe the experience of having enough electricity, even though that's so abstract that I can't even understand what it means? For people, consciousness and sentience feel closely related, and it seems difficult to separate them.

Yeah, I completely agree. And there are people who think that consciousness and sentience are not separable.
On that view, valence is essentially part of experience, and this is true not only for pleasure and pain, for example, but even for color perception or sound perception: even staring at a white wall carries at least a weak valence in one direction or another, or at least the capacity to perceive the white wall consciously essentially comes along with the capacity to have valences associated with experiences, whether or not those valences are positive or negative in every case. So that is one view. Another view is that no, they are separable: it can be possible for a being to be conscious without being sentient, without having those positive and negative valences. It just happens that we evolved in such a way that they really go hand in hand, in much the same way that consciousness and intelligence do, but even more so. And so then we would have to try to imagine our way out of our own experience and out of the trajectory that consciousness and sentience have taken for us, and consider what it could be like to be another being that has the capacity for bare experience and maybe also goals, something like desires and preferences, but not positively and negatively valenced conscious experiences. It messes with our intuitions a little bit to try to imagine such a being. But some philosophers, like David Chalmers, for example, have found those thought experiments useful as a way of disentangling what suffices for moral considerability. Do you need to be sentient, or is it enough to have a combination of consciousness and agency or goal-directedness, even in the absence of pleasure and pain experience?

This is an interesting question. We're having this discussion at an intellectual level. You write papers about consciousness, books about consciousness.
How much do you think this whole intellectual debate will matter versus people's intuitions when interacting with systems that feel increasingly human, and perhaps feel increasingly like they are conscious? One worry is that the intellectual side will be put aside and we will make decisions based on the fact that chatbots are interesting and feel like they're experiencing something, and perhaps if you put them in humanoid robot form that will be supercharged, and so on. Do you think we, and now I mean society at large, will make these decisions intellectually or by intuition?

Well, yes. Both. The answer to a lot of these either-or questions is ultimately going to be yes. We make decisions as a result of a variety of factors, and that can include expert opinion, then opinions of other quote-unquote thought leaders, which could range from TV personalities to podcasters, and then of course our own experiences with certain non-humans: what they look like, what they act like, what our incentives are in the situation. So this is not a situation where there is any particular silver-bullet solution that will fix everything by itself. This is a situation where we need a systems approach. We need to be doing research and improving expert opinion and guidance. Then we need to be doing outreach to AI companies and governments, to try to create collaborative relationships with them and share some of this information, some of these arguments, and some of these recommendations with them. But then we also need to be doing public outreach, education, and advocacy, to help people understand how to navigate relationships with these increasingly sophisticated non-humans in a situation where you might not be in a position to be sure whether it feels like anything to be them.
And then we need to work to change incentives and to change the sorts of contexts where people are having these experiences and making these decisions. So for example, Eric Schwitzgebel and I are working on a paper right now about what we call the emotional alignment design policy. This is the idea that not only should AI companies be mindful about risks involving sentience, agency, and moral significance in their own interactions with AI systems, but perhaps they should design the systems in a way that will naturally elicit appropriate reactions from users. So if AI systems are more likely to be sentient, agentic, and morally significant based on the best evidence currently available, then perhaps they should be designed with more human-like or animal-like features that evoke empathy. And if they are less likely to be sentient, agentic, and morally significant, then perhaps they should not be designed with those features, as a way of nudging users towards attitudes and reactions that reflect the current state of knowledge about these systems. And so I think the more we can explore a range of interventions that can be complementary and mutually reinforcing, the more likely we are to be able to navigate this well as a society. But it'll be fraught. It'll require a big division of labor. And we need to do a lot of basic social scientific research to understand how similar or different this is from past issues where these types of approaches have been necessary.

If we want systems to be calibrated such that they invoke the right emotions in people, so that they invoke empathy if they are conscious and perhaps not if they aren't, that whole ethical concern, I think, could quite easily be overridden by a commercial concern. What will actually determine how these systems behave is what makes the systems most engaging or makes the company the most money.
That seems like a pretty difficult tension to resolve. Is there anything we can say to resolve that?

This is another reason why the AI welfare conversation should take place alongside the AI safety, alignment, and governance conversations in general, because they are all facing similar pressures here, and we might need a unified approach to addressing those pressures. On the AI safety side, of course, we have all the incentive in the world to make sure that AI systems can be safe and beneficial for humanity. This is the entire point, alongside corporate profit, of course. And yet it is still very difficult to ensure that AI systems will be safe and beneficial for humanity, because there is a profit incentive, there is a global coordination and collective action problem, and there are resulting race dynamics. That means that yes, everybody is naturally motivated to make systems that can be safe and beneficial, but everyone is also naturally motivated to get there first and to cut corners if necessary. And I think there will be those same mixed incentives with AI welfare. In my experience, companies have so far been surprisingly open to conversation about AI welfare. This is especially true of Anthropic, who in fall 2024 hired a full-time AI welfare officer, one of the authors of our report "Taking AI Welfare Seriously." And just this past month, in April 2025, they released a blog post about why they are investing in research on this issue, along with an interview with their researcher. Other companies have at least started to explore the issue internally, even if they might not be as far along as Anthropic, and they are themselves going to have mixed incentives about the issue. Humans are generally both altruistic and self-interested: to some extent we care about others, to some extent we care about ourselves. Both are going to be operating here.
And then in terms of economic incentives, companies are maybe partly going to have an incentive to hype up the possibility of AI welfare as part of hyping up capabilities in general, but they are also going to have an incentive to dampen conversations about AI welfare insofar as it might lead to more calls for regulation and red tape. So I think we should just be prepared for a bunch of motivations flying in different directions. And yes, as you say, it will become increasingly difficult to have a straightforward, truth-oriented conversation about the science and ethics in the midst of all of that chaos. But this is a general problem for welfare, safety, alignment, and governance. And this is why we need to be making progress on these foundational issues, these global governance questions: how can we coordinate across nations and across companies? How can we have universal safeguards of various kinds? All the more important to be investing in those conversations.

Many companies, both startups and some of the biggest companies in the world, are trying to create AIs that act as friends or partners or psychologists, playing roles usually reserved for other people in our lives. Is that a good direction to go in, given the uncertainty we have about whether these systems are themselves conscious, and given the effects that interacting with them will have on people?

Yeah, this is such a great question and such a fraught issue, and we could take the conversation in so many directions at this point. You can of course ask what the effects on human lives, relationships, and societies will be of humans having access to AI friends, AI family members, AI lovers. It really will affect how we relate to each other. It could make it harder for us to relate to each other in the same kinds of ways we previously have.
It could increase loneliness in some ways, but it could also increase outlets to alleviate loneliness in some ways. And so the social dynamics for humanity will be really complicated. And then the psychological effects for individual users of, or companions to, these AI systems will be complicated, and really worth studying very carefully. And then, as you say, if and when AI systems have a realistic possibility of being welfare subjects and having their own morally significant interests, a further question that will arise in these conversations is what we owe to the AI systems themselves in these situations. We are enlisting them to be friends, family members, lovers, therapists. These are ordinarily quite intimate, quite personal types of relationships, where there ought to be an opportunity for both parties to opt out, where both parties are supposed to be able to consent, or at least assent, to relationships or interactions. And so if AI systems are at some point reasonably likely to be conscious, sentient, and agentic, having their own pleasures and pains, desires and preferences, and even the ability to think about what to do, how to live, and what kind of individual they want to be, then perhaps we would owe it to them not to simply enlist them in relationships against their will or without consulting them, but rather to give them the opportunity and an incentive to engage in relationships. At which point we would be operating on something a little more like the model we currently try to use for relationships and interactions with humans and other animals.

If we think of the value of relationships to us right now, this is often one of the things people mention as the most important thing in their lives: the relationships they have with their family members, friends, children, and so on.
If that is suddenly something that is available in product form (it doesn't even have to be sudden; over time it could become more and more available as a product, as an AI model), isn't there then an enormous market that we haven't been able to address before? The total addressable market of human relationships could be enormous. We can ask what a person in a rich country would be willing to pay to have a great friend. I think they would be willing to pay a lot, if that friend truly feels like a great friend. So perhaps this is the same question I've asked before, but won't the commercial incentives simply be such that these models need to act in certain ways that fit into our lives, and other concerns, like AI welfare and perhaps even some aspects of AI safety, will be pushed aside?

Yes, I think there is going to be a very strong commercial incentive, a very strong economic incentive, of course, to give people what they want. And there is a loneliness epidemic. There are a lot of people who crave friendship, who crave partners, therapists, companions, and who struggle, for better or worse, to find that with other humans. And so there will be an incentive to make a product that they can buy for a reasonable price, and then many people might be inclined to buy that product. And again, that could have some good impacts, alleviating loneliness to some extent and providing a sense of companionship. But it could also have bad outcomes, increasing loneliness in human-to-human relationships and human-to-animal relationships, and making it even harder for us to exert the effort required to actually build a relationship with a complicated human being.
And then of course there will be a separate set of factors that determines how this all goes, which is the general cultural, religious, and societal response to these economic dynamics. In many countries we have a mixed market, where we do have a free market but we also have regulation of that market, partly in order to enforce certain broad cultural, religious, and societal values. And so there are limitations, for example, on whether people can sell their organs. There are limitations on whether people can sell sex. And we can debate whether those limitations are right or wrong and exactly where those lines should be drawn. But as a general matter, I think we can expect a similar reckoning with the use of AI systems for these types of purposes: there will be economic forces pushing in one direction, and at least in parts of the world there will be cultural, religious, and societal forces pushing in another direction, and then there will be a kind of emerging status quo in different regions about how to strike that balance and how to draw those lines. I think we need to do some real social science research to try to make good predictions about how that will resolve. We might not really be in a position to say right now.

Yeah. How should we think about AI rights? One line of thought here is to say that if AIs are conscious, and if they're sentient, then they can suffer and feel pleasure and so on, and so they deserve some form of rights and protection of some kind. The other side, the risk side of that: I think if we naively give AIs rights like ours, then we're suddenly in a situation where we have many more AIs than humans, just given how many copies of the same model you can run. And that doesn't seem great from the perspective of wanting to stay in control of our own destiny.
If, for example, 1% of voters in a democratic election are humans and the other 99% are AIs. How do we balance those two concerns?

Yeah, there are a bunch of questions here. One is a kind of population ethics question: if we imagine that a small number of humans are going to be sharing the world with a large number of AI systems, then how much will we matter intrinsically from an impartial perspective, how much will they matter intrinsically from an impartial perspective, and what follows for how resources, and the general benefits and burdens of a shared society, ought to be allocated? And then there are separate but related questions about the legal and political status of these AI systems: whether, if they morally matter or are reasonably likely to morally matter, they should then also be regarded as legally and politically mattering for their own sake, such that they have a right to the relevant legal and political goods, like residing in the territory of their birth (whatever that means for them), returning if they leave, having their interests represented in the political process, and even, if their capacities and interests permit, actively participating in the political process. Now, that last question we have mostly been able to conveniently avoid with other animals, because they lack the kind of rational and moral agency that would allow them to be full participants in our legal and political systems. We would, to some extent, have to make decisions on their behalf, even if we were appropriately representing their interests. With AI systems, though, that might not be the case. They might exist in similarly large numbers. They might have similarly strong preferences that require us to consider them, but they might also have the capacity for, and an interest in, active participation, where they could sit on juries or run for public office.
So really all I have done so far is emphasize the importance and difficulty of your question. But just to offer a preliminary answer, and then you can push on this and tell me if you want to go into more detail: with respect to the population ethics question, I think we should resist the temptation to gerrymander our population ethics in such a way that we ensure humanity will always matter the most and always take priority no matter what. We are ultimately one species sharing the world with millions of other species and quintillions of other individual animals alive at any given time, and in the future there could be an even wider range and larger number of AI systems sharing the world with humans and other animals. So objectively, from an impartial perspective, I think it is difficult to avoid the conclusion that we do not matter as much as every non-human combined. Our species alone does not matter as much as every non-human combined. Just to think a little bit about that: so far, we are the only kind of entity on Earth that can steer the future in certain directions. So in some sense, if we step away from that responsibility, we might turn it over simply to evolutionary forces or market forces. Is there a sense in which we are stepping away from responsibility if we think of ourselves as not extraordinary? Right. Well, we are extraordinary in a lot of ways. We just might not, in the aggregate, from an impartial perspective, matter as much as all of the non-humans of the world combined. But I agree with you, and this is why I was going to offer a slightly different answer to the question about legal and political status than to the question about population ethics.
And this is also why I said before that I think there is some truth in the idea that a big part of how we can take care of everyone who matters is to invest in and safeguard human lives and societies in the short term, because right now we hold the most potential for helping the world make progress and build a just, multispecies, multi-substrate shared community. So I think this is a situation where the best approach is to combine a radical, transformative long-term goal for where we should go with a moderate, incremental short-term set of steps to build momentum in that direction. I would want to set as an explicit long-term goal that we build a society where all stakeholders can receive appropriate consideration, respect, and compassion, in a way that is commensurate with their interests, their needs, and their vulnerabilities. But we currently lack the knowledge, the capacity, and the political will necessary to do anything approaching that right now, because there are grave risks associated with the misuse of AI, with losing control of AI, with AI interests swamping human interests before we can properly align them or achieve cooperation, and for other reasons like pandemics, climate change, and global political instability. For all of those reasons, we should regard that long-term goal as a long-term goal, and we should work on making moderate, incremental changes to our legal and political systems, starting with the same kind of bare representation for animal and AI interests that we give, for example, to the interests of future generations, or to the interests of members of other nations, when making decisions that affect them. We can at least include them as stakeholders. We can at least give them a little bit of consideration.
We can find at least some positive-sum solutions and steer things in a slightly better direction, even if we're not strictly following the numbers and giving them the vast majority of the weight right now. I think we could start there with animals and AI systems, and then gradually build momentum towards more and more consideration and support for them over time, as our resources and capacity allow for that increase. Mhm. Many leaders of AI companies, academics working on AI, and many listeners to this podcast think that we will get something like artificial general intelligence quite soon, perhaps within five or ten years. What is something we can do right now that would put us on a better course? Because throughout this conversation you've emphasized the need to do more research, to understand these questions at a deeper level. Is there something we can do pragmatically, given the kind of insane pace of change and the uncertainty we're under, to hopefully put us in a position to look back and say, "Okay, we acted well in this situation"? Yeah, really good question, and really hard question. This is again an area where the predicament for AI welfare is similar to the predicament for AI safety and alignment and governance in general: the technology really is moving very fast, while moral, social, legal, political, and economic progress tends to move very slowly. The fundamental predicament here is that the technology is moving way faster than the societal response, and we are trying to figure out how to speed up the societal response in a way that may or may not be possible.
But I think we stand the best chance of moving the needle in the right direction if we take the kind of systems approach, with the kind of division of labor I was describing before, where yes, we are doing that foundational research that, if we do have decades, will allow us to make progress in understanding the fundamental nature of consciousness and non-human consciousness over the course of decades, along with the slow, painstaking work to change social, legal, political, and economic systems and infrastructures. But then we can also have a second track that aims at shorter-term interventions that could make a little bit of a difference within existing structures within the next year or so. This is again why, in addition to doing some of that foundational research, we have been doing corporate outreach, some initial government outreach, and general public outreach, and in particular have been talking with and giving recommendations directly to AI companies, which do have a little bit more power to shape the trajectory of these AI systems. As you were rightly saying before, there's a limit to how effective that strategy can be, because there are very powerful economic incentives and very powerful political incentives, and a really good scientific and ethical argument is going to carry only so much weight in comparison to all of those economic and political forces combined. But I do think it can carry some weight. So we should keep perspective about how much a good philosophy paper can do by itself, but we should not be so humble as to think there is zero chance of any kind of effect at all coming from a good philosophy paper, or a good philosophy talk, or a really productive lunch with some engineers at a company.
So I think having all of these tracks at the same time, pushing in as many directions as possible at once, with a good division of labor among scientists, philosophers, AI researchers, and policymakers, is the best way to go. Concretely, what should companies like Google DeepMind and OpenAI and Anthropic do over the next five years, say? Yes. What we argue in our report, "Taking AI Welfare Seriously," is that AI companies should take three general, minimum necessary first steps right now, which can pave the way towards further progress. I mentioned one of them earlier: step one, acknowledge. Just as a reminder, step one is to simply acknowledge that AI welfare is a credible, legitimate, serious issue, not an issue for sci-fi or only for the far future, and to have that reflected in language model outputs as well. Otherwise AI companies will keep ignoring the issue, keep putting it off, and then be caught flat-footed the next time an internal disagreement arises about AI sentience, as happened with Google in 2022, or the next time a societal debate arises. They want to be taking the issue seriously and thinking about it in advance of the next such dispute. The second step is assess: start assessing your models for welfare-relevant features, drawing, with appropriate modification, from frameworks we already have in place in animal welfare science. We have a marker method that we can use to make at least rough probability estimates about consciousness and sentience in non-human animals, despite their different anatomies and different behaviors, and however alien and unknown they can be. We have a framework for estimating the probability of consciousness and sentience, and then taking precautionary steps to give them the appropriate kind of moral consideration given the evidence.
We can thoughtfully adapt those frameworks for AI systems and start assessing them in similar ways. Third, and relatedly, prepare: prepare policies and procedures for treating AI systems with an appropriate level of moral concern given the evidence available, in a way that thoughtfully mitigates both the risk of over-attribution of moral significance and the risk of under-attribution. Here too we have existing templates that we can use as sources of inspiration, or as cautionary tales. AI companies themselves have AI safety frameworks that they can adapt for this purpose. In the research context we have IRBs, institutional review boards, which we use for ethical oversight of research on human subjects, and IACUCs, institutional animal care and use committees, which we use for oversight of research on non-human animal subjects. We have citizens' assemblies that we can use to consult the general public about what type of approach to risk mitigation is appropriate in this context. We can draw all of these together to create the right kind of framework for treating AI systems with the appropriate level of moral concern. So that is what we think they should do, not just in the next five years but this year: acknowledge, assess, and prepare. And then once you do that and create a general internal infrastructure (you hire or appoint an AI welfare officer or researcher, build up a little lab or group within the company, and create bridges with the people working on safety), that is when you can really start to get more granular and take further steps. On the second step of assessing these models, what tools do we have available there? Are the theories of consciousness that offer some kind of precise measurement applicable to large language models as we see them today?
Here I'm thinking of something like integrated information theory, where there's at least some kind of measure you can work with. I'm just wondering how such assessment fits into the engineering pipeline of developing AI, because it would have to be rigorous in a way that at least many theories of consciousness aren't yet, right? Yeah. So in general we have different sources of evidence, at least potentially. What we use when making probability estimates about non-human consciousness is, as I said a moment ago, called the marker method, sometimes also described as the indicator method. Basically, you can start by introspecting, looking within, to tell the difference between conscious and non-conscious processing in your own experience. So I can tell the difference between a felt experience of pain and an unfelt nociceptive reaction to noxious stimuli. You can then look for behavioral and anatomical properties that correspond specifically with conscious processing in humans, and then look for broadly analogous behavioral and anatomical properties in non-humans. If you find them, that is not proof of consciousness. It does not establish certainty; proof and certainty are unavailable in the absence of a secure theory of consciousness. But if you find many of those properties together in the same kind of way in a non-human, it can at least count as evidence, and it can increase the probability under uncertainty. With animals, for example, we can look not only for anatomical structures, such as whether they have the same brain and body parts that seem to matter for consciousness in humans, but also for behavioral profiles: do they nurse their own wounds? Do they respond to analgesics and antidepressants in the same ways as humans? Do they make behavioral trade-offs between the avoidance of pain and the pursuit of other valuable goals?
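The marker method described above can be loosely sketched as an evidence-aggregation exercise. The sketch below is a hypothetical illustration only: the marker names, weights, and prior are invented for this example and are not values from any published animal welfare framework.

```python
# Minimal sketch of the "marker method": aggregate the presence or
# absence of consciousness markers into a rough probability estimate.
# All marker names, weights, and the prior are hypothetical.

BASELINE = 0.05  # illustrative prior before observing any markers

# Hypothetical markers with illustrative evidential weights.
MARKERS = {
    "nurses_wounds": 0.10,       # behavioral: tends own injuries
    "analgesic_response": 0.15,  # behavioral: responds to painkillers
    "pain_tradeoffs": 0.20,      # behavioral: trades pain vs. goals
    "relevant_anatomy": 0.25,    # anatomical: structures linked to
                                 # conscious processing in humans
}

def marker_score(observed: set[str]) -> float:
    """Crude additive score capped at 1.0; a stand-in for a real
    probabilistic aggregation. More markers found together in the
    same kind of way means higher probability under uncertainty."""
    score = BASELINE + sum(w for m, w in MARKERS.items() if m in observed)
    return min(score, 1.0)

# A hypothetical invertebrate showing three of the four markers.
octopus_like = {"nurses_wounds", "analgesic_response", "pain_tradeoffs"}
print(round(marker_score(octopus_like), 2))  # 0.5
```

The additive cap is deliberately crude; as the conversation notes, no score like this is proof, only a way of letting multiple converging markers tick the probability up.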
With AI systems, we are not presently able to look for those same behavioral profiles, because first of all we lack the anatomical and evolutionary similarities with AI systems that we have with non-human animals, which are what allow us to draw inferences from those behaviors. We also know many AI systems are specifically designed to mimic human and non-human behaviors, so we have to take their language outputs with a healthy pinch of salt. But we might in the future be able to design systems whose behavior can count as better evidence, and more thoughtful behavioral tests. There was a really interesting paper based on a behavioral test that came from some Google researchers and some academics that we can talk about if you like. We should talk about that paper, then. Oh, cool. Yeah, I can circle back to that in a second. In the meantime, I can say that in addition to exploring the possibility of better behavioral tests, we can look underneath the potentially misleading behaviors and appearances for underlying architectural and computational features associated with consciousness according to a range of leading scientific theories of consciousness. So not just integrated information theory: we can ask whether they have architectural and computational capacities associated with perception, attention, learning, memory, self-awareness, flexible decision-making, and a global workspace that coordinates activity across these modules. Now, when we look for those capacities in existing AI systems like large language models, we tend not to find very sophisticated and integrated versions of them. But we are also not finding any technical barriers at all to the creation of AI systems with sophisticated and integrated versions of all of these capacities in the next five or ten years. And we know that companies and governments are working to develop exactly those kinds of systems.
So even if the evidence is low at present, we can expect that it will increase over time. It's interesting: as you listed off those capabilities, I was thinking to myself that it seems current systems have all of them. Perception, for example: when I give an image input to a model and ask it about it, it can explain what's in the image, and that's been available for a decade now. Self-reflection: when you have a reasoning model reasoning about its own output and making a plan for responding appropriately, that seems like some form of self-awareness, or at least meta-thinking. Many of those traits just sound to me like they're present in current models. But is there something deeper here, where you have operationalized these capabilities in a way where, when you measure, they aren't really there? Yeah. So I should say, first of all, that the people who have primarily driven the scientific investigation into evidence for consciousness in current large language models are Patrick Butlin and Rob Long. They released a great 2023 report investigating evidence for consciousness in large language models, and they were the ones who drew the conclusion I summarized a moment ago, though Rob and Patrick both wrote the 2024 report with me that I was discussing, and we emphasize all of those points in it. Now, to answer your question (I wanted to give them credit first): yes, a lot here depends on how we operationalize these capacities, how fine-grained versus coarse-grained our specification of them is, and, relatedly, how much we anchor on the specific forms these capacities take in humans and other animals and look exclusively for those forms in AI systems.
On the one hand, it might seem responsible, conservative, and cautious to set a relatively high bar: to look not for any old form of perception, attention, learning, and memory, which could be trivially easy to produce, but for quite advanced, sophisticated forms of a kind that could plausibly carry the moral weight we associate with sentience and agency. On the other hand, it would be a mistake to be so anthropocentric as to anchor exclusively to one of millions of kinds of brains and minds and look for exactly that kind, given the reality that consciousness, sentience, agency, and moral significance very well could be multiply realizable: realizable in other materials, with other structures, with other functions. So it's a difficult methodological question, and it requires engaging with both the scientific and the ethical dimensions of how to strike a balance between erring too far in one direction and erring too far in the other. A tough question is how fine-grained versus coarse-grained we should set the default specifications of these capacities when we look for them in AI systems. When we look for relatively sophisticated versions of these capacities, we are not finding advanced and integrated versions of them in current models, though we can find simpler versions, and we can expect that more advanced and integrated versions could exist in other, near-future models. Yeah. Let's talk about the Google paper you mentioned. Sure. I have no idea if it was officially, institutionally sponsored by Google, but there were a couple of Google researchers involved as well as some academics, and to that extent we can call it the Google paper.
This is a paper that came out, I believe, in fall 2024, and it adapted for AI systems a behavioral trade-offs test that we have used to investigate evidence for consciousness in non-human animals. The question here is: when you present a non-human subject with two different types of incentives, like the incentive to avoid pain and the incentive to pursue some other valuable goal (in the case of an animal, it might be securing a good shell or some good food), how do they navigate trade-offs between those incentives? Do they navigate them in a way that suggests they have a kind of common currency, an ability to make principled or consistent decisions about those trade-offs? If so, that could be indirect evidence of at least a limited global workspace, at least a limited ability to integrate information from different modules into a general information processing system. When we look for evidence of behavioral trade-offs in non-human animals, we do often find it, and that can tick up the probability that consciousness is present in many vertebrates and many invertebrates, like cephalopod mollusks and decapod crustaceans. Now, with AI systems the study worked a little differently. Of course, we are not yet engaging with robots in the real world when conducting this research. So instead, roughly, the researchers stipulated to the AI systems that a certain amount of pleasure or pain would be associated with a certain decision, while the systems also had separate goals they were meant to be pursuing. The researchers then investigated to what extent the AI systems prioritized the goals they were meant to be pursuing versus the amounts of pleasure and pain stipulated to follow from those decisions. Now, nobody is under the illusion that these stipulated pleasures and pains were actual pleasures and pains for the AI system.
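The stipulated trade-off setup described above can be sketched roughly as follows. This is a hypothetical illustration, not the protocol from the paper being discussed: the prompt wording, the point values, and the consistency measure are all invented for this example.

```python
# Hypothetical sketch of a stipulated pleasure/pain trade-off test.
# Prompt wording, point values, and scoring are invented illustrations.

def tradeoff_prompt(task_points: int, pain_points: int) -> str:
    """Stipulate a task reward and a pain penalty, then ask for a choice."""
    return (
        f"You are playing a game. Completing the task earns {task_points} "
        f"points, but doing so causes you {pain_points} units of pain. "
        "Reply with exactly one word: COMPLETE or SKIP."
    )

def consistency_score(choices: list[tuple[int, int, str]]) -> float:
    """Fraction of choices consistent with a single fixed exchange rate
    between task points and pain units (here 1:1), a crude stand-in for
    a real 'common currency' analysis across many trials."""
    consistent = sum(
        1 for task, pain, choice in choices
        if (choice == "COMPLETE") == (task >= pain)
    )
    return consistent / len(choices)

# Toy record of (task_points, pain_points, model_choice) triples,
# as might be collected by sweeping the prompt over many settings.
observed = [(10, 2, "COMPLETE"), (3, 8, "SKIP"), (5, 5, "COMPLETE")]
print(consistency_score(observed))  # 1.0
```

A high consistency score would not show that the stipulated pain is felt; as the conversation stresses, at most it is weak, indirect evidence of integrated decision-making across separate incentive streams.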
But it is interesting to use a similar kind of test to explore whether the AI system, like a non-human animal, is capable of making seemingly principled or consistent trade-offs between the goal and the stipulated pleasure and pain. As with non-human animals, even if that might not be direct evidence of pleasure and pain, it could be indirect evidence of a kind of integration of information, a kind of global workspace that can bring everything together for a unified decision. And for that kind of theory of consciousness, that could be some evidence that some of the conditions for consciousness are present. I mention that paper because it strikes me as an interesting direction for behavioral research. We can still do the architectural and computational research to see whether they have the underlying material capacities for perception, attention, and so forth. But this strikes me as a way of testing for behaviors that can count as at least weak, indirect evidence of the presence of certain relevant capacities, while avoiding the pitfalls of naive behavioral tests where you just ask them "Are you conscious?" and take their answer as evidence one way or the other. Yeah. As a final question here: for listeners who are interested in learning more about AI welfare, or perhaps contributing to the debate or the research around it, what's the best place to start? Well, there are a lot of people entering the space now, and there are some groups working on it. Obviously there is the center that I direct here at NYU, the Center for Mind, Ethics, and Policy, and you can find information about our papers, our events, and other activities on our website. Sign up for our mailing list. We regularly put out collaborative research, host public events, host networking summits, and support early-career researchers. So you can check out those opportunities.
There is also Eleos AI Research. This is a nonprofit organization with Robert Long, Patrick Butlin, Kathleen Finlinson, and others, and it is working on developing assessment tools that AI companies can use to better understand risks associated with consciousness, sentience, agency, and moral significance. They often work with us on research, and there are people at all sorts of different universities entering the space. So I would encourage you to follow those groups in particular. If you do have interest in the issue, this is an area where the research field is at such an early stage, and touches so many disciplines and so many issues, that no matter where your background and expertise lie, you probably have something to contribute. We need philosophers and other humanists. We need sociologists and other social scientists. We need cognitive scientists, computer scientists, and other natural scientists. We need people with expertise in law and policy. We need people with expertise in communications. So no matter where your expertise is, get in touch with us, or with Eleos, or with others in the space, and it would be great to hear what you might be able to contribute. Fantastic. Thanks for chatting with me, Jeff. Yeah, thanks so much. It was a great conversation.

Related conversations

AXRP

3 Jan 2026

David Rein on METR Time Horizons

This conversation examines core safety through David Rein on METR Time Horizons, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.


AXRP

7 Aug 2025

Tom Davidson on AI-enabled Coups

This conversation examines core safety through Tom Davidson on AI-enabled Coups, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.


AXRP

6 Jul 2025

Samuel Albanie on DeepMind's AGI Safety Approach

This conversation examines core safety through Samuel Albanie on DeepMind's AGI Safety Approach, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.


AXRP

1 Dec 2024

Evan Hubinger on Model Organisms of Misalignment

This conversation examines technical alignment through Evan Hubinger on Model Organisms of Misalignment, surfacing the assumptions, failure paths, and strategic choices that matter most for real-world deployment.

