AI's Future and Impacts with Katja Grace
Why this matters
Auto-discovered candidate. Editorial positioning to be finalized.
Summary
Auto-discovered from AXRP. Editorial summary pending review.
Perspective map
The amber marker shows the most Risk-forward score. The white marker shows the most Opportunity-forward score. The black marker shows the median perspective for this library item. Tap the band, a marker, or the track to open the transcript there.
An explanation of the Perspective Map framework can be found here.
Episode arc by segment
Early → late · height = spectrum position · colour = band
Risk-forward · Mixed · Opportunity-forward
Each bar is tinted by where its score sits on the same strip as above (amber → cyan midpoint → white), using the same colour lexicon as the headline strip. Bars are evenly spaced in transcript order (not clock time).
Across 117 full-transcript segments: median 0 · mean −1 · range −21 to 17 (p10–p90: −8 to 0) · 3% risk-forward, 97% mixed, 0% opportunity-forward slices.
Mixed leaning, primarily in the Governance lens. Evidence mode: interview. Confidence: medium.
- Emphasizes safety
- Emphasizes AI safety
- Full transcript scored in 117 sequential slices (median slice score 0); the sketch below illustrates how the headline statistics can be derived from the per-slice scores.
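The scoring pipeline itself isn't shown on this page. As a rough illustration only, here is a minimal Python sketch of how the headline figures above (median, mean, p10–p90, band percentages) could be computed from the 117 per-slice scores; the band cut-offs (±10 here) and the example input are assumptions, not the site's actual code.

```python
"""Minimal sketch (not the site's actual pipeline) of how the headline
slice statistics above could be derived from per-slice scores.
The band cut-offs and the example input are illustrative assumptions."""

from statistics import mean, median, quantiles


def summarise_slices(scores, risk_cutoff=-10, opportunity_cutoff=10):
    # p10 and p90 come from the decile cut points.
    deciles = quantiles(scores, n=10)
    p10, p90 = deciles[0], deciles[-1]

    # Classify each slice into one of the three bands used in the legend.
    n = len(scores)
    risk = sum(s <= risk_cutoff for s in scores) / n
    opportunity = sum(s >= opportunity_cutoff for s in scores) / n
    mixed = 1.0 - risk - opportunity

    return {
        "segments": n,
        "median": median(scores),
        "mean": mean(scores),
        "range": (min(scores), max(scores)),
        "p10_p90": (p10, p90),
        "bands": {
            "risk-forward": risk,
            "mixed": mixed,
            "opportunity-forward": opportunity,
        },
    }


# Example usage (hypothetical scores, not this episode's real data):
# summarise_slices([-21, -8, -1, 0, 0, 3, 17])
```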
Editor note
Auto-ingested from daily feed check. Review for editorial curation.
Episode transcript
YouTube captions (auto or uploaded) · video yalAfcqELls · stored Apr 2, 2026 · 4,139 caption segments
Captions are an imperfect primary source: they can mis-hear names and technical terms. Use them alongside the audio and publisher materials when verifying claims.
No editorial assessment file yet. Add content/resources/transcript-assessments/ais-future-and-impacts-with-katja-grace.json when you have a listen-based summary.
hello everybody today i'll be speaking with katie race who runs ai impacts a project that tries to document considerations and empirical evidence that bears on the long-term impacts of sophisticated artificial intelligence we'll be talking about her paper when will ai exceed human performance evidence from ai experts co-authored with john salvatier alan defer baobao zhang and owen evans as well as ai impact's other work on what the development of human level ai might look like what might happen afterwards for links to what we're discussing you can check the description of this episode and you can read a transcript at axrp.net katya welcome to excerpt hey thank you right so i guess my first question is can you tell us a bit more about what ai impacts is and what it does yeah i guess there are basically two things uh it does where one of them is sort of do research on these questions that we're interested in and the other is try to organize what we know about them in in a accessible way so the questions we're interested in are these kind of high-level ones about what will the future of uh well humanity but involving artificial intelligence in particular look like um like will there be an intelligence explosion will uh is there is there a huge risk of um humanity going extinct that sort of thing but also um like those are questions we kind of know about it seems like there's a kind of vaguer question of just like what do the details of this look like are there interesting details where if we knew exactly what it would look like we'd be like oh we should do this thing ahead of time or something yeah we're interested in those questions they're often pretty hard to attack head on so um what do we do is try to break them down into lower level sub questions and then break those down into sort of lower level sub questions again um until we get to questions that we can actually attack um and then answer those questions so often we end up answering questions like um you know for cotton gin technology uh were there fairly good cotton gins just before eli whitney's cotton gin or something uh so it's very all over the place as far as subject matter goes but all ultimately hoping to bear on these other questions about the future of ai okay so and and how do you see it fitting into the ai existential risk research field like if i if i'm someone else in this research field how should i like interact with ai impacts outputs yeah i think having better answers to the high level questions uh i think should often influence what other people are doing about the risks um like i think there are a lot of efforts going into avoiding existential risk and like broadly better understanding what we're up against seems like it might change which projects are a good idea it seems like we can see that for for the the questions that are most clear like is there likely to be a fast intelligence explosion well if there is then maybe we we need to prepare ahead of time for anything that we would want to happen after that whereas if it's like a more slow going thing that would be different but i think there's also just an intuition that like if you're heading into some kind of danger and you just don't really know what it looks like at all it's often better to just be able to see it better i think the the lower level things that we answer i think it's harder for other people to make use of except to the to the extent that they are themselves trying to answer the higher level questions uh and putting enough time into that to to 
be able to make use of some sort of you know data about cotton gins um yeah i i guess there's this difference between the higher level and the low lower level things with a higher level of things perhaps being more specific or more more relevant rather but yeah in terms of like your output if somebody could read a couple things on the website or you know just just a few outputs i'm wondering like specifically what work do you think is most relevant for people trying to ensure that um there's no essential catastrophe caused by ai i think the bits where we've sort of got closest to saying something at a higher level and so more relevant um is we have a page of arguments for thinking that there might be fast progress in ai at around the point of uh human level ai sort of distinct from an intelligence explosion i think often there's a thought that like you might just get a big jump in progress at some point or like you might like make a discovery and go from a world that's similar to the world we have now to one where fairly impressive ai is either starting an intelligence explosion at that point or just doing really crazy things that we weren't prepared for so we tried to make a list of all the arguments we've heard around about that and like the counter arguments we can think of i think that's one that other people have mentioned being particularly useful for them i guess relatedly we have this big project on discontinuous progress in history so that that's feeding into that um but i guess separate to the list of arguments we also have like just historically when have technologies seen big jumps what kinds of situations cause that how often they happen that sort of thing well we will be talking about that later so i'm i'm glad it's relevant so it seems like you have like a pretty broad remit of um things that are potentially items that ai impacts could work on how do you you or how does ai impacts choose like how to prioritize within that and um which sub questions to answer yeah i think at the moment the way we do it is we basically just have a big spreadsheet of um possible projects that we've ever thought of that seem like they would bear on any of these things and i guess the last time we went to a serious effort to choose projects from this list what we did was had a sort of like first everyone put numbers on things for how much they like them kind of like intuitively taking into account how useful they thought they would be for like informing anything else and like how well equipped to carry them out they thought we were or they were in particular um and then we had like a sort of debate when we looked at ones where the where like you know someone was excited and other people weren't or something like that and discussed them which was quite good i think i would like to do more of that yeah like i think it was pretty informative um and then we yeah i guess in the end we're basically going with an intuitive combination of like this is important we think it's like tractable for someone currently at the org to do it well and maybe some amount of like either we have some obligation to someone else to do it or we know that someone else would like us to do it like sometimes people give us funding for something okay what do you think the main features are that predict your judgments of importance of one of these sub questions i think like how much it might change our considered views on the the higher level questions if you had to extract like non-epistemic features so an epistemic feature 
might be like this would change my opinions and a non-epistemic feature might be like this involves wheat in some capacity you know my sense is that um projects are better when they're adding something empirical to the community's knowledge like i think you can do projects that are sort of more vague and like what are the considerations about this and i think that can be helpful i think it's harder for it to be helpful or especially if it's like and what do i think the answer is to this given all the things because i think it's harder for someone else to trust that that they would agree with your opinions about it whereas if you're like look this thing did happen with cotton gins and other people didn't know that and it's relevant then however it is that they would sort of more philosophically take things into account they have this additional thing to to take into account i guess that's kind of interesting given that so so you've said that the thing that is most relevant perhaps for outsiders is this uh big document about um why we might or might not expect fast takeoff and yeah it's mostly like a list of kind of arguments and counter arguments i'm wondering how you yeah what do you think about that relationship i think considering that as a project i wouldn't class it with the like unpromising category of like more vague open-ended non-empirical things uh so i guess i guess my thing is not empirical versus not i think the way that these arguments seem similar to an empirical thing is like they're in some sense like a new thing that the other people reading it might not have known and now they have it and can do what they want with it like it's somehow like you've added a concrete thing that that's sort of modular and people can take away and do something with it rather than the output being somehow more nebulous i guess yeah i guess the thing that um i try to do with air impacts at least that i think is less common around in in like similar research is the the pages are organized so that there's always some sort of takeaway at the top um and then the rest of the page is just meant to be like supporting that takeaway but it should be that you can basically read the takeaway and do what you want with that and then look more like read the rest of the page if you're curious about it okay so when when you're kind of trying to figure out what the future of ai holds one thing you could do is you could think a lot about progress within ai or things you know facts about the artificial intelligence research field another thing you could do is you could uh try to think about like how economic growth has accelerated over time or how efficient animals are at flying or like progress in cotton gin technology i'm wondering how do you think like these feel like different types of uh considerations one might bring to bear and i'm wondering like how you think we should uh weigh these different types i guess thinking about how to weigh them seems um like a funny way to put it or something like i guess the the reason or there's some sort of structured reason that you're looking at either of them like there was some argument for thinking that ai would be fast it would be dangerous or something and i guess in looking at that argument if you're like oh is this part of it correct then like in order to say is it correct there are different empirical things you could look at like so if if a claim was like ai is going to go very fast because it's already going very fast then it seems like the correct thing to do is 
to look at like is it already going very fast what is very fast or something whereas if if the claim is like uh like it's likely that this thing will see a huge jump then it seems like a natural way to check that is to be like all right do things see huge jumps um is there some way that ai is different from the other things that would make it more jumpy um so there i think it would make sense to look at things in general and look at ai and the reason to look at things in general as well as ai i guess maybe there are multiple reasons one of them would just be like there's a lot more of other things like especially if you're if you're just getting started in ai and the relevant kind of ai or something it's nice to look at broader things as well but there there i agree that like looking at ai is pretty good yeah i i guess maybe what i want to get at is um i think a lot of people when trying to like think about the future of ai like a lot of people are particularly focused on like details about ai technology or like their kind of sense of um what will and won't work or you know some like yeah some theory about machine learning or something whereas it seems to me that ai impacts is does a lot more of this like latter kind of thing um where you're kind of comparing to every other technology or like um thinking about economic growth more broadly do you think that the people who were not doing that are making some kind of mistake and like what what do you think the the implicit i guess disagreement is or the what do you think they're getting wrong i think it's not clear to me it might be a combination of factors where one thing where i think maybe i'm more right is just i think maybe people don't have a tendency when they talk about abstract things to like check them empirically i guess a case where i feel like where i feel pretty good about my position is like uh well maybe this isn't ai versus looking at other things it's more like abstract arguments versus looking at a mixture of ai and other things but it seems like people people say things like probably ai progress you know goes for a while and then it reaches mediocre human level and then it's like rapidly superhuman and that also this is what we see in other narrow things um as far as i can tell it's like not what we see in other narrow things like you could look at lots of narrow things and see that you know it often takes decades to get through the human range and i haven't haven't systematically looked at like all of them or something um but at a glance it looks like it's at least quite common for it to take decades like chess is a good example where as soon as chess as soon as we started making ai to play chess it was roughly at amateur level and then it took like i don't know 50 years or something to get to a superhuman level and it was sort of a gradual progress to there and so as far as i can tell people are still saying that uh progress tends to jump quickly from subhuman to superhuman and i'm not quite sure why that is but checking what actually happens seems better i i don't know what the counter argument would be but i think there's another thing which is just like i'm not like an ai expert in the sense that like my background is not in machine learning or something um i mean i knew a bit about it but like my background is in human ecology and philosophy so i'm like less well-equipped to do the very like ai intensive type projects or to like oversee other people doing them um so i think like that's one reason that we tend to do 
other like more historical things too okay so now i would like to talk a little bit about this uh this paper that you were the first author on it's called when will ai exceed human performance evidence from ai experts uh it came out in 2017 and i believe it was like the something like the 16th most discussed paper of that year according to some metric yeah altmetric okay so yeah that's that's kind of exciting um could you tell us a bit about what's up with this like why this survey came into existence and what it is and how you did it let's start with uh let's start with the what what what is this survey in paper well the survey is a survey of i think we we wrote to i think it was everyone who presented nips in icml in 2015 we wrote too um so i believe it would have been all the authors um that i can't actually remember but i think i think our protocol is just to like look at the papers and take the email addresses from them yeah so we we wrote to them and asked them a bunch of questions about like when we would see sort of human level ai described in various different ways and using various framings to see if they would answer in different ways like how good or bad they thought this would be whether they thought there would be like some sort of very fast progress or progress in other technologies um what they thought about safety what they thought about like how much progress was being driven by hardware versus other stuff probably some other questions that i'm forgetting uh we sort of randomized it so they only got a few of the questions each um so that it didn't take forever um so the more important questions lots of people saw and the ones that were more like you know one of us thought it would be a cool question or something a few people saw and i guess we got maybe i think it was like a 20 response rate or something in the end we tried to make it so they wouldn't see what the thing was about before they started doing it just in term that it was sort of like safety and long-term future oriented to try and make it less biased okay and why why do this um i mean i think knowing what uh machine learning experts think about these questions seem pretty good i think both for like informing people like you know informing ourselves about what uh what people actually working in this field think about it but also i think sort of creating common knowledge like you know if we publish this and they see that like yeah a lot of other machine learning researchers think that um pretty wild things happening as a result of ai pretty soon they're likely you know that that might be helpful um i think another way that it's been helpful is just often people want to say something like this isn't a crazy idea for instance lots of machine learning researchers believe it and it's helpful if you can point to a survey of lots of them instead of pointing to like scott alexander's blog post that has several of them listed or something yeah so the there's a paper about the survey and i believe um ai impacts also has a web page about it and one thing that struck me reading the web page was there was first of all there was a lot of disagreement about um when ai when human level ai might materialize but it's also the case that like a lot of people thought that they were that basically most people agreed with them which is kind of interesting and suggest some utility of like correcting that misconception yeah i agree that was interesting yeah so i guess maybe the survey didn't create so much common knowledge as or maybe 
it was more like common knowledge that we don't agree or something yeah and i guess i don't know if anyone's attempted to resolve that much beyond us like publishing this and maybe the people who were in it seeing it so one i i guess like the the question that um the first question i think people want to know is like when are we going to get human level ai so when does survey respondents think that we might get human level eye well they thought very different things depending on how you ask them like i think asking them different ways which seem like they should be basically equivalent to me just get extremely different answers two of the ways that we asked them uh or we asked them about high-level machine intelligence um which is sort of the closest thing to the question that um people have asked in previous surveys which was something like um when will uh unaided machines be able to accomplish every task better and more cheaply than human workers but we asked them this in two ways one of them was like for such and such a year what is the probability you would put in happening by that year and the other one was like for such and such a probability in what year would you expect it so then we kind of combined those two for for the kind of headline answer on this question by sort of making up curves for each person and combining all of those so doing all of that the answers that we we got from combining those questions were uh 50 chance of ai outperforming humans in all tasks in 45 years but if you ask them instead about when all occupations will be automated then it's like 120 years and these are years from uh 2016. that's right yeah okay so i subtract five years for listeners listening in 2021. so there's this difference between the framing of um whether you're talking about tasks that people might do versus for money versus old tasks you also mentioned there's a difference in like whether you ask like probability by a certain year versus year at which it's that probability that the thing will be automated um what was the difference like what kind of um different estimates do you get from the different framings i think that one was interesting because we actually we we tried it out beforehand on mturkers and also we did it for all of the different narrow task questions that we asked so we asked about a lot of things like when will ai be able to like put lego blocks together in some way or something um and and this was like a pretty consistent effect across everything and i think it was it was much smaller than the other one i think it usually came out at some something like a decade of difference does anyone know why that happens i don't think so which one got the sooner answers if you say when will there be a 50 chance that gets the sooner answer than if you say what is the chance in 2030 or something my own guess about this is that people would like to just put low probabilities on things um so if you say what is the chance in any particular year they'll give you like a low probability whereas if you say when will there be a 90 chance they have to give you a year and they don't feel an urge to give large numbers of years as answers also somehow seems unintuitive or weird to like say you know the year 3000 or something yeah that's that's just speculation though i'd be pretty curious to know whether this is just like a a tendency across lots of different kinds of questions that aren't ai-related it seems like probably since it was across a lot of different questions here yeah you have 
these like big differences in framing you also had differences in like whether the respondents grew up in north america or asia right i can't remember if it was where they grew up but um yeah roughly whether they're yeah it was where their undergraduate institution was i think yes it was like a 44 year difference where for those in asia it was like 30 years asking about hlmi again for i guess it was uh like 74. okay here here as well do you have any guesses as to why there are these like very large differences between groups of people who like like it's pretty common for people from different countries to have their undergraduate in one continent and go to a different continent to study so it's not as though like a word hasn't reached various places do you have any idea what's going on i think i don't i i would have some temptation to say like well a lot of these views are you know more informed by like cultural things than like you know a lot of data about the the thing that they're trying to estimate so me or yeah to the extent that you don't have very much like contact with the thing you're trying to estimate maybe it's more likely to be cultural but yeah i think as you say there's a fair amount of mixing yeah i think another thing that makes me think it's not that it's cultural is just that um opinions were so all over the place anyway i think i guess i can't remember within within the different uh groups how all over the place they were but i think i would be surprised if it wasn't like that both cultural groups contain people who think it's happening very soon and people who think it's happening quite a lot later so it's not like people are kind of following other people's opinions a great deal okay so given that there are these large framing differences and these large differences based on people's undergraduate institutions the continent of people's undergraduate institutions should we pay any attention to these results i mean i do think that that those things should like undermine our confidence in them quite a lot i guess things can be very noisy and and still like some good evidence if you kind of average them all together or something like if we if we think that we've asked with enough different framings uh it's sort of varying around some kind of mean maybe that's still helpful so perhaps we can be confident that high-level machine intelligence isn't literally impossible i guess there are some things that that everyone kind of agrees on like that or maybe not everyone but a lot of people i think also there are more specific things about it like you might think that if it was you know five years away it would be surprising if like almost everyone thought it was much further away like you might think that if it's far away people probably don't have much idea how many decades it is but if it was like really upon us they would probably figure it out or more of them would notice it i'm more inclined to listen to other answers in the survey i i guess i feel better about asking people things that they're actually experts in so to the extent you can ask them about the things that they know about and then turn that into an answer about something else that you want to know about that sort of seems better so i guess we did try to do that with human level ai in a third set of questions for asking about that which were sort of taken from robin hansen's previous idea of asking how much progress you had seen in your field or your subfield during however many years you've been in your 
subfield then kind of using that to extrapolate how many years it should take to get to 100 of the way to uh human level performance in the subfield is the question for for that kind of extrapolation to work you need further not to be acceleration or deceleration too much and i think in robin's sort of informal survey a few years earlier there wasn't so much whereas um in our survey a lot of people had seen acceleration um perhaps just because the you know the field had changed over those few years so harder to make clear estimates from that i'm wondering why that result didn't make it into the main paper because it seems like pretty it seems like relatively compelling evidence right i think maybe people vary in how compelling they find it i think also like the the complication with acceleration maybe makes it hard to to say much about there it seems like you can still use it as like a band people also like if you did this extrapolation it it led to numbers much closer um than the ones robin had got i think part of the reason we included it or part of the reason it seemed interesting to me at least was it it seemed like it led to very different numbers if you ask people like when will it be hlmi they're like you know 40 years or something whereas if you say like how much progress have you seen in this in however many years and then extrapolate then i think it was ending up like hundreds of years in the future in robin's little survey so i was like huh it's interesting yeah but i think i think here it sort of came out reasonably close to the answer for just asking about hlmi directly though if you ignored everyone who hadn't been in the field for very long then it still came out hundreds of years in the future if i call yeah i'm wondering why you think there was this uh because it seemed like there was a bigger seniority bias or a seniority difference in terms of responses in that one and i'm wondering like what do you think was going on there yeah i don't know i guess acceleration could explain it right like if uh progress had been really fast recently then like the progress per year for people who have not been there very long would be very high but progress per year for people for ages would be low i think in the abstract that does sound pretty plausible we have to like check the actual numbers to see whether that like would explain the particular numbers we had but yeah i don't have a better answer than that i could also imagine more psychological kinds of things where like you know the first year you spend an exciting field it feels like you're making a lot of progress whereas after you've spent 20 years in an exciting seeming field maybe you're like maybe this takes longer than i thought okay so is it typically the case that people are like relatively calibrated about um progress in their own field of research yeah how surprised should we be if ai researchers are not actually very good at forecasting how soon human level machine intelligence will come i don't know about other experts and how good they are forecasting things my guess is not very good just based on my my general sense of people's ability to forecast things by default in combination with things like ability to forecast how long your own projects will take i think is clearly not good speaking for myself i make predictions in a spreadsheet and a label limits either to do with my work or not and i'm clearly better calibrated on the not work related ones huh are there not work ones about things to do with your life or about 
like things in the world that don't have to do with your life i i think they're a bit of both but i think that a main difference between the two categories is the ones to do with work or like i'm going to get this thing done by the end of the week and the ones not to do with work or like if i go to the doctor he will say that my problem is blah so like is it to do with my own volition yeah yeah one thing i've noticed so so i've i've done a similar thing and i've noticed that i'm like much less accurate in forecasting questions about my own life including questions like um at one point i was wrong about what a doctor would diagnose me with i lost a lot of muscular prediction points for that but but i think like part of the effect there is that um if i'm forecasting about something like uh how many satellites will be launched in the year 2019 or something like i can draw on a bunch of other people answering that exact same question whereas there are not a lot of people like trying to determine what disease i may or may not have i guess i almost never forecast things where it would take me much effort to figure out what other people think about it to put into the prediction i think or like i don't tend to do that so i think it's uh it's more between ones where i'm guessing yeah that seems fair a different thing about your survey is that you ask you ask people about the sensitivity of ai progress to various inputs right so like if you had halved the amount of computation available what that would do to progress or if you'd have like the training data available or the amount of algorithmic insights and it seems to be that one thing some enterprising listener could do is check like if that concords with work done by people like danny hernandez on scaling laws or something i guess the final thing i'd like to ask about this yeah the survey perhaps two final things the first is um aside from the things that i've mentioned what do you think the most interesting findings were that didn't quite make it into the main paper but are available online i mean i guess the thing that we've talked about a little bit but that i would just emphasize much more as like a headline finding is just that um the answer is it is very inconsistent and there are huge framing effects so i think it's it's both important to remember that when you're saying like people say ai in however many years um i think i think the framing that people usually use for asking about this is the one that gets the soonest answers out of all the ones we tried uh or at least or maybe out of the four main ones that we tried so that seems sort of good to keep in mind i think another important one was just that like the median uh probability that people put on outcomes sort of in the vicinity of extinction level of bad was five percent which seems pretty wild to me that like among sort of mainstream ml researchers like the median answer is five percent there does that strike you as high or low i think high i don't know five percent that like this project destroys the world is like or does something so really bad it seems unusual for a field they also had a pretty high probability that it would be extremely good right it's unclear if there's balance out yeah that's fair i guess as far as like should we should we put attention on trying to like you know steer it toward one rather than the other and it at least suggest there's like a lot at stake or or a lot um a lot of risk that the sort of like more support for the idea that we should be doing 
something about that then then i think i thought before we did the survey so now i'd like to move on to asking you some questions about ai impacts work on roughly takeoff speeds so takeoff speeds refers to this idea that in between the time when there's an ai that's about as good at humans perhaps generally or perhaps in some domain and the time when ai is much much smarter or more powerful or more dominant than humans that that might take like a very long time which would be a slow takeoff or it might take a very short time which would be like a fast takeoff um now impact has done i think a few bits of work relevant to this that i'd like to ask about so the first that seemed relevant to me is this question about how long it takes for ai to cross the human skill range at various tasks so there's a few benchmarks i found on the site so for classification of images on the imagenet dataset it took about three years to go from beginner to superhuman for english drafts it took about 38 years to go from beginner to top human 21 years from starcraft um 30 years for go and 30 years from chess and in go it's not just that um the beginners were like really bad even if you go from what i would consider to be like an amateur who's like good at the game to the top human that was about like 15 years roughly why do you think it takes so long for this to happen because there's this intuitive argument that like look humans like they're all pretty like similar to each other relative to like the wide range of cognitive algorithms one could have why is it taking so long to cross the human skill range i'm not sure overall but i think the the argument that like humans are all very similar so you know they should be in a similar part of the range i think that doesn't necessarily work because um uh i i guess if you have like like consider wheelbarrows like how good is this wheelbarrow for moving things like even if all wheelbarrows are basically the same design you might think there might be variation in them just based on like how broken they are in different ways so if you think of humans as being like like i don't i don't know if this is right but if you think of them as having sort of like a design and then various bits like being more or less kind of broken relative to that in the sense of like having mutations or something or like being more encumbered in some way then you might expect that there's there's some kind of like perfect ability to function that like nobody has and then everyone just has like going down to zero level of function just like different numbers of problems and so if you have a model like that then it seems like even if even if a set of things basically have the same design then they can have an arbitrarily wide range of um performance i i guess there's also some surprise that comes from the animal kingdom so like i tend to think of humans as like much more cognitively similar to each other than to all of the other animals right but like it seems i would be very surprised if like there's a species of fish that was better than my friend at math but worse than me yeah or like like if there was some like monkey that had like better memory than my friend but worse memory than like the the guy who is really good at memory um do you think this intuition is mistaken or do you think like um that despite that we should still expect like these pretty big ranges i mean i guess it seems like there are types of problems where like other animals basically can't do them at all and then humans often 
can do them so that's interesting or like that seems like one where the humans are clearly above their and i guess chess is one like that perhaps or i guess i don't know how well any animal can be taught to play chelsea seems like quite poorly aren't there chickens that can play checkers apparently i have heard of this i don't know i don't know if they're good it seems like my impression is that as soon as we started making ai to to play i think checkers or chess that it was similar to like amateur humans which sort of seems like amateur humans are kind of arbitrarily bad in some sense but maybe it's like there there's some there's some step of like do you understand the rules and can you like kind of try a bit or something and both the people writing the first ais to do this and you know the uh amateurs have just started doing it have got those bits under control um by virtue of being humans and i don't know the fish don't have it or something like that i think it's interesting that uh like you mentioned was it like imagenet was only three years or something all right i think i think that's very vague and hard to tell by the way like because we don't have good measures for people's performance at imagenet i think uh or that i know of and we did we did some of it ourselves and i think it depends a lot on like how bored you get looking at dogs or something but yeah it seems like it was plausibly pretty fast and it's interesting that imagenet is like or like recognizing images is in the class of things where humans were sort of more evolved for it and so like all of them are like or most of them are pretty good um unless they have like some particular problem i'm not sure what that implies but it seems like uh it wouldn't surprise me that much of things like that were different whereas like chess is a kind of thing where by default we're like not good at it and then you can sort of put in more and more effort to be better at it so the first thing i want to say is that as could probably be predicted from passing familiarity with chickens they're able to peck a checkers board or like perhaps to move pieces but they are not they're not at all good at playing checkers so okay good i would like to say that publicly um i wonder if the the claim is that like different people maybe the the the claim of like short small human ranges is something like that humans you know they're people who are better worse than a chess but like a lot of that variance can just be explained by like some people have bothered to try to get good at chess and some people like might have the ability to be really good at chess but they've like chosen something better to do with their lives and i wonder if there's instead a claim that like look if you sort of if everybody tried as hard as like the top human players tried to be really good at chess then that range would be like very small and may and somehow that implies a short i think that we probably have enough empirical data to rule that out for at least many activities um that humans do where there are enthusiasts at them but they never reach like anywhere near peak human performance yeah and and with the case of go like like i think every i think like maybe like good go uh for go enthusiasts i mean like the rank of one don um i think good at go is like roughly the range where maybe not everybody could get good go or you know i wouldn't expect much more out of literally everybody but it still takes 15 years to go from that to top human range this still seems like an 
extremely confusing thing to me i was going to ask why why it would seem so natural for it to be the other way that it's like very easy to cross the human range quickly uh something like yeah just the relative similar like humans just seem like more cognitively similar to each other than most other things so i would think like well you it can't be that much range you know or like you know human linguistic abilities right like i think that like there's no chicken that can speak they can learn languages like better than my friend but worse than me you know also chickens can't learn languages at all so it seems like or i assume pretty close to nothing yeah like it seems like for the language range or something uh depending on how you measured it um you might think like humans cover a lot of the range at least off until the top humans and then there are a bunch of things that are like at zero i mean there still are like um you know there are these like african gray parrots or something or like this uh this monkey thing can use sign language or this ape i guess was like gpd3 which is like not good in other ways yeah it still seems to me that like basically every adult can or perhaps every like developmentally normal adult it seems to me can um use language better than those animals that seems right i'm not certain i wonder if instead my intuition is that like humans are just like like the the range and learning ability is not so big but then that's further away from the question you want to predict which is like how long until like ai's can't destroy us until they can destroy us assuming that when there are ais that are smarter than any human they can destroy us which seems pretty non-obvious but yeah no i'm not assuming that i think the reason that people are interested in this argument are like one part of this argument that i'm trying to focus on is like how long until like you have something that's like about as smart as a human which like can probably not destroy the human race until you're like some degree of smartness at which you can just destroy all of humanity whether or not you might want to i guess super super human type thing yeah i still find this weird but i'm not sure if i have any good questions about it i agree it's weird i guess at the moment we're um we're doing some more case studies on this and i'm hoping that at the end of them to like sit down and think more about what we have found and what to make of it yeah what other case studies are you doing because um yeah the ones that have been looked at there's imagenet which is image classification there's a bunch of board games and one like criticism of this has been like well maybe that's like too few like too narrow range of tasks have you looked at other things yeah i forget which ones are actually up on the website yet i think we do have starcraft up probably um i think maybe that was like 20 years or something 21 years we have some that are further away from ai uh like clock stability like how well can you measure a period of time using like various kinds of clocks or like your mind uh where that one was about 3 000 years apparently according to our tentative uh numbers and three thousand years between like uh for our automated time measuring systems to go between the worst person at measuring time and the best uh according to our own somewhat flawed efforts to find out how good different people are at measuring time i think it wasn't a huge sample size we just we got different people to try and measure time in their 
head yeah and we tried to figure out i think how good editor professional drummers were at this something that seems shockingly long well this all happened quite a long time ago when everything was happening more slowly when did we hit human like superhuman timekeeping ability i think in the 1660s when we got well adjusted pendulum clocks according to my notes here but we haven't finished writing this and putting it up so i'm not sure if that's right all right so so you've looked at uh time keeping ability are there any other things that uh we might expect to see on the website soon frequency discrimination in sounds where it tentatively looks like that one's about 30 years speech transcription which is i think quite hard because there aren't very good comparable measures of human or ai performance yeah i will say with my experience with making this podcast and using ai speech transcription services it seems to me to be that the commercially available ones are not yet at the like daniel quality range well they do it they do it much faster yeah i guess there are also things that we haven't looked at but it's like not that hard to think about like like robotic manipulation it seems like has been within the range of like not superhuman at all sorts of things but probably better than some humans at various things for a while i think um i don't know huge amount about that and i guess like you know creating art or something uh sort of within the human range eh yeah that seems i mean it could be that we're just not sophisticated enough to realize how excellent ai art truly is but yeah my best guess is that it's um in the human range i'd next like to talk about um discontinuities in technological progress and this question of like arguments for and against fast takeoff so how common are discontinuities in technological progress and what is a discontinuity we were measuring discontinuities in terms of how many years of progress at sort of previous rates happened in in one go which is a kind of vague definition it's not quite clear what counts as one go but we're also sort of open to thinking about different metrics so if uh if it was like oh it wasn't one go because there were like lots of little goes but they happened over a period of ten minutes then we might consider the metric of like you know looked at every 10 minutes uh did this thing um see very sudden progress we're basically trying to ask like when is there a massive jump in technology uh on some metric and so and we're trying to explicate that in a way that we can measure like at a high level how often do they happen sort of like you know not never but uh it's not that common we didn't do the kind of search for them where you can easily recover the frequency of them uh we sort of looked for ones that we could find by asking people to report them to us and we ended up finding 10 that were sufficiently abrupt and clearly uh contributed to more progress on some metric than another century would have seen on the previous trend so like 100 years of progress in one go and it was sort of like pretty clear and robust um where there are quite a few more where maybe there were different ways you could have like measured the past trend or something like that are there any cases where people might naively think that there was a like really big quick progress or something where there wasn't actually i guess it's pretty hard to rule out there being a discontinuity of some sort because we're looking for like a particular trend that has one in so if if 
you're like you know the printing press usually the intuition people have is like this was a big deal somehow so if we're like uh was it a discontinuity and like how many pages were printed per day or something then like if it's not it might still be that it was in some other nearby thing i think a thing that was notable to me was like whitney's cotton gin i think yeah our best guess in the end was that it was like a moderate discontinuity but not like a huge one and it didn't look to me like it obviously had a huge effect on like the american cotton industry in that like the amount of cotton per year being produced was already shooting up just before it happened so that still does count as a discontinuity i think um but yeah it looks like much less of a big deal than you would have thought much more continuous i think uh maybe penicillin for syphilis was didn't seem discontinuous um in terms of the trends we could find and one reason it it didn't i think like there were some things where we could get clear numbers for like how many people were dying of syphilis and that sort of thing and there it's kind of straightforward to to say it doesn't look like it was a huge discontinuity it sounds like it was more clearly good on the front of like how costly was it to get treatment it seems like for the previous treatment people just weren't even showing up to get it because it was so bad bad in terms of like side effects side effects yeah and like but interestingly it was like nicknamed the magic bullet because it was just like so much better than the previous thing like over the ages syphilis has been treated in like many terrible ways uh including like mercury and getting malaria because malaria or i guess the fever helps get rid of the syphilis or something um and those are bad ways to treat syphilis i think they're both less effective and also quite unpleasant in themselves or they have bad side effects so the thing prior to um penicillin was kind of already quite incredible in terms of it working and not not being as bad as the other things i think and so so then even if uh penicillin was was quite good on that front comparably it didn't seem like it was uh you know out of distribution for the rate at which progress was already going and that sort of thing so i guess the reason you would be interested in this is you want to know like are big discontinuities the kind of thing that ever happens and if they are the kind of thing that ever happens then like you know it shouldn't take that much evidence to convince us that they'll happen in ai and so in assessing this like i could look at like the list of things that uh you guys are like pretty sure were discontinued discontinuous on some metric i think there's this additional page which is like things that might be discontinuities that you haven't checked yet and they're like a lot of entries on that page it's it kind of looked like to me i'm wondering if you could forecast like what's your guess as to like the percentage of those that you would end up saying like oh yeah this like seems quite discontinuous so i guess among the trends that we did look into like we we also sort of carefully checked that some of them didn't seem to have discontinuities in them so i guess for for trying to guess how many that would be in that larger set i think maybe something like that fraction where as if you were looking at these like very big robust discontinuities we found like 10 of them in 38 different trends where some of those trends were for the same 
technology um but there were like different ways of measuring success or something like that and and some of the trends like they could potentially have multiple discontinuities in but yeah so it's roughly something like 10 really big ones in 38 trends though that's probably not quite right for the the fraction in the larger set because i think it's like easier to tell if a thing does have a big discontinuity than to tell than to show that it doesn't probably although well maybe that's not right i think showing that it does is actually harder than you might think because you do have to find enough data leading up to the purported jump to show that it was worse just beforehand which is often a real difficulty or like something looks like a discontinuity but then it's not because there were things that just weren't recorded that were almost as good or something but yeah i guess you could think more about like what biases there are and which ones we managed to find i think the the ones that we didn't end up looking into i think are a combination of ones that people sent in to us after we were already overwhelmed with the ones we had and um ones where it was just like quite hard to find if i recall some of this work was done more in like 2015 so i may not recall that well i think my takeaway from this is that discontinuities are the kind of thing that like legitimately just do happen sometimes but they don't happen like so frequently that you should expect them for like for a random technology you shouldn't necessarily expect one but uh if you have like a decent argument it's like not crazy to imagine they're being discontinuity is that like a fair i think that's about right i'm not i'm not quite sure what you should think over the whole history of a technology but there's also a question of like suppose suppose you think there's like some chance that there's a discontinuity at some point like what's the chance of it happening at a particular level of progress it seems like that that's much lower yeah that seems right uh at least on priors yeah i guess i'd like to get a bit into this um yeah these arguments about foss takeoff for artificial intelligence and in particular this argument that like once you hit like it wouldn't take very long to the human range to go through the human range to like very smart like somehow like something about that will be quite quick and my read of the ai impacts page about arguments about this is that it's like skeptical of this kind of fast takeoff is that a fair summary of the overall picture i think that's right yeah i think there were some arguments where they didn't seem good and somewhere it seemed like they could be good with like further support or something and we don't know of the support but like maybe someone else has it or it would be like a worthwhile project to look into it and see if we could make it stronger i i'd like to discuss a few of the arguments that i think like i and maybe some listeners might have initially found compelling and we can talk about why they may or may not be i think one intuitive argument that i find interesting is this idea of recursive self-improvement right so like we're gonna have ai technology that's gonna the better it is sort of the higher that rates that makes the like rate of progress of improvement in ai technology and somehow like this is like just the kind of thing that's gonna spiral out of control quickly and like you know you get humans doing ai research and then they make like better humans doing better ai 
research and this seems like the kind of thing that would go quickly so i'm wondering like why you think that this might not lead to some kind of explosion in intelligence i do think that it will lead to more intelligence uh and sort of increasing intelligence over time um it seems like that sort of thing is already happening like like i'd say was sort of already in an intelligence explosion that's quite slow moving or these kinds of um you know feedback effects are sort of throughout the economy and so there's there's a question of should you expect like this particular one in the future to be like much stronger and take everything into a new regime and so i think it's i think you need some sort of further argument for thinking that that will happen whereas currently we can make things that make it easier to do research and we do and i guess there's some question of whether research is going faster or slower or you know hard to measure perhaps like the quality of what we learn but it's sort of like it's kind of like this kind of feedback is the norm and so i guess like i think you could make a model of these different things like a quantitative model and say like oh yeah and when we like plug this additional thing in here it it should get much faster i haven't seen that yet um maybe someone else has it yeah i i wonder like what the like one reason that this might that this seems a little bit different from other feedback loops in the economy to me is that um with ai like it seems like i don't know somehow like one feedback loop is you get better you get a bit better at making bread and then like everybody grows up to be like a little bit stronger and that makes you know that just everyone's a little bit more healthy or like the population's a little bit bigger and you have more people like who are able to innovate on making bread and that seems like a fairly like circuitous path back to you know improvement whereas like with machine learning it seems like making machine learning better is somehow like very closely linked to intelligence somehow like if you can make things generally smarter than one of the then somehow like that might be more tightly linked to improving ai development than than other loops in the economy and that's why this might um yeah this argument is less compelling now that it's out of my head than when it was in it but but i'm wondering if you have thoughts about like that that style of response all things equal it seems like tighter loops like that are probably going to go faster i think i i guess i'm yeah i thought through all of that but but i think also that there are kind of relatively tight ones where like for instance people can write software that immediately helps them write better software and that sort of thing and i think you could make like a fairly similar argument about like soft weary computery things that we already have and so far that hasn't been terrifying um perhaps but then you might say and we've already seen the start of it and it seems like it's a slow takeoff so far um like i don't necessarily think that it will continue to be slow or i think maybe just like continuing economic growth long term like it has been speeding up over time and if it hadn't slowed down in the 50s or something i guess maybe we would expect to see a singularity around now anyway so maybe yeah the normal feedback loops in technology and so on eventually do expect them to get super fast i think it's a different question of whether you expect them to go very suddenly 
from kind of much slower to much faster i guess i usually try to separate these questions of like very fast progress at around human level ai into like fast progress leading up to i guess maybe nick bostrom calls it crossover or something but like so at some point where the ai becomes good enough to help a bunch with building more ai do you see fast progress before that and do you see fast progress following that where the intelligence explosion would be following that and the discontinuity stuff that we've done is more about before that where i think like a strong intelligence explosion would be like a sufficiently weird thing that you know i wouldn't just say oh but other technologies don't see big discontinuities like i think that would be a reasonable argument to expect something different but for an intelligence explosion to go from sort of nothing to like very fast that seems like it requires that there has already been a big jump in something like it seems like ai's ability to contribute to ai research kind of went from meh to like suddenly quite good so part of the hope with the discontinuity type stuff is to address like should we expect an intelligence explosion to to appear out of nowhere or to more gradually get ramped up yeah i i guess the the degree of gradualness matters because it sort of tells you like like you sort of have a warning beforehand and maybe the ability to use like technologies that are improving across the board perhaps to like make the situation better in ways yeah whereas if it were just like totally out of the blue then like on any given day it could be tomorrow so we we just better be constantly vigilant i guess i'd like to talk a bit about the second argument that i'm kind of interested in here which is sort of related to the human skill range discussion so as is kind of mentioned on an evolutionary time scale it like didn't take that long to get from ape intelligence to human intelligence and like human intelligence seems way better than ape intelligence at least to me um i'm biased maybe but we build much taller things than apes do we use much more steel you know it seems like better in some objective ways or more impactful and like you might think from this you might conclude like well it must be the case that like there's some sort of like relatively simple like relatively discrete genetic code or something like there's there's this relatively like simple algorithm that like once you get it you're smart and if you don't get it you're not smart and like if there is such like a simple intelligence algorithm that like is just much better than all of the like other things then maybe like you just find it one day and on that day you have this like you know before you didn't have it and you were like 10 good and now you do have it and you're like 800 good or something um or 800 smart i'm wondering like uh so this is sort of related to it's maybe exactly the same thing as one of the considerations that you have that there are counter arguments to but i'm wondering if you could say like what seems wrong to you about that point of view i guess i haven't thought about this in a while but off the top of my head at the moment a thing that seems um wrong is this kind of an alternate theory of how it is that we're smart which is more like we're particularly good at communicating and you know building up cultural things and other you know apes aren't able to take part in that and it more like gave us an ability to to build things up over time rather than like
the the particular mind of a human being such that would do much better and i guess i'm not sure what to make of the like what that says about the situation with ai perhaps it means that you can't make strong inferences of the type like humans developed evolutionary quickly therefore there's a simple like smartness algorithm yeah so it sounds like to the extent that you think that you thought that uh human ability was because of like one small algorithmic change if it's the case that it was actually due to a lot of accumulated experience and greater human ability to communicate uh and you know learn from other people's experience that that would block the inference from human abilities to like quick takeoffs that seems right it seems like it would suggest that you could maybe like quickly develop good communication skills um but if an ai has good communication skills it seems like that would merely allow it to like join the human like giant pool of knowledge that we share between each other and like if if it's a bit smarter than humans or even if it's a lot smarter i think on that model like maybe it can contribute faster to the giant pool of knowledge but it's hard for it to like get out ahead of all of humanity based on something like communication skills but i also have a different response to to this which is you might think that whatever it is that like the way that humans are smart is sort of different from what monkeys are trying to do or something like it wasn't like evolution was sort of pushing for better uh like building of tall buildings or something it seemed more like like if we have to sort of fight on similar turf to monkeys like doing the kinds of things that they're uh you know have been getting better at like fighting in the jungle or something it's just like not clear that you know individual humans are are better than gorillas i don't know if anyone's checked uh so so you might have a story here that's more like well we're getting gradually better at some sort of cognitive skills and then at some point we sort of accidentally used them for a different kind of thing i guess an analogy is like a place i do expect to see discontinuous progress is where there was some kind of ongoing progress in one type of technology and it just wasn't being used for one of the things it could be used for and then suddenly it was like if after many decades of progress in chess someone decides to write like a program for schmess which is much like chess but like different then you might see schmess progress to suddenly go from zero to really good and and so you might think that something like that was what happened with human intelligence to the extent that it doesn't seem to be what evolution was optimizing for it seems like it was sort of more accidental but is somehow related to what monkeys are doing but does that not still imply that there's some like core insight to doing like really well at schmess that you either have or you don't maybe most of doing well at schmess is like the stuff you got from doing well at chess and so like they're it's like yeah and so there's some sort of key insight in like redirecting it to schmess but like if the whole time you're trying to you if you're starting from zero and trying to just do schmess it would have also taken you about as long as chess there i guess like in terms of uh other apes it would be like a lot of what is relevant to being intelligent they do have but somehow it's like not well oriented toward building buildings or something sure 
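Editor's sketch — the chess/schmess point above lends itself to a tiny worked example. The sketch below is not from the episode and uses invented numbers: a capability improves smoothly on one metric for decades, gets pointed at a second metric only late, and the second metric then shows what looks like a huge discontinuity.

```python
# Hypothetical illustration (invented numbers, not AI Impacts' data or method):
# smooth yearly progress at "chess", which nobody points at "schmess" until
# year 30. Measured schmess performance then jumps from zero to near the
# chess level at once, even though the underlying skill improved gradually.
chess_skill = [10 + 2 * year for year in range(41)]            # steady trend
schmess_score = [0 if year < 30 else int(0.9 * chess_skill[year])
                 for year in range(41)]                        # redirected late

for year in (28, 29, 30, 31):
    print(year, chess_skill[year], schmess_score[year])
# 28 66 0
# 29 68 0
# 30 70 63   <- an apparent discontinuity in the schmess trend
# 31 72 64
```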
so one thing i'm interested about with this work is its sort of relationship to other work so i think the most previous like somewhat comprehensive thing that was trying to answer this kind of question is intelligence explosion microeconomics it's unfortunate in that it's like trying to talk very specifically about ai but it was published like i think right before big deep learning became at all important but i'm wondering like how you see i'm wondering if you have thoughts about um that work by eliezer yudkowsky and what you see as the relationship between what you're writing about and that i admit that i don't remember that well what it says at this point though i did help edit it as a as a i guess junior miri employee at the time what i mostly recall is that it was um sort of suggesting like a field of kind of looking into these kinds of things and i do think of like some of what ai impacts does at least as sort of hopefully answering that to some extent or like being work in that field that was being called for i guess a related question that you are perhaps in a bad position to answer is that one thing that was kind of notable to me is that you published this uh post and i think a lot of people still continued believing that there would be fast progress in ai but just like didn't really it seemed like the obvious thing to do would have been to write a response and nobody really seemed to i'm wondering am i missing something i guess i i'm not certain about whether anyone wrote a response i don't think i know of one um but there are really a lot of written things to keep track of so that's nice yeah i think i don't have a good idea of why they wouldn't have written a response i i think maybe at least some people did change their mind somewhat as a result of reading some of these things um but yeah i don't think people people's minds change that much probably overall so i guess somewhat related to takeoff speeds um i guess i'd like to talk about just existential risk from ai more broadly um so you've been thinking about this recently a bit for instance you've written this post on coherence arguments and whether they imply that ais are or or whether they provide evidence that ais are going to be very like goal directed and trying to achieve things in the world i'm wondering like what are you yeah what have you been thinking about recently just about the broader topic of existential risk from ai i've been thinking a bunch about this specific sort of sub sub question of like is there a reason to think that ai will be agentic where it's not super clear that it needs to be agentic further to be a risk but in the in the most common kind of argument uh seems like being agentic plays a role and so i guess things i've been thinking about there are i guess it's not like i'm pretty ignorant about this area i think other people have uh you know are much more expert and i'm just sort of jumping in to see if i can understand it and write something about it quickly for ai impacts basically but but then i sort of got um i got sidetracked in various interesting questions about it to me which seemed maybe resolvable i guess one issue is i don't quite see how the argument goes that says that if you're incoherent then you should become coherent it seems like logically speaking if you're incoherent i think it's a little bit like believing a false thing like you can make an argument that it would be good to change in in a way to become more coherent but i think you can also like you can sort of
construct a lot of different things being like a good idea once you have like circular preferences or something like that so i think when i've heard this argument being made it's made in a sort of hand-wavy way that's like ah or i don't know if listeners might need more uh more context on what coherence arguments are yeah let's say that what what is a coherence argument yeah um i think like a uh commonly mentioned coherence argument would be something like a way that you might have incoherent preferences is if your preferences are circular say like if you want a strawberry more than you want a blueberry and you want a blueberry more than you want a raspberry and you want a raspberry more than you want a strawberry um and the the argument that that is bad is something like well if if you have each of these preferences then someone else could offer you some trade where you pay some tiny sum of money to get the one you want more and and you will go around in a circle and end up having spent money and so that's bad but the step where that's bad i think is kind of relying on you having a utility function that is coherent or something or like i think you you could equally say that having lost money there is good due to everything being similarly good or bad basically well hang on but if you value money right it seems like this whole chain took you from some state and having some amount of money to the same state but having less money so doesn't that kind of clearly seem just bad on like what like if your preferences are structured such that in any state you'd rather have more money than less i guess i'm saying that if your basic preferences say were only these ones about different berries relative to each other then you might say like all right but you like you like money because you can always buy berries with money but i'm saying like yeah you could go that way and say and say yeah money equals berries but you can also say oh negative money is the same as like this cycle that i like or whatever i guess i've been thinking of it as like well you're having indifference curves across different sets of things you could have and if you have incoherent preferences in this way it's sort of like you have two different sets of indifference curves and they just like cross and hit each other and then it's like just a whole web of things that are all equivalent or like in some way of like you know moving around the whole web these are somewhat incoherent thoughts that are just like what have i been thinking about lately it seems like in practice if you look at humans say i think they do become more coherent over time like they do sort of figure out the the logical implications of of some things that they like or other things that they now think they like and so i think probably that does kind of work out but it would be nice to have a clearer model of it so i guess i've been thinking about what is a better model for incoherent preferences since it's not really clear what what you're even talking about at that point do you have one so a tentative one which is well i guess it sort of got tied up with um representations of things so my tentative one is like instead of having a utility function you have something like a function that takes representations of states like it takes a set of representations of states and it outputs uh like an ordering of them and so the ways you can be incoherent here are like you don't recognize that two representations are of the same state and so you sort of treat them differently
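Editor's sketch — the money-pump argument just described is easy to make concrete. The code below is illustrative only (the berries and the tiny fee come from the conversation; everything else is made up), and it deliberately shows only the mechanics: whether ending up with the same berry and less money counts as "bad" is exactly the step Katja is questioning.

```python
# A circular preference: strawberry > blueberry > raspberry > strawberry.
# Each time the agent is offered a berry it prefers to the one it holds,
# it pays a small fee to swap. One trip around the cycle leaves it holding
# the same berry it started with, minus three cents.
prefers = {("strawberry", "blueberry"),
           ("blueberry", "raspberry"),
           ("raspberry", "strawberry")}          # (wanted more, wanted less)

holding, money = "blueberry", 1.00
for offered in ["strawberry", "raspberry", "blueberry"]:
    if (offered, holding) in prefers:            # agent prefers the offer
        holding, money = offered, round(money - 0.01, 2)

print(holding, money)                            # blueberry 0.97
```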
also like because it's just from subsets to orderings it might be that like if you had a different subset your your response to that would not be coherent with your responses to other subsets if that makes sense and then in in this model uh why would you become more coherent or would you yeah i guess the uh the kind of um dynamic i'm thinking of where you become more coherent is like some of the time you run into like i guess you're continually presented with options and some of your options affect what your future um sort of representation to choice function is or like i guess it doesn't really matter whether you call it yours or like some other creature's in the future but you imagine there are some choices that bear on this and so if i'm sort of deciding like what will i do in this future situation and i'm deciding it based on like if i currently have a preference like say i'm currently like i like strawberries more than raspberries or something and i see that in the future when i'm offered strawberries raspberries or bananas i'm going to choose raspberries instead um that that if i can do like the the logical inference that says this means i'm going to get raspberries when i would have wanted strawberries or something like i guess there's some amount of work going on where you're like making logical inferences that let you equate different representations with each other and then having equated them when you have a choice to change your future behavior or your future like choice function then some of the time you'll change it to be coherent with your current one or like your one where you're making a different set of choices and so very gradually i think that would bring things into coherence with one another is my vague thought yeah it's relatively easy to see how that should kind of iron out like dynamic inconsistency where your future self wants like different types of things than your current self one thing that seems harder to me is like yeah so suppose like right now if you say which do you prefer a strawberry or blueberry and i say i'd rather have a strawberry and if you say okay how do you feel about like strawberries blueberries bananas and then i say blueberries are better than strawberries which are better than bananas then when i'm imagining my future self it's sort of hard to say like it's not obvious what the force is pushing those to be the same right i think i was imagining that um when you're imagining the future thing like there are different ways you can represent the future choice and so you're kind of randomly choosing between them so sometimes you do see it as like oh do i want like oh i guess i'm choosing strawberry over raspberry here or whatever it was um or sometimes you represent them in some entirely different way like sometimes you're like oh i guess i'm choosing the expensive one is that what i wanted huh and so sometimes you randomly hit one where the way you're representing it your current representations of things are i guess sorry they're all your current ones but like when you're choosing what your future one will be and you're you're representing the future choice in a different way to the way you would have been at the time then you change what it would be at the time or or change what the output of the function is on the thing that it would be at the time which i think has some psychological realism um but yeah i i don't know if other people have like much better accounts of this but i guess you you originally asked
like what i've been thinking about lately sort of in the in the area of sort of arguments about ai risk i guess this is like one cluster of things another cluster is just like what is the basic argument for thinking that ai might kill everyone which is like quite a wild conclusion and uh like is it a good argument what are counter-arguments against it that sort of thing so i could you could say more about that if you're interested yeah i am looking to talk about this a bit more so yeah what are what are your thoughts on why ai might or might not do something as bad as killing everyone i took the the basic argument that it might to be something like it seems quite likely that you could have super human ai ai that is like quite a lot better than any human i could go into details i guess i could go into details on any of these things but i'll say like what i think the high level argument is it seems pretty plausible that we could develop this relatively soon things are going well in ai if it existed it would it's quite likely that it would want like a bad sort of future in the sense that there's a decent chance they would have goals of some sort there's a decent chance its goals would not be the goals of humans and goals that are not the goals of humans are sort of bad often or like that's what we should expect if they're not the same as our goals that we would like not approve and so i guess i would put those you know it has goals they're not human goals non-human goals of bad all sort of under superhuman ai would by default want a future that we hate and then also if it wants a future that we hate it will cause the future to be pretty bad either via some sort of short-term catastrophe where maybe it like gets smart very fast and then it's able to just like take over the world or something or via just like longer term reduction of human control over the future so that would be like maybe via like economic competition or stealing things well but slowly that sort of thing so arguments against these things are like places this seems potentially weak i think super human ai is possible i guess in some ways it seems pretty clear but i think there's a question of like how much headroom there is in the most important tasks like there are clearly some tasks where there is not much um you can't get much better than humans just because like they're pretty easy for us or something like tic tac toe is an obvious one but but like you might think all right well if things are more complex though obviously you can be way better than humans i think that just sort of seems unclear like for particular things how much better you can be in terms of like value that you get from it or something yeah i mean i guess depends on the thing one way you could get evidence that there would be headroom above like present human abilities is if like humans have like continued to get better at a thing and like aren't stopping anytime soon so like for instance knowing things about math that seems like a case where humans are getting better and doesn't look like we've hit the top it seems like there you could distinguish between like how many things can you know and like how fast can you make progress at it or something or like how how quickly can you accrue more things it seems pretty likely you can accrue more things faster than humans though you do have to compete with humans with like whatever kind of technology it is like you know to use in some form but not in their own brains potentially like ai has to compete compete 
with humans who have calculators or have the best software or whatever in a non-agentic form yeah and i guess another task at which i think i think there's been improvement is like how well can you take over the world against people at like a certain level of being armed or maybe not the whole world i think that's like unusual but like at least take over a couple of countries like that seems like the kind of thing that happens from time to time in human history and like people have kind of got better armed and you know more skilled at not being taken over i think in some ways and yet like people still keep on occasionally being taken over which to me suggests like a degree of progress in that domain that i'm not sure is like flattening out that seems right or like i i definitely agree that there should be like a lot more tech progress it seems like again there's a question of like there's like how much total progress is there ever and there's like how fast or like the thing that humans are doing in this regard is sort of like adding to the tech and so there's a question of like how much faster than a human can you add to the tech seems pretty plausible to me that you can add to it quite fast though again like compared to humans who also have various non-agentic technology it's like less clear and then it's like not clear how much of the the skills involved in taking over the world or something are like building better tech i guess you might think that like just building better tech alone is enough to get like a a serious advantage and then using your normal other skills maybe you could take over the world yeah it seems plausible to me that you can you can knock out this counter argument um but it just sort of needs better working out i think or it's like a place i could imagine later on being like oh i guess we were wrong about that yeah i guess like maybe moving on so so the first the first step in the argument was uh we can have smarter than human intelligences i guess the next one was uh we might do it soon but yeah i guess that was just might anyway but i i don't have any um you know particularly interesting counter arguments to that except you know maybe not but yeah the next one after that was um if superhuman ai existed it would like by default threaten the whole future where that is made up of it would by default want a future that we hate and it would get it um or it would yeah destroy everything as a result and so is there there's like it has goals i think that's one where again i could imagine like in 20 years being like oh yeah somehow we were just confused about this way of thinking of things in particular it seems like we're as far as i know we're not very clear on how to describe how to kind of do anything without destroying the world it seems like like if you sort of describe anything as maximizing something maybe you get some kind of wild outcome it seems like in practice that hasn't been a thing that's arisen that much or like you know we do we do have like small pigs or something and they just go about being like small pigs and it's not very terrifying you might you might think that like these kinds of creatures that we see in the world it's possible to make a thing like that that is not like aggressively taking over the world i think people might be kind of surprised at this claim that it's very hard to like write down some like function that like maximizing it doesn't take over the world could you say a little bit more about that i mean i guess one
thing to say is like well what would you write down i don't know like uh make me a burrito please seems like a thing i might want a thing to do it's not like uh it doesn't seem naively perhaps likely to destroy the world most of the time i often ask people to make burritos for me and maybe they're not trying hard enough but maybe the intuition behind this being hard would be something like well if you're if you're like maximizing that in some sense are you like making it extremely probable that you make a burrito or like making a burrito very fast or very very many burritos or something yeah i'm not sure what like what the strongest version of this kind of argument is um make the largest possible burrito maybe depending on i guess it sort of depends how you like code up what a burrito is i guess maybe there's some like if you try to write it as some sort of utility function it's like all right is it like you know one utility if you have a burrito and zero otherwise or something then it seems like maybe it puts a whole bunch of effort into making really sure that you have the burrito yeah to me that seems like the most robust case where like you always utility functions always love like adding more probability like it's always better but it seems like maybe in practice building things it is actually more like you can just say give me a burrito and like it turns out to be not that hard to have a thing get you a burrito without any kind of strong uh pressure for doing something else although you might think well all right sure you can make things like that but surely we want to make things that are more agent-like since there's at least a lot of economic value in having agents it seems like as in people who act like agents um you know often be paid for that as long as they don't destroy the world maybe even if they do slowly they can be paid for that you could imagine a world though where it's actually like quite hard to make a thing that's like a strong agent in some sense and that like really deep down everything is kind of like a um like a sphex that's just responding to things it's like yep when i see a stop light then i stop and when i see a cookie on a plate i put my hand out to put it in my mouth and like you can have more and more elaborate things like that that sort of look more and more agent-like and maybe you can probably even make ones that are more agent-like than humans and that is like destructive in ways or dangerous but it's not like they're sort of arbitrarily sort of almost magically seeming agent-like where they think of really obscure things instantly and destroy the world it's more like every bit of extra agentiness like takes effort to get and it's not perfect although on that view you could still think like well we make things that are like way more agent-like than people and they have like the agency problems like way more than people have them and it's like yeah i guess in that case maybe you just think that it's like quite bad but not literally destroying the world and then you move on to something else that might destroy the world actually yeah something like that or like once you're in the realm of like this is like a quantitative question of what it's like and how well the the forces that exist in society for dealing with things like that can fight against them then then it's not like automatically the world gets destroyed at least um like if looking back from the future i'm like oh the world didn't get destroyed and why was that
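Editor's sketch — a toy version of the point above about a utility function that is one if you have a burrito and zero otherwise: an expected-utility maximizer over that function prefers any action that adds probability, however disproportionate the effort. The actions and probabilities below are all invented for illustration.

```python
# Invented actions with invented probabilities of ending up with a burrito.
# A 0/1 utility over "has a burrito" means expected utility is just that
# probability, so the maximizer always picks whatever adds the most of it.
actions = {
    "make one burrito": 0.95,
    "make a burrito plus a backup": 0.995,
    "turn the whole kitchen into burrito production": 0.99999,
}

def expected_utility(p_burrito):
    return 1.0 * p_burrito + 0.0 * (1.0 - p_burrito)

best = max(actions, key=lambda a: expected_utility(actions[a]))
print(best)   # "turn the whole kitchen into burrito production"
```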
i think that's that's another class of things the broader class being something like um i'm just sort of confused about like agency and how it works and what these kind of partial agency things are like and how easy it is to do these different things so yeah it seems to me the one inside view reason for expecting like something like agency in ai systems is that the current way we build ai systems is we sort of are implicitly getting the best thing in a large class of models at like achieving some tasks so you have like neural nets that can have various weights and we have some like mechanism of like choosing weights that do really well at achieving some objective and you might think like well okay it turns out that like being an agent is like really good for achieving objectives and like like that's why we're going to get agents with goals i'm wondering what you think about that do you think there is a particular reason to think that agents are like the best things for achieving certain goals yeah i mean i guess the reason is like they're so like what do you do if you're an agent you sort of um figure out all the ways you could achieve a goal and do the one that achieves it best yeah it seems right and like that that's kind of a simple to describe algorithm both verbally in english and in terms of like uh if i had to write code to do it it's like not that long but to describe like the actual like to just like hardcode the best way of doing it would take like a really long time but like it's easier to program something like a very naive for loop over all possible ways of playing go and say like yeah the best one please i think that's easier than like writing out all the if statements of like what moves you actually end up playing i mean i'm not really an expert on this but but it seems like there's a sort of trade-off where for the for the creature running in real time it takes it a lot longer to go through all of the possibilities and then to choose one of them so if you could have told it ahead of time what was good to do you might be better off doing that like in in terms of you know maybe you you put a lot of bits of selection in initially into like what the what the thing is is gonna do and then it just carries it out fast and i would sort of expect it to just depend on the environment that it's in like where you want to go on this yeah although i mean in that case it's like um you're maybe right that they're at least like even if i was right here there would be like a lot of environments probably where the best thing is the the agent-like one or maybe many more where it's like it's some sort of intermediate thing where they're like some things that get hard-coded because they're basically always the same like you probably don't want to build something that just like as soon as you turn it on and just like tries to figure out from first principles like what the world is or something like if you just already know some stuff about the world it's like probably better just tell it yeah and you definitely don't want to build a thing that does that like every day when it wakes up you know but you might think then that within the things that might vary it's like good for it to be agentic and therefore it will be yeah or also that it's just particularly i kind of think that um when you're looking over possible you know specific neural networks it's just like easier to find the relatively agent-like ones because they are somehow simpler or like they're they're more of them roughly and then once you get an agent-like thing you can like tweak it a bit or more of them for a given performance level maybe it's a bit hard to be exact about this without an actual definition of what an agent is
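Editor's sketch — the "naive for loop over all possible ways of playing go" point above, and the trade-off Katja raises in reply, can be shown side by side. Everything below (the made-up game, the payoff function, the three states) is hypothetical; the contrast is just that the search-based agent is short to write but deliberates at runtime, while the precomputed policy answers instantly but someone had to fill in the table ahead of time.

```python
from itertools import product

def payoff(state, moves):
    # arbitrary stand-in for a real evaluation function
    return sum(hash((state, m)) % 10 for m in moves)

def agent_choose(state, options=("a", "b", "c"), depth=2):
    # the short-to-describe agent: enumerate every move sequence, keep the best
    return max(product(options, repeat=depth), key=lambda ms: payoff(state, ms))

# the "hardcoded" alternative: a table built once, ahead of time
# (here by running the same search per state), consulted with no runtime search
policy_table = {s: agent_choose(s) for s in ("opening", "midgame", "endgame")}

print(agent_choose("midgame"))     # searches every time it is called
print(policy_table["midgame"])     # same answer, paid for up front
```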
yeah it seems right so you were just saying that um uh maybe agents are much easier to find in program space than other things yeah i kind of think that that seems possible but i haven't given it much thought it seems like a good one to think more about for me so and then i think the next uh step in the argument was that the agent's goals are typically like quite bad or did no did we already talk about that no we were just talking about it has goals and then they're quite bad which is like i was organizing it as uh its goals are like not human goals and then by default non-human goals are pretty bad yeah i guess i i think it's sort of unclear like human goals are are not like a point they're sort of some cloud of different goals that that humans have it's not very clear to me like how big that is compared to the space of goals it seems like in some sense there are lots of things that particular humans like having for themselves it seems like if you if you ask them about utopia and what that should be like it might be quite different at least to me it's like pretty unclear whether it's like for a random person if they got utopia for them whether that would clearly be basically amazing for me or whether it has a good chance of being not good i guess there's a question of like ai if you tried to make it have human goals how close how close does it get to like is it basically within that crowd of of human like that cluster of human things but not perfectly the human you are trying to get or is it like far away such that it's much worse than anything we've ever dealt with where if it's like it's like not perfectly your goals if we were trying to have it be your goals but it's like way closer than my goals are to your goals then you might think that this is like a step in a positive direction as far as you're concerned or maybe not but like it's at least you know not much worse than having other humans except to the extent that like maybe maybe it's much more powerful or something and and isn't exactly anyone's goals so so i think the response yeah i think the idea here is like twofold like firstly there's just some difficulty in like specifying like like what do i mean by human goals if i'm trying to like load that into an ai like what kind of program would like generate those okay there are three things one of those one of the things is that the second thing is some concern that like look if you're just searching over a bunch of programs to see which which program does well on some metric and you think the program you're getting is doing some kind of optimization like there are probably a few different metrics that uh that fit well with like succeed well like if you have objective a you could probably have like objectives b c d and e that also motivate you to do good stuff in the like couple of environments that you're training your ai on but like perhaps b c d and e might imply very different things about the rest of the future for instance b could be like you know what one example of a goal b would be play nice until you see that it's definitely the year 2023 in which case go wild
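Editor's sketch — a minimal, schematic version of the goals-B-through-E point just made: two goals that are indistinguishable on everything the selection process ever sees, and come apart afterwards. The year-2023 trigger is taken from the example in the conversation; the rest is invented.

```python
def goal_a(obs):                 # the goal we were hoping for
    return "cooperate"

def goal_b(obs):                 # identical in training, different later
    return "defect" if obs["year"] >= 2023 else "cooperate"

training   = [{"year": y} for y in range(2015, 2021)]   # all selection ever sees
deployment = [{"year": y} for y in range(2021, 2026)]

print(all(goal_a(o) == goal_b(o) for o in training))    # True: they look the same
print([goal_b(o) for o in deployment])
# ['cooperate', 'cooperate', 'defect', 'defect', 'defect']
```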
i think the claim has to be like firstly that there is such a range a range of goals such that if an optimizer had that goal it would still look good when you're testing it which is basically how we're picking ai algorithms and you have to think that range is like yeah it's big in that it's like much larger than the range you'd want like an acceptable range of like you know within like human error tolerance or something yeah i think listeners can refer back to episode four with evan hubinger for some discussion of this but i'm wondering like yeah what do you think about that line of argument yeah i guess so it sounds like you're saying like all right there are sort of two ways this could go wrong where one of them is like yeah we try to try to get the ai to have human goals and we do it sort of imperfectly and so then it's bad which is more like what people were concerned about further in the past and then more recently the sort of like evan's line of argument that's like uh well whatever whatever goals you're even trying to give it it won't end up with those goals cause like you can't really check which goals it has it will potentially end up like deceiving you and having some different goals i see those as more unified than they're often presented as actually yeah okay now you're interviewing me and i get to go on a rant i feel like everyone keeps on talking about like oh everybody's like totally changed their mind about how ai poses a risk or something and i think some people have or i don't know i've changed my mind about some things but like but like like both of the things there seem to me that they fit into like the difficulty of writing a program that like gets an agent to have the motivations that you want it to have yeah that seems right there's this problem of specification part of the problem of specification is you can't write it down and part of the problem of specification is like well like there are there are a few blocks in learning it right like uh firstly it's hard to come up with the with the formalism that works like on distribution and then it's hard to like come up with like a formalism that also works off like that also works like in worlds that you didn't test on to me they seem more unified than they're like typically presented as but i don't know when lots of people agree with me often i'm the one who's wrong it seems like they are unified or to me it seems like they're unified in the sense that like there's a basic problem that we don't know how to to get values that we like into an ai and they're maybe different in that it's like well we considered some different ways of doing it and each one didn't seem like it would work one of them was like somehow write it down or something i guess maybe that one is also like trying to get get it to learn it but without sort of paying attention to this other problem that would arise yeah i don't know they seem like sort of slightly different versions of a similar issue i would say in which case maybe i'm mostly debating the the non-mesa-optimizer one and then maybe the mesa-optimizer one being evan's one that being the one about uh optimizers found by search yeah like originally you might think all right if you find an optimizer by search that like looks very close to what you want but maybe it's like not exactly what you want then like i don't know is it that bad it's like if if you make a thing that can make faces do they just look like horrifying things that aren't real human faces once you set them out making faces like no they're basically like human faces except occasionally i guess but maybe that's a kind of argument but i mean sometimes they have bizarre hats but that
doesn't seem like too bad i think sometimes like they're kind of melting in green or something but yeah i assume that gets better with time but that's like oh like you're missing the mark a little bit and how bad is it to miss the mark a little bit where like i guess that there's some line of argument about like value is fragile and if you miss the mark a little bit it'll be catastrophic whereas the the mesa-optimizer thing is more like no you'll miss the mark a lot because in some axis you're missing it a little in that it did look like the thing you wanted but it turns out there are things that are just extremely different to what you wanted that do look like what you wanted by your selection process or something and i guess yeah that that seems like a real problem to me um i guess given that we see that problem i don't have strong views on how hard it is to avoid that and find a selection process that is more likely to land you something closer to the mark yeah i should say we haven't seen the like most worrying versions of this problem like that they're not yet there have been cases where we trained something to play atari and it's sort of like hacks the atari simulator but there have not really been cases where we trained something to play atari and then it looks like it's playing atari and then like a month later it is like trying to play chess instead or something like that you know and yeah so it's not quite the case that i'm just talking about a sensible problem that we can all tell exists it's like intrinsically a hard to know it exists problem though if the problem is like your your machine would like pretend it was a different thing to what it really is like it's sort of naturally confusing i guess we've covered we've now gone to both the like questions of like ai having different goals to humans and like slight differences being extremely bad yeah then perhaps there's the the other prong here of the ai being terrible which is that if it wanted a future that we hate it would destroy everything or somehow get a future that we hate where maybe i have more to say on the slow getting a future that we hate where i guess you know a counter-argument to the first one is like there's it's just like not obvious whether you should expect it to just rapidly be extremely amazing but you know perhaps there's some reason like there's some chance for that but as far as counter-arguments to the other thing goes um it seems like in the abstract it kind of rests on a simple model that's like more competent actors tend to accrue resources either like via economic activity or stealing and that seems true but it's like all things equal and there are lots of other things going on in the world and i think just like a very common error to make in like predicting things is like you you see that there is a model that is true and you sort of don't notice that there are lots of other things going on i think in the in the world at the moment it doesn't seem the case that like the smartest people are just far and away winning at things i mean like they're doing somewhat better often but it's like very random and there are lots of other things going on huh it does basically seem to me that people who are the most competent not all of them are succeeding amazingly but it seems like they have a way higher rate of amazing success than everyone else and i don't know whenever i like i hear stories about like very successful people and they seem much more competent than me or anybody i know
and some people really seem to me to be really amazingly successful i don't know if that matches my experience but also i guess it could be true and also just like a lot of like very competent people also aren't doing that well where it's like yeah it does increase your chance but it's like such that the people who are like winning the most you know everything has gone right for them like they're they're competent and also they like have good social connections and things randomly went well for them um where where still each of these is like not just going to make you immediately win yeah but but i mean if you think that there's like some sort of relationship here and that we might get ai that's like much more competent than everyone else then shouldn't shouldn't we just like follow that line and be like oh well it might just get all the stuff i mean it seems like maybe that's what the model should be but there are sort of other relationships that it might be like like you might imagine that you just have to have a certain amount of like trust and support from other people in order to like be allowed to be in certain positions or get certain kinds of social power and that it's quite hard to be like uh an entity who like no one is willing to extend that sort of thing to who's like quite smart and still to get very far yeah i mean the hope is that you can trick people right that's true i too like for for smart people trying to trick people in our world i think they often do quite badly um or i mean sometimes they do well for a while but like yeah i guess we are we do try to defend against that sort of thing and that makes it like the the it's not just like the other things are not helping you do well but people's lack of trust of you is causing them to actively you know try to stop you yeah i mean maybe this gets to the human range thing to me it seems like the question i want to that analogy that seems closer to me is not like the most competent humans compared to the least competent humans but the most competent humans compared to like dogs or something it's not actually very hard to trick a dog like i'm not that good at tricking people and i've tricked dogs you know yeah i mean dogs are unusually trusting animals but like i don't know i think i think other animals can also be tricked that's fair um i guess like when we were talking about the human range thing it also seemed like reasonable to describe some animals' cognitive abilities of some sorts as like basically zero or something on some scales so i somewhat wonder whether it's like yeah it's like you can trick all manner of animals that like basically can't communicate and don't have much going on in terms of concepts and so on and like you can trick humans also but it's like not just going to be trivially easy to do so i mean i think at some level of intelligence or something or some level of capability and having resources and stuff maybe it is but also if you're hoping to like trick all humans or something or like not get caught for this i guess clearly humans can trick humans pretty often but but if you want it to be like a long run sustainable strategy yeah you need a combination of tricking some and outrunning the others right something like that i agree that it might it might be that with intelligence alone you can quite thoroughly win but i just think it's like less clear than it's often thought to be or something like that the model here should have a lot more parts in it and so candidate parts are like
something like the relationship between people trying to not have all their stuff be stolen and your agent that's trying to steal everyone's stuff or ms all the resources or something i think i'm not quite sure what you meant by that one oh or you know like agents that are trying to trick everyone to accrue resources um versus like people who are don't like to be tricked sort of yeah so i guess maybe i think of that as like there are a bunch of uh either implicit or explicit agreements and norms and stuff that that humans have for dealing with one another and if you want to do a thing that's like very opposed to you know existing property rights or expectations or something that's often a lot harder than doing things that are like legal or well looked upon by other people i think like within the range of things that are not immediately going to ring alarm bells for people it's it's probably still like easier to get what you want if you're more trusted or have better relationships with different people or like in roles where you're allowed to do certain things i'm not sure how relevant or it seems like ai might just be in different roles that we haven't seen so far or like treated in different ways by humans but i think like for me if like even if i was super smart and i wanted to go and do some stuff it would sort of matter a lot like how other people were going to treat me or like on what things they were just going to let me do them and what things they were going to like shoot me or like just pay very close attention or ask me to do heaps of paperwork or yeah i think the argument has to be something like look there's some range of things that people will let you do and like if you're really smart you can find really good options within that range of things and like the reason that that's true is that like like somehow when human norms are constructing the range of things that we let people do it's like we didn't think of all the terrible things that could possibly be done and we aren't just like only allowing the things that we are certain are going to be fine yeah that seems right i guess it seems like partly we we do sort of respond dynamically to things like if someone finds a loophole i i think we're often not just like oh well i guess you're right that was within the rules like often we like change the rules on the spot or something uh it's my impression but you know you might be smart enough to to get around that also to come up with a plan that doesn't involve anyone ever managing to stop you yeah you you somehow need to go like very quickly from like things looking like you haven't taken over the world so things looking like you have or at least like very invisibly in between or something and and like maybe that's possible but it seems like a sort of quantitative question like how smart you have to be to do that and how long it actually takes for machines to become that smart and i guess different kind of counter argument here or at least a different like complication um i think the similar kind of model of like you know if you're if you're more competent you get more of the resources and then you take over the world it was sort of treating share of resources as like how much of the future you get which is not quite right so yeah the question is like if you have like x percent of the stuff do you get x percent of the future or something it seems like uh a way that's obviously wrong is like suppose suppose you're just like the only person in the universe and um and so it's sort 
of all yours in some sense and you're just like sitting on earth and this asteroid coming toward you you clearly just like don't have much control of the future um so there's also like i don't know you could you could then maybe model it as like oh yeah there's a bunch of control of the future that's just going to like no one and maybe you can kind of you know get more or something like that and it seems like it would be nice to be clear about this model it seems like also in the usual sort of basic economics model or like the argument for say having more immigrants and that not being bad for you or something is is sort of like yeah well you'll trade with them and then you'll get more of what you wanted and so you might want to make a similar argument about like ai uh like if it wasn't going to steal stuff from you if you were just going to if it was just going to trade with you and over time it would get more wealth but you would get more wealth as well it would be like a positive some thing and you'd be like well how could that possibly be bad like let's say it never violates your property rights or anything it's more like well there was a whole bunch of the future that was going to no one because you were just sitting there probably gonna get killed sometime and then you both managed to to get more control of the future wasn't that good for you and then it seems like i guess the the argument would be something like ah but you were hoping to take the whole future at some point like you were hoping that that you could build some different technology that wasn't going to also like have some of the future but yeah and maybe that works out but i guess this just seems like a more complicated argument that it seems good to be clear on yeah i mean i mean to me the responses yes so there's two things there firstly there's a question of like i guess they're kind of both about widening the pie to use the like uh classic economist metaphor here so so one thing i think is that like if you have um like suppose i have all the world's resources which are like you know a couple loaves of bread and and that's all the world has sadly like then i'm not gonna do very well against asteroid right whereas like right now even though i don't have all the world's resources like i'm gonna do i'm gonna do better because there's sort of more stuff around or even i personally have more stuff but but so in the ai case yeah i guess the thing is like if the ai has like or if ai technology in general has like way more resources than me and is not like cooperative then it seems the case that that's clearly going to be world worse than a world in which like there's a similar amount of resources going around but i and my friends have all of it instead of them being under the control of an ai system yeah and that's the second thing i don't know my take is that yeah the thing we should worry about is ai theft i don't know i i don't have a strong reason to believe that these like terrible ai systems are going to like respect property rights i don't know some people have these stories where they do and it's bad anyway but it seems to me that like just steal stuff you know i guess you might think about the case where they respect property rights either because like you are building them so sometimes you'll succeed at like you know some basic aspect of getting them to act as you would want where respecting respecting the law is like a basic one that that being our usual way of dealing with creatures who are you know otherwise not 
value aligned i mean that's not how we deal with like dogs right or like it a little bit is but they don't you know they're treated very differently by the law than most things and like if i think about like what what's our control mechanism for like dangerous animals or something it's it mostly isn't law i i guess maybe the problem is like well they just can't understand law but it seems to me that like obey the law i don't know in my mind once you've solved like get a thing that robustly obeys the law you have solved like most to all of the problem that seems plausible um it's not super obvious to me but or like and in particular that seems like a hard thing to me i agree it seems plausibly hard i guess the other reason it seems maybe worth thinking about is just that like if you did get to that situation it still seems like you might be doomed like i don't know that i would be happy with the world where we trade with the ais and they are great and we trade with them a lot and they get a lot of resources and they're really competent at everything and then they go on to take over the universe and we don't do much of it maybe i would prefer to have just like waited and um hoped to build some different technology in the future or something like that yeah i i guess there's this question of like is it a thing that's basically like human extinction and you know even out of the ai outcomes that are not basically like human extinction perhaps some can be better than others and perhaps we should spend some time thinking about that i guess that's not what i was saying but yeah that seems uh seems potentially good oh to me that seems almost the same as what you were saying because in the outcome where we have these like very smart ais that are trading with us and where we end up like slightly richer than we currently are to me that doesn't seem very much like human extinction ah i was thinking that it uh does depending on how we or if it's like well we we end up basically on one planet and we're a bit richer than we currently are and we last until the sun kills us i mean okay maybe maybe i wouldn't describe it as human extinction in that we didn't go extinct however as far as like how much of the possible value of the future did we get supposing that the ai is doing things that we do not value at all it seems like we've basically lost all of it yeah that seems right i think this is perhaps controversial in the wider world but not controversial between you and i i'm definitely in favor of trying to get ai systems that really do stuff that we think is good and don't just fail to steal from us nice um yeah i don't know i think maybe part of my equanimity here is thinking like if you can get obey the law then maybe it's just not too much harder to get like be like really super great for us rather than just merely like lawful evil i probably have to think more about that to uh to come to a very clear view i guess i'm more uncertain this is like i guess we were talking earlier about the possibility that um humans are mostly doing well because of their like vast network of socially capable culture accumulators i think in that world it's sort of less clear what to think of like super ais joining that network i guess it seems like the basic units then are more like big clusters of creatures who are talking to each other it's sort of not clear what role super ai plays in it like why doesn't it just join our one is that there's a separate ai one that that's then like fighting ours i don't know why would
you think that i it's just kind of like the basic a basic intuition i think people have is like uh well so far there have been like the humans they're sort of the basic unit and then we're going to have this smarter thing and so anything could happen but if it's more like no so far the basic unit has been like this giant pool of humans interacting with each other in a useful way and there have been like more and fewer of them and they have different things going on and now we're going to add a different kind of node that can also be in a network like this i guess it's just like the other argument i feel like it doesn't or the other intuition um doesn't go through for me at least i think i'm going to move on now yeah so i guess i'd like to talk a bit more about ai impacts' research yeah so sort of circling back so what do you think hopefully uh i don't know if you hope this i hope that ai impacts will continue for at least 10 more years i'm wondering what do you think the most important question is for ai impacts to answer in those 10 years a bit more broadly what would you like to see happen i guess on the most important question i feel like there are questions that we know what they are now like when will this happen how fast will it happen um and it seems like for like if we could become more certain about timelines say such that we were like ah it's like twice as likely to happen in the next little while than we thought it was and it's probably not gonna happen in the longer term as far as like guiding other people's actions i feel like it's maybe not that helpful in that you're you're just kind of like you're sort of giving one bit of information about like what should people be doing it seems like they're i think that's kind of true for for various of these high-level questions except maybe like is this a risk at all where if you managed to figure out that it wasn't a risk then everyone could just go and do something else instead which sounds pretty big but but i feel like there are things where it's less clear what the question is where it's more like maybe there are details of the situation where you could just see better like like how exactly things will go down um where it's currently quite vague or like currently people are talking about like fairly different kinds of scenarios happening like you know are the ais all stealing stuff is there just like one super ai that immediately stole everything is it some kind of long-term like ai corporations competing one another into oblivion or something it seems like just getting the details of the scenarios down would be pretty good yeah i guess this isn't my like long-term well-considered answer it's more like right now which thing seems good to me and i guess another question what work do you think is the best complements to ai impacts' research so what so what thing that is like really not what ai impacts does is the most useful for what ai impacts does it's hard to answer the most useful um but a thing that comes to mind uh is sort of similar questions or like also looking around for empirical ways to like try to answer these high level questions but with more of a you know ai expertise um background i think that's clearly a thing that uh that we're lacking all right sort of a meta question is there anything else i should have asked about uh ai impacts and your research and how you think about things you'd ask how it differs from like academia like what is this kind of research yeah i guess being in academia that was very
Yeah, I guess being in academia, that was very obvious to me.

I'm curious how it seems to differ to you, then.

Oh, well, to me it seems like, probably, there's the deliverable unit, right? There are some pages on this website, one page in particular that I'm just remembering, about the time to solve various math conjectures, where the title is something like "How long does it take to solve mathematical conjectures?" and the body is like, "we used this estimator on this data set, here's a graph, the answer is this." And that's the page, kind of. And it seems to me that academics tend to like producing longer pieces of work that are sort of better integrated, and that are fancier in some sense.

Yeah, that seems right. In machine learning, people love making sure that you know that they know a lot about math.

I'm curious what you mean by "better integrated", even: like, into the larger literature?

Yeah, or, it seems to me that AI Impacts has a bunch of pages and they're kind of related to each other, but an academic is going to write a bunch of papers that are, I think, relatively more closely related to each other. Or the clusters of related stuff: it seems like there's going to be more per cluster than maybe AI Impacts has. AI Impacts seems to me to be happier jumping around, just answering one question and then moving on to something totally different.

Yeah, I think that's right. It does occur to me that normally in these shows I don't tell people what their own research is.

That's helpful, though, because I think it's true that AI Impacts is different in those ways, and that is intentional, but I haven't thought about it that much in several years, so it's good to be reminded what other things are like.

I guess there's also the people who do it: academics tend to like research being done or supervised by people who have done a PhD in a relevant field and who spend a lot of time on each thing, whereas AI Impacts has, I don't know, maybe "more generalist" is the way to summarize a lot of these things: more of it seems to be done by philosophy PhD students taking a break from their programs, which sounds terrible to me.

That seems right. Yeah, I think part of the premise of the project was something like: there is a bunch that you can figure out from a back-of-the-envelope calculation, and often that is not being made use of, and often what you want in order to add value to a discussion is not an entire lifetime of research on one detailed question. And so it's sort of like, okay, having done a back-of-the-envelope calculation for this thing, the correct thing to do is to move on to another question, and if you ever become that curious about the first question again, or you really need to be sure about it, or need a finer-grained number or something, then go back and do another level of checking. Yeah, I think that was part of it. I think I don't quite understand why it is that writing papers takes so long compared to writing blog posts or something; it seems like in my experience it does, even when there's a similar level of quality in some sense.
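An illustrative aside on the conjecture-resolution page mentioned above: the transcript does not say which estimator or data set that page actually used, so the sketch below is only a stand-in for the general flavor of such an estimate. It fits a simple Kaplan-Meier survival curve ("what fraction of conjectures are still unresolved after N years?") to an invented list of conjectures; the function, the data, and the numbers are all hypothetical.

```python
# Illustrative only: a toy Kaplan-Meier survival estimate for "how long do
# conjectures stay open?". The data below is invented, not AI Impacts' data set.

def kaplan_meier(durations, resolved):
    """Return (years_open, P(still unresolved)) pairs from possibly-censored data.

    durations: years each conjecture was open (until resolution, or until now)
    resolved:  True if the conjecture was resolved, False if still open (censored)
    """
    order = sorted(range(len(durations)), key=lambda i: durations[i])
    at_risk, survival = len(durations), 1.0
    curve = []
    i = 0
    while i < len(order):
        t = durations[order[i]]
        events = censored = 0
        # Group all conjectures whose duration equals t.
        while i < len(order) and durations[order[i]] == t:
            if resolved[order[i]]:
                events += 1
            else:
                censored += 1
            i += 1
        if events:
            survival *= (at_risk - events) / at_risk
            curve.append((t, survival))
        at_risk -= events + censored
    return curve

# Hypothetical example: four resolved conjectures and two still open.
years_open = [30, 100, 7, 358, 50, 160]
was_resolved = [True, True, True, True, False, False]
for years, p_unresolved in kaplan_meier(years_open, was_resolved):
    print(f"after {years:>3} years open: ~{p_unresolved:.2f} chance still unresolved")
```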
So I was partly hoping to just do things where at least the norm is something like: okay, you're doing the valuable bit, some sort of looking things up or calculation, then you're writing that down, and then you're not spending six months on whatever else it is I do when I spend six months straight writing a paper nicely.

Yeah, it does seem easier to me. I think maybe I care a lot more about the... let's see. When I write a paper, I don't know how it is in your subfield, but in my subfield, in machine learning, papers sort of have to be eight pages.

Oh wow, okay.

They can't be eight pages and one line, so there's some amount of time being spent on that. And when I write blog posts, they're just worse than the papers I write; I just spend 30 minutes writing the first thing I thought of. I don't know, this is especially salient to me now because a few days ago I published one that got a lot of pushback that seems to have been justified.

Yeah, there does seem to be something strange there. I think another big difference is that we're often trying to answer a particular question and then we're using whatever method we can come up with, which is often a bad method, because there's just no good way of answering the question really well. Whereas I think academia is more methods-oriented: they know a good method and then they find questions that they can answer with that method. At least, that's my impression. This is not how it's supposed to work, I know.

My advisor is very... yeah, so Stuart Russell, my advisor, has the take that the point of a PhD, and I think maybe he would also say the point of a research career, is to answer questions about artificial intelligence and do the best job you can. And it's true that I think people do care more about good answers to bad questions than bad answers to good questions, but there is at least... I don't know, Stuart's also a little bit unusual in his views about how PhD theses should be.

I'm probably also being unfair. It might be that in academia you at least have to be trying to ask a good question and answer it well, whereas we're just trying to answer a good question.

Yeah, I think there's also this thing of who's checking what. In academia, most of the checking is "did you answer the question well?". When your paper goes through peer review, nobody says, "the world would only be 10% richer if you answered this question." They sometimes talk about interest, which is kind of a different thing. So those are some differences between AI Impacts and academia. Do more come to mind?

I guess maybe one that is related to you saying it's not well integrated: I think of academia as not being intended to be that integrated, in that it's a bunch of separate papers that are kind of referring to each other, maybe, but there's not really a high-level organizing principle. Whereas AI Impacts somewhat grew out of a project to make really complicated WorkFlowy documents of different arguments for things.

What's a WorkFlowy?

Oh, it's software for writing bulleted lists that can go arbitrarily deep in the levels of indentation, and you can zoom in to further levels of indentation and just look at that, so you can have a gigantic list that is everything in your life, and so on. But I guess what Paul Christiano and I were making were what we called "structured cases", which were supposed to be arguments for, for instance, "AI is risky" or "AI is the biggest problem" or something. The idea was that there would be this top-level thing, and then there would be some nodes under it, and it would be like: if you agree with the nodes under it, then you should agree with the thing above it. So you could look at this and be like, "oh, I disagree with the thing at the top; which of the things underneath it do I disagree with? That one. All right, what do I disagree with under that?" and so on. This is quite hard to do in WorkFlowy somehow, and it was very annoying to look at. So one idea was to do a similar thing where each node is a whole page, where there's some kind of statement at the top and then the support for it is a whole bunch of writing on the page, instead of it just being a huge jumble of logically, carefully related statements. And I guess AI Impacts is still somewhat like that, though not super successfully at the top, in that it's missing a lot of the top-level nodes that would make sense of the lower-level nodes.
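Since the "structured case" Katja describes is essentially a tree of claims, where accepting a node's supporting nodes is meant to commit you to the node above them, here is a minimal sketch of that idea in code. It is not AI Impacts' or Paul Christiano's actual tooling; the Claim class, the first_disagreement helper, and the example statements are all hypothetical, included only to illustrate the "walk down to the node you reject" reading she outlines.

```python
# Illustrative sketch of a "structured case": a claim supported by sub-claims,
# plus a walk that finds the lowest-level statement a reader rejects.
# Not AI Impacts' actual tooling; the example claims are hypothetical.
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Claim:
    statement: str
    support: List["Claim"] = field(default_factory=list)

def first_disagreement(claim: Claim, accepts: Callable[[str], bool]) -> Optional[Claim]:
    """If the reader accepts the claim, return None. Otherwise descend into its
    support to find a rejected sub-claim; if all the support is accepted, the
    disagreement is with this claim's own inference, so return the claim itself."""
    if accepts(claim.statement):
        return None
    for sub in claim.support:
        hit = first_disagreement(sub, accepts)
        if hit is not None:
            return hit
    return claim

# Hypothetical case, loosely in the shape Katja describes.
case = Claim(
    "Advanced AI is worth preparing for now",
    support=[
        Claim("Advanced AI is reasonably likely within decades"),
        Claim("If it arrives, the stakes are very large",
              support=[Claim("Misaligned advanced AI could permanently curtail humanity's options")]),
    ],
)

# A reader who rejects the top-level claim and one of its supports.
reader_rejects = {"Advanced AI is worth preparing for now",
                  "If it arrives, the stakes are very large"}
crux = first_disagreement(case, lambda s: s not in reader_rejects)
print("Lowest-level disagreement:", crux.statement if crux else "none")
```

On AI Impacts' site, as the transcript describes, each such node would instead be a whole page, with the statement at the top and its support written out as prose below it.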
Is that roughly a comprehensive description of the differences between AI Impacts and academia? And in practice it's much smaller than academia, which is also true.

I imagine it's not a comprehensive list, but it'll do for now.

All right. So I guess the final question I'd like to ask... oh, I forgot about one. Another one is: by working on AI Impacts, you look into a bunch of questions that a lot of people already have opinions on. What do you think people in AI x-risk research are most often, or most importantly, wrong about?

I guess I'm pretty unsure, and in places where it seems like people are wrong, I definitely don't have the impression that they're obviously wrong, rather than that I'm confused about what they think. I think we have a sort of meta-disagreement, maybe, where I think it's better to write down clearly what you think about things than many people do, which is part of why I don't know exactly what they think about some things. I mean, it's not that other people don't ever write things down clearly, but I guess it's a priority for me to try and write down these arguments and really try and pin down whether they're sloppy in some way, whereas it seems like in general there's been less of that than I would have expected overall. I think people write a lot of stuff; it's just that it tends to take a bunch of things for granted, or maybe the pieces that originally laid out the things being taken for granted were not super careful, more evocative, which is maybe the right thing for evoking things, and maybe that was always needed, but yeah, I'm more keen to see things laid out clearly.

All right. So if people are interested in following you, your research, and AI Impacts' research, how can they do so?

Well, the aiimpacts.org website is a particularly good way to follow it. We have a blog, which you can see on the website, so you could subscribe to that if you want to follow it. Otherwise we just sort of put up new pages sometimes, some of which are interesting or not, which you can also subscribe to see if you want. And if you just want to hear more things like that that I think about, I have a blog, worldspiritsockpuppet.com.

All right, great. Thanks for appearing on the show, and to the listeners, I
hope you join us again.

Thank you for having me.

This episode was edited by Finan Adamson. The financial costs of making this episode are covered by a grant from the Long-Term Future Fund. To read a transcript of this episode, or to learn how to support the podcast, you can visit axrp.net. That's a-x-r-p dot net. Finally, if you have any feedback about this podcast, you can email me at feedback@axrp.net.