Finite Factored Sets with Scott Garrabrant
Why this matters
Auto-discovered candidate. Editorial positioning to be finalized.
Summary
Auto-discovered from AXRP. Editorial summary pending review.
Perspective map
The amber marker shows the most Risk-forward score. The white marker shows the most Opportunity-forward score. The black marker shows the median perspective for this library item. Tap the band, a marker, or the track to open the transcript there.
An explanation of the Perspective Map framework can be found here.
Episode arc by segment
Early → late · height = spectrum position · colour = band
Risk-forward · Mixed · Opportunity-forward
Each bar is tinted by where its score sits on the same strip as above (amber → cyan midpoint → white). Same lexicon as the headline. Bars are evenly spaced in transcript order (not clock time).
Across 89 full-transcript segments: median 0 · mean ≈0 · spread −13 to 0 (p10–p90 0–0) · 0% risk-forward, 100% mixed, 0% opportunity-forward slices.
Mixed leaning, primarily in the Governance lens. Evidence mode: interview. Confidence: medium.
- Emphasizes safety
- Emphasizes AI safety
- Full transcript scored in 89 sequential slices (median slice 0).
Editor note
Auto-ingested from daily feed check. Review for editorial curation.
Episode transcript
YouTube captions (auto or uploaded) · video Oc47lJXP6uQ · stored Apr 2, 2026 · 3,165 caption segments
Captions are an imperfect primary: they can mis-hear names and technical terms. Use them alongside the audio and publisher materials when verifying claims.
No editorial assessment file yet. Add content/resources/transcript-assessments/finite-factored-sets-with-scott-garrabrant.json when you have a listen-based summary.
Daniel Filan: Before we begin, a note about this episode: more than other episodes, it assumes knowledge of the subject matter during the conversation. So although we'll repeat some basic definitions, before listening there's a good chance you'll want to watch or read something explaining the mathematics of finite factored sets. The description of this episode will contain links to resources that I think do this well. Now, on to the interview.

Hello, everybody. Today I'll be speaking with Scott Garrabrant. Scott is a researcher at the Machine Intelligence Research Institute, or MIRI for short, and prior to that he earned his PhD in mathematics at UCLA, studying combinatorics. Today we'll be talking about his work on finite factored sets. For links to items that we'll be discussing, you can check the description of this episode, and you can read a transcript at axrp.net. Scott, welcome to AXRP.

We're going to start off talking about the finite factored sets work. You've compared this to, or I think it's meant to be somehow in the same vein as, Judea Pearl's work on causality, where you have this directed acyclic graph: the nodes are things that might happen and the arrows are one thing causing another, roughly. So I'm wondering: what's good about Pearlian causality? Why does it deserve to be developed on? Let's just start with that.

Scott Garrabrant: What's good about Pearlian causality? So, specifically, I want to draw attention to the fact that I'm talking about earlier Pearlian stuff. Pearl has a bunch of stuff; I'm talking about what you'll find in chapter two of the book Causality, specifically. Basically, it's a framework that allows you to take statistical data and from it infer temporal structure on the variables that you have, which is just really useful for a lot of purposes. It's a framework that allows you
to go from pure probabilities to having an actual structure as to what's going on, and causality, which then lets you answer some questions about interventions, possibly, and things like that.

Daniel Filan: So if it's so great, why do we need to do any more work? Why can't we just all read his book and go home?

Scott Garrabrant: My main issue is a failure to work well with abstraction. We have these situations, possibly coming from decision theory, where we want to model agents that are making some choice, and they have some effect on the world, and it makes sense to model these kinds of things with causality. Not directly using Pearlian causal inference, but using the general framework of causality, where we draw these DAGs, these directed acyclic graphs, with arrows that represent effects that are happening. And you run into these problems where, if you have an agent and, for example, it's being simulated by another agent, then there's this desire to put multiple copies of the same structure in multiple different places in your causal story. To me, this is really pointing towards needing the ability to have some of the nodes, some of the variables in your causal diagrams, be abstract copies of other nodes and variables. And there's an issue: the Pearlian paradigm doesn't really work well with being able to have some of your variables be abstract copies of others.

Daniel Filan: So what's an example of a place where you'd want to have multiple copies of the same structure in different places, if you could spell that out?

Scott Garrabrant: Yeah, I can give more specific, direct examples that are made-up examples, but really I want to claim that you can see a little bit of this any time you have any agent that's making a decision. Agents will make decisions based on their model of the world, or based on their model of the
consequences of their actions, and they'll make a decision, and they'll take an action, and then those consequences of their action will actually take place in the real world. So you can see that there's the agent's model of what will happen, which is kind of causing the agent's choice, which kind of causes what actually happens. And there's this weird relationship between the agent's model of what will happen and what actually happens, which could be well described as the agent's model being an abstract version of the actual future.

Daniel Filan: Okay. And why can't we have abstractions in Pearlian causality?

Scott Garrabrant: I think the problem lies with what happens when you have some variables that are deterministic functions of others. If you have a refined version of a variable, and then you have another coarse, abstract version of the same variable, you can view the coarse version, which has less detail, as a deterministic function of the more refined variable. Pearl has some stuff that allows for determinism in the structure, but the part that I really like doesn't have a space for having some of your variables be deterministically, or partially deterministically, related. And in the parts of Pearl where some of your variables can be deterministically related, the ability to do inference is much worse: the ability to infer causality from the statistical data.

Daniel Filan: I guess, to what degree are we dealing with strict determinism here? Because guesses can sometimes be wrong, right? If I'm thinking about the necessity of abstraction here as "I have models about things", it's not really the case that my model is a deterministic function of reality, right?

Scott Garrabrant: Yeah, this is right. I mean, there's a story where you can have some real deterministic functions, where you have multiple copies of
the same algorithm in different places in spacetime, or something. But I feel like I'm dancing around my true crux with the Pearlian paradigm, and I'm not very good at actually pointing at this thing, but my true crux feels something like variable non-realism. In the Pearlian world, we have all of these different variables, and you have a structure that looks like "this variable listens to this variable, and then this variable listens to this other variable" to determine what happens. And in my world, I'm in an ontology in which there's nothing real about the variables beyond their information content. So if you had two variables, one of which is a copy of the other, if X and X′ were copies of each other, it wouldn't really make sense to ask "is Y looking at X or looking at X′?" if they're actually the same information content. So, philosophically, I think the biggest difference between my framework and Pearl's framework is something about denying the realism of the variables.

That didn't really answer your question about whether we really get determinism. I think that systems that have a lot of determinism are useful for models. We have systems that don't have real determinism, but we also don't actually analyze our systems in full detail, and so I can have a high-level model of the situation, and while this calculator is not actually deterministically related to this other calculator, relative to my high-level model it kind of is. So even if we don't get real determinism in the real world, in high-level models it still feels useful to be able to work okay with determinism. But I don't know: in some sense I want to say determinism is the real crux, but in another sense I want to say that it's distracting from the
real crux. I'm not really sure.

Daniel Filan: It also seems like one issue... well, let's say I'm playing Go, and I'm thinking about what's going to happen when I make some move, and then I play the move, and then that thing happens. If we say that my model of what's happening is an abstraction of what actually happens, that it's a function of what actually happens, then there's sort of an arrow from what actually happens to what I think is going to happen, and then there's an arrow from that to what I do, because that's what causes me to make the decision. And then there's an arrow from what I do to what actually happens, because that's how normal things work. But then you have a loop, which you're sort of not allowed to have in Pearl's framework. So that seems like kind of a problem.

Scott Garrabrant: Yeah. A large part of my research motivation for the last, I'm not sure how long, I think at least three years, has been towards trying to fix this problem where you have a loop. Thinking about decision theory, and the ways people were talking about decision theory in, I don't know, around 2016 or something, there was stuff that involved "well, what happens when you take DAGs but you have loops?" and things like that. And I had this glimmer of hope around "maybe we can not have loops" when I realized that in a lot of stories like this, the loop is kind of caused by conflating a variable with an abstract copy of the same variable.

That's not what you did: what you did was draw an arrow from the thing that actually happens to your model. In my framework there aren't going to be arrows, but to the extent that there are arrows like this, it makes more sense to draw the arrow from the coarse model to what actually happens. And to the extent that the coarse model is a noisy approximation of what actually happens, you kind of won't actually
get an arrow there, or something. But in my world, the coarser descriptions of what's going on will necessarily be no later than a more refined picture of the same thing. So I more want to say you don't want to draw that kind of arrow between the real world and the model, and if I really wanted to do all this with graphs, I would say you should at least draw an undirected arrow that represents their logical entanglement, which isn't really causal, or something like that. But that's not the approach I take; the approach I take is to throw all the graphs away. So I largely gained some hope that these weird situations, which felt like they were happening all over the place in setting up decision theory problems, with agents that trust themselves and think about themselves and do all sorts of reasoning about themselves, that all the weird loopy stuff going on there might be able to be made less loopy by somehow combining temporal reasoning with abstraction. And that's largely a good description of a lot of what I've been working on for many years.

Daniel Filan: All right. I guess if I think concretely about this coarse versus abstract thing... basically, think about the Newcomb scenario. Newcomb's problem is: there's this really smart agent called Omega, and Omega is simulating me, Daniel Filan. Omega gives me the choice to either take one box that has an amount of money in it that's unknown to me, or two boxes, where I take the box with the unknown amount of money and also a box that contains a thousand dollars, and I can see it definitely has a thousand dollars. And Omega is a weird type of agent that says: "I figured out what you would do, and if you would have taken one box, I put a lot of money into that box, but if you would have taken both, then I put almost none in it." And
then I take one or the other. So what should I think of that? It seems like in your story there's an abstract variable, maybe in a non-realist kind of way, because we're variable non-realists, but there's some kind of abstract variable that's Omega's prediction of what I do, and then there's what I actually do. If it's the case that Omega just always correctly predicts whether I'm going to one-box or two-box, what should I think of that? In what way is this abstraction lossy, or what extra information is there when I actually take the one or two?

Scott Garrabrant: I guess if Omega is just completely predicting you correctly, then I kind of want to say: well, there's a variable-ish thing that is what you do, and it goes into Omega filling the boxes, and it also goes into you choosing the boxes, and it's at one point in logical time, or something like that. I think the need for actual abstraction can be seen more in a situation where you, Daniel, can also partially simulate Omega and learn some facts about what Omega is going to predict about you. So maybe Omega is doing some stuff to predict you, and you're also simulating Omega, and in your simulation of Omega you can see predictions about stuff that you're currently doing or about to do, and then there's this weird thing where now you have these weird loops between your action and your action. So in the situation where Omega was opaque to you, the intuition goes against what we normally think of as time, but we didn't necessarily get loops. But in the situation where you're able to see Omega's prediction of you, things are necessarily lossy, because you could diagonalize against predictions of you, and because it doesn't fit inside you. So let's see: if you're looking at a prediction of yourself, and you have some program trace, which is your
computation, and you're working with an object that is a prediction of yourself, maybe a very good prediction of yourself, you're not actually going to be able to fully simulate every little part of the program trace, because it's contained inside your program trace. So there's a sense in which I want to say there are some abstract facts, which maybe are predictions or proofs about what Daniel will do, and those can live inside Daniel's computation, and then Daniel's actual program trace is this more refined picture of the same thing. So I think the need for actually having different levels of abstraction that are at different times comes more from situations that are actually loopy, as opposed to the Newcomb problem you described, where the only reason it feels loopy is that we have this logical time, and we also have physical time, and they seem to go in different directions.

Daniel Filan: So now we've gotten a bit into the motivation: what is a finite factored set?

Scott Garrabrant: Okay. I guess I first want to recall the definition of a partition of a set. A partition of a set is a set of subsets of that set: we'll start with an original set S, and a partition of S is a set of subsets of S such that each of the sets is non-empty, the sets are pairwise disjoint, so they don't have any common intersection, and when you union all the sets together, you get your original set S. It's a way to take your set and view it as a disjoint union.

Daniel Filan: Yeah, I think of it as dividing up a set into parts, and that way of dividing it up is a partition.

Scott Garrabrant: Yeah. And I introduce this concept called a factorization, which can be thought of as a multiplicative version of a partition. In the partition story, you put the sets next to each other and union them together to get the whole thing; in a factorization, I instead want to multiply your different sets
together. So the way I define a factorization of a set S is: it's a set of non-trivial partitions of S such that for each way of choosing a single part from each of these partitions, there will be a unique element of S that's in the intersection of those parts. And so, the same way that a partition is a way to view S as a disjoint union, a factorization of S is a way to view S as a product.

Daniel Filan: Okay. To make that concrete, an example that I like to have in my head is: suppose we have points on a 2D plane, and we imagine the points have an x coordinate and a y coordinate. One partition of the plane is: I can divide the plane up into the set where the x coordinate is zero, the set where the x coordinate is one, the set where the x coordinate is two, and those look like lines that are perpendicular to the x-axis. None of those lines intersect, and every point has some x coordinate, so it's this set of lines that together cover the plane. That's one partitioning, the x partitioning. There's another one for values of y, which look like horizontal lines that are various amounts up or down. And once you have the x partitioning and the y partitioning, any point on the plane can be uniquely identified by which part of the x partitioning it's in and which part of the y partitioning it's in, because that just tells you how far to the right of the origin you are and how far above the origin you are, and that picks out a single point. I'm wondering, do you think that's a good intuition to have?

Scott Garrabrant: Yeah, I think that's a great example. To say a little bit more there: your original set S in the example you just gave is going to be the entire Cartesian plane, the set of all ordered pairs (x coordinate, y coordinate), and then your factorization is going to be a set B
which is going to have just two elements, and the two elements are the partition according to "what is the y coordinate?" and the partition according to "what is the x coordinate?". You can view partitions as questions, so in general, if I have a set like the Cartesian plane and I want to specify a partition, one quick way to do that is to just ask a question. I can say "what is the x coordinate?", and that question corresponds to the partition that breaks things up according to their x coordinate.

Daniel Filan: Okay. And the one slightly misleading thing about that example is that there are an infinite number of points in the x-y plane, but of course we're talking about finite factored sets, so S only has a finite number of points.

Scott Garrabrant: Yeah, we're talking about finite factored sets, so in general I'll want to work with a pair (S, B), where S is a finite set and B is a factorization of S.

Daniel Filan: Why choose the letter B for a factorization?

Scott Garrabrant: It's for "basis". While I'm thinking of the elements of B as partitions of S, I'm also thinking of them as elements just out on their own that are representing the different basis elements.

Daniel Filan: Yeah, it actually looks a lot like a basis, because any point in S can be uniquely specified by specifying its value on each of the basis partitions. Okay, so this gets into a question I have, which is: how should I think about the factors here? These finite factored sets, I guess, are supposed to represent what's going on in the world. How should I think about factors in general? Partitions I can think about as questions; these factors in B, how should I think about those?

Scott Garrabrant: I don't think I explicitly said, but we'll use the word "factor" for the elements of B. I almost want to say the factors are a preferred basis for your set
of possibilities. If I consider the set of all bit strings of length five, there are 32 elements there, and if I wanted to specify an element, I could do so in lots of different ways, but there's something intuitive about the choice of breaking my 32 elements into "what's the first bit?", "what's the second bit?", "what's the third bit?", "what's the fourth bit?", "what's the fifth bit?". It's a set of questions I can ask about my element that uniquely specify which element it is, and also, for any way of answering the questions, there's going to be some element that works with those answers. So a factorization is a combinatorial thing that could be used for many different things, but one way to think about it is that you're making a choice of a preferred basis, a preferred set of variables to break things up into, and you're thinking of those as primitive, and then other things are built up from that. A property like "do the first two bits match?" is then thought of as built up from "what is the first bit?" and "what is the second bit?". So it's a choice of what comes first, a choice of what the primitive variables in your structure are.

Daniel Filan: Okay. So if I'm trying to think about some kind of decision problem, where I'm going to do something, and then you're going to do something, and then another thing is going to happen, and I want to model that whole situation with a finite factored set, instead of thinking about modeling which thing is going to happen to me today, if I want to model an evolving situation, how should I think about what the set S is and what the factors should be?

Scott Garrabrant: Factorization is very general, and I actually use the word "factorization" in multiple different contexts in talking about this kind of thing. But to answer the question you're asking, let me say some more background stuff. I'm going to introduce this
theory of time that has a background structure that looks like a factored set, rather than a background structure that looks like a DAG, a directed acyclic graph, as in the Pearlian case. Specifically, if I'm using a factored set to describe some causal situation, I am not going to have factors that correspond to the nodes that you would have in the Pearlian world. Instead, I am mostly going to be thinking of the factors as independent sources of randomness. I'm hesitant there, because a lot of my favorite parts of the framework aren't really about probability, but if I'm thinking about it in a temporal inference setting, where I'm getting a statistical distribution, then I'm thinking of the factors as basically independent sources of randomness. So if we have some variable X, and then we have some later variable Y that can take on some values and is partially going to be a function of X, then we won't have a factor for X and a factor for Y. We'll have a factor that goes into what X is, and then we'll have another factor that's the extra randomness that went into the computation of Y once you already knew X. When we think of factored sets as related to probability, we're always going to want our factors to be independent, and so the factors can't really be put on things like X and Y when there's a causal relationship between X and Y.

Daniel Filan: So it almost sounds like the factors are supposed to be somehow initial data: the problem setup, or the specification of what's going on, but a kind of initial specification.

Scott Garrabrant: Yeah. Indeed, if you take my theory of time: I'm going to have a way of taking a factored set and taking an arbitrary partition on your set S,
and then I'm going to be able to specify time, to specify when some partitions are before or after other partitions. The factors will be partitions with the property that you can't have anything else that's strictly before them, besides deterministic things. So there's a sense in which the factors are initial. The factors are basically the initial things in the notion of time that I want to create out of factored sets, but also, intuitively, it feels like they're initial; it feels like they're the primitive things that came first, and then everything else was built out of them.

Daniel Filan: Yeah, maybe "primitive" is a better word than "initial".

Scott Garrabrant: Yeah. I mean, in the poset of divisibility, I kind of want to say that one is initial, not the primes, but the primes have the property that you can't really have anything else before them. "Primitive" is like "prime".

Daniel Filan: Yeah, in the sense that when you're dividing whole numbers, you get to the primes and then you get nowhere else, and they're primitive in that sense, right?

Scott Garrabrant: Yeah.

Daniel Filan: So you talk about the history of variables as, roughly, all the initial factors that you need to specify what a variable is, and then you have this definition of orthogonality. People can read the paper... I don't know if they'll be able to read the paper by the time this goes live, but they'll be able to read something to learn about what orthogonality is.

Scott Garrabrant: Yeah, the talk, and the transcript of the talk that appears on, for example, the MIRI blog or LessWrong, has the statements of everything that I find important, so you shouldn't have to wait for a paper. Modulo the fact that you'd have to prove things yourself, it has all the important stuff.

Daniel Filan: For me at least, it's easy to read the definition of orthogonality and still not really know how to think about it. So
how should people think about what's orthogonal?

Scott Garrabrant: In the temporal inference setting, where we're connecting these combinatorial structures up with probability distributions, orthogonality is going to be equivalent to independence. So one way to think of orthogonality is that I took out a combinatorial fragment of independence: I'm not actually working with probabilities, but I am working with a thing that is representing independence. Another way to think of it is that when two partitions are orthogonal, you should expect that if you come in and tweak one of them, it will have no effect on the other one; they're separated. And specifically, in the factored set framework, orthogonality means they do not have a common factor. Since these factors can be thought of as sources of randomness, or sources of something, if two partitions do not share a common factor, then they're in some sense separated.

Daniel Filan: One example that helps me understand it is the x-y plane example, where the factors were: you're partitioning up according to the x coordinate, and you can also partition up according to what the y coordinate is. You can have a different partition that's "are you on the left-hand side or the right-hand side of the y-axis?". That's a way to partition up the plane; it's like a variable, "are you on the left or are you on the right?". And there's a second variable that's "are you above the x-axis or below the x-axis?", which also partitions the plane up. My understanding is that in this factored set, those two partitions, or you could think of them as variables, are orthogonal. Hopefully that gives people a sense. It's also kind of nice because, if you think about the dividing lines, they're
literally orthogonal in that case, and that's maybe not a coincidence.

Scott Garrabrant: Yeah. Historically, when I was developing the factored set stuff, I was actually working with things that looked like: I have two partitions, and there's something nice when it's the case that for any way of choosing a value of the x partition and also choosing a value of the y partition, those two parts will intersect. In your example, the point is that all four quadrants exist. There's a sense in which this says that the two partitions aren't really stepping on each other's toes: if we specify the value of the x partition, it doesn't stop us from doing whatever we want in terms of specifying the value of the y partition. Orthogonality is a step more than that. A consequence of being orthogonal is that they're not going to step on each other's toes, but there's this extra thing, which is really just the structure of the factorization, beyond just not stepping on each other's toes, which comes from some sort of theory of intervention. You can view the factored set as a theory of intervention, because a factored set basically allows you to take your set and view its elements as tuples, and when your elements are tuples, you can imagine going in and changing one value and not the others. So orthogonality isn't just "X and Y are compatible in all ways of assigning values to each of them"; it's also "when you mess with one, it doesn't really change the other".

Daniel Filan: Cool. Another question about orthogonality: in the talk, which listeners can watch or read a transcript of, you say, okay, we have this definition of
orthogonality, and this definition of conditional orthogonality, which is a little bit more complicated but kind of similar. You then talk about inference in the real world. We imagine that you're observing things: you don't observe the underlying set, but you observe things that are roughly like these factors, and somehow you get evidence about what things are orthogonal to each other, and from that you can sometimes go on and infer a whole bunch of stuff, and you give an example of how that might work. How would I go about gathering this orthogonality data, about what things in the world are orthogonal to what other things?

Scott Garrabrant: The default is definitely passing through this thing I call the fundamental theorem, which says basically that conditional orthogonality is equivalent to: for all probability distributions that you can put on your factored set which respect the factorization, your variables are conditionally independent. I phrased that as "for all probability distributions", but you can quickly jump from that to "for a probability distribution in general position". So the basic thing to do is: if you have access to a distribution over the elements of your set, or over something that's a function of the elements of your set, it's a reasonable assumption to say that if you have a lot of data and it looks like these two variables are independent, then you assume they're orthogonal in whatever underlying structure produced that distribution, and if they're not independent, then you assume they're not orthogonal. The default way to get these things is via taking some distribution, which could be coming from a bunch of samples,
or it could be like a bayesian distribution but i think that like orthogonality orthogonality is something that you can basically observe through its connection to independence versus time is not as much i guess this works when my probability distribution has this form where the the original factors in your set b like the primitive variables have to be independent in your probability distribution and if they're independent in that distribution then conditional independence is conditional orthogonality how would i know if my distribution had that nice structure i mean you can just try to that you just interpret all the independence as orthogonality and see whether it kind of like contradicts itself right it's like you might have a distribution that kind of can't really be well described using this thing and one thing you might do is you might kind of develop an orthogonality database where you kind of keep track of all the orthogonalities that you observe and then you notice that there's some orthogonalities that you observed that that are incompatible with kind of coming from something like this yeah i'm not sure i fully understand the question i guess what i'm asking is imagine i'm in a situation right and like i don't already have the whole finite factored set structure of the world like i'm wondering how i go about getting it especially if the world is supposed to be like my life or something so i don't get like tons of free runs maybe this isn't like what it's supposed to do i feel like this is basically the answer you'd give here is similar to the answer you'd give to like the same question about prolian causality and i think that like largely the temporal inference story uh makes the most sense in a context that's very like you have a repeated trial that you can repeat an obnoxious number of times and then you kind of develop you can get a bunch of data and you're trying to like tell a story about this trial that you repeated and yeah so the story that i tell 
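As a minimal sketch of the "read orthogonality off independence" idea just described: the two-bit factored set, the variable definitions, and the specific probabilities below are my own illustrative choices, not anything from the episode.

```python
from itertools import product

# Toy factored set: two binary factors (bits), each given its own
# "general position" probability, so the joint respects the factorization.
p0, p1 = 0.3, 0.6  # P(bit0 = 1), P(bit1 = 1); arbitrary non-degenerate values

joint = {}
for b0, b1 in product([0, 1], repeat=2):
    joint[(b0, b1)] = (p0 if b0 else 1 - p0) * (p1 if b1 else 1 - p1)

def independent(var_a, var_b, joint, tol=1e-12):
    """Check whether two variables (functions of the world) are independent."""
    pa, pb, pab = {}, {}, {}
    for w, p in joint.items():
        a, b = var_a(w), var_b(w)
        pa[a] = pa.get(a, 0) + p
        pb[b] = pb.get(b, 0) + p
        pab[(a, b)] = pab.get((a, b), 0) + p
    return all(abs(pab.get((a, b), 0) - pa[a] * pb[b]) < tol
               for a in pa for b in pb)

bit0 = lambda w: w[0]
bit1 = lambda w: w[1]
xor  = lambda w: w[0] ^ w[1]

print(independent(bit0, bit1, joint))  # True: disjoint factors, so orthogonal
print(independent(bit0, xor,  joint))  # False: both variables use bit0
```

In general position, independence in the observed distribution is the evidence you take for orthogonality in the underlying structure, which is why the XOR variable comes out dependent on bit0 here.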
kind of makes the most sense in a situation like that. I'm excited about a lot of things about factored sets that are not about doing temporal inference from a probability distribution, and those feel like they play a lot more nicely with... sorry, the applications that feel like they're about embedded agents seem kind of different to me from the applications that feel like they're about temporal inference. It feels like you need something frequentist, with lots of repetition, to get a distribution in order to do a lot of stuff with temporal inference, at least the naive way; maybe you can build up more.

So "embedded agency" is your term for something like being an agent in a world where your thinking processes are just part of the physical world and can be modified and modeled and such. Is that a fair summary?

Yeah.

Okay. So are you thinking of the applications of the finite factored set framework to embedded agency mostly as a way to model things?

Yeah, mostly. Basically, there are a lot of ways in which people model agents using graphs, where the edges represent information flow or causal flow or something, and they all feel entangled with this Pearlian causality story. And I often think pictures like this fail to handle abstraction correctly. I have a node that represents my agent, and then I don't really have room for another node that represents a coarser version of my agent, because if I did, which one gets the arrow out of it? So largely my hope for embedded agency is that in all the places where we want to draw graphs, maybe we can instead draw a factored set, and this will allow things to play more nicely with abstraction. And playing nicely
with abstraction feels like a major bottleneck for embedded agency.

Okay, I'll ask more about that a bit later. So I guess I want to ask more questions about the finite factored set concept itself. Why is it important that it's finite?

I can give an example where you should not expect the fundamental theorem to hold in the infinite case. The thing where independence exactly corresponds to orthogonality: in the infinite case, one shouldn't expect that to hold, and it might be that you can save it by saying, well, now we can't take arbitrary partitions, we can only take partitions of a certain shape.

Sort of like measurability criteria?

Sort of, but measurability is actually not going to suffice. To give an example: imagine the infinite factored set of countable bit strings, so infinite sequences of ones and zeros.

The set of all of them?

The set of all infinite sequences of ones and zeros, with the obvious factorization: one factor for each bit. Then there's a partition asking "is the string the all-zero string?" and a partition asking "is it the all-one string?" Call these X and Y. It turns out that for any probability distribution you can put on this factored set that respects the factorization, at least one of these is degenerate: either the all-zero string has probability zero or the all-one string has probability zero.

Yep.

Because all the bits have to be independent. And so you'd be able to conclude that the questions "is it the all-zero string?" and "is it the all-one string?" have to be independent in all distributions on the structure. But it really doesn't make sense to call them orthogonal.

Why doesn't it make sense to call them orthogonal?

Because if you think of orthogonality as "they can be computed using disjoint collections of factors", you can't really compute whether something is the all-zero string or the all-one string without seeing all the factors.

I mean, you can compute it, because the first time you see a one you can say, all right, I'll stop looking.

In my framework, that doesn't count: you have to specify up front the bits you're going to use. I think there's some hope of saving all of this, but I haven't done it yet. And there's another obstacle with infinity, which is that even the notion of the history of a partition is not going to be well defined in the infinite case. In the finite case, the history of a partition is just the small set of factors that determine it: if you think of the things in the set B as variables, the history of a partition is the set of basic variables such that, if I know the values of all of them, I can tell which element of the partition I'm in.

So it's like the smallest amount of initial information specifying the thing I'm interested in. Is that what a history is?

Yeah, that's right.

That's a bit worrisome, because usually a collection of sets doesn't have a smallest element, right? That's not obviously well defined.

Yeah, so
"smallest" is specifically smallest in the subset ordering. The history of a partition is a set of factors, and if you take any set of factors that suffices to determine the value of that partition, to answer which part it's in, then that set of factors must be a superset of the history. So it's not smallest by cardinality; it's smallest by the subset ordering. Showing that this is well defined basically involves showing that the sets of factors sufficient to determine the value of X are closed under intersection: if I have two different sets of factors and it's possible to compute the value of X with either of them, then it's possible to compute the value of X using their intersection. But this is only true for finite intersections. If I have an infinite class of such sets, I can't necessarily compute the value from their intersection. To see an example, look again at infinite bit strings with the obvious factorization, and the partition "are there finitely many ones?" Any infinite tail of the bit string is sufficient to determine whether there are finitely many ones, but if you take the intersection of all these infinite tails, you get the empty set.

So one way I'm now thinking of this is that the problem with infinite sets is that they have things analogous to tail events in ordinary probability theory, where you depend on an infinite number of things and some sort of limit of them, but the limit is actually empty.

Yeah, exactly. There's actually what I think is a coincidence here. You could do this
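The finite-case history just described can be brute-forced on a toy example. This is my own sketch, not code from the episode: a three-bit world where the factors are the bit positions, with `determines` and `history` as hypothetical helper names. It computes the history as the intersection of all sufficient factor sets, and checks that the intersection is itself sufficient (the finite-intersection closure Scott mentions).

```python
from itertools import combinations, product

# World: length-3 bit strings; the factors are the three bit positions.
WORLD = list(product([0, 1], repeat=3))
FACTORS = [0, 1, 2]

def determines(factor_subset, var):
    """Do the factors in factor_subset suffice to compute var's value?"""
    seen = {}
    for w in WORLD:
        key = tuple(w[i] for i in factor_subset)
        if key in seen and seen[key] != var(w):
            return False
        seen[key] = var(w)
    return True

def history(var):
    """Intersection of all sufficient factor sets. In the finite case the
    intersection is itself sufficient, so it's the smallest such set."""
    sufficient = [set(s) for k in range(len(FACTORS) + 1)
                  for s in combinations(FACTORS, k)
                  if determines(s, var)]
    h = set(FACTORS)
    for s in sufficient:
        h &= s
    assert determines(sorted(h), var)  # closure under finite intersection
    return h

print(history(lambda w: w[0]))         # {0}
print(history(lambda w: w[0] ^ w[1]))  # {0, 1}
print(history(lambda w: 0))            # set(): a constant has empty history
```

The infinite counterexample in the conversation is exactly the place where the closure assertion here would fail.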
naive thing where you say: just take the intersection and call that the history. Then the history of the question "are there finitely many ones?" would be the empty set, and a lot of properties would break. But there's an interesting coincidence here, or something I don't actually think is a coincidence, though I don't like the definition that generalizes to the infinite case by defining the history to be the intersection. If you were to do that, you would get that the question "are there finitely many ones?" is orthogonal to itself, because it has empty history. And what kinds of partitions are orthogonal to themselves? The deterministic ones, where one part has probability one and all the others have probability zero. The Kolmogorov zero-one law says that properties like "are there finitely many ones?", when all the individual things are independent, necessarily have probability zero or one. So if you naively extend history to the infinite case by taking the intersection, even though sufficiency isn't closed under infinite intersection, you actually get something that feels like it gives the right answer, because of the Kolmogorov zero-one law.

Yeah, but it's going through some weird steps to get the right answer.

Yeah.

By the way, for listeners: there are a variety of these things called zero-one laws in probability theory, and if you want to think about knowledge changing over time, some of these zero-one laws are fun to mull over and think about how they apply to your life. Do you have comments on that claim?

No, I think the Kolmogorov zero-one law is really interesting, and I would recommend it to people who like interesting things.

All right. Back to finite factored sets: they're sort of a way of modeling
some types of worlds, or some ways the world can be. Are there any worlds that can't be modeled by finite factored sets?

Infinite ones, I guess, but ignoring that for a second: there's an issue similar to an issue in Pearl, where when you look at distributions coming from a finite factored set or from a DAG, we're looking at probabilities in general position, so it doesn't really make sense to have a probability of one-half or one-fourth.

What do you mean by probabilities in general position?

In both my world and Pearl's world, we want to specify a structure, and then we have all the probability distributions compatible with that structure. These give you something like a manifold of probability distributions, and some measure-zero subset of them have special coincidences. When I say "in general position", I mean you don't have any of those special coincidences. An example of a special coincidence is any time you have a probability of exactly one-half or one-fourth.

But to a Bayesian, that might happen because of the principle of indifference or something. To a limited Bayesian that doesn't really know, it feels like the principle of indifference advises having probabilities that are rational numbers.

Right, but probabilities that are rational numbers lead to coincidences in independence that don't arise from orthogonality. So there's a sense in which my framework and the Pearlian framework don't believe in rational probabilities as something that just happens.

Yeah, and it's even more concerning, because if I think I'm a computer and I assign probabilities to things, the probabilities I assign will be numbers that are the output of some computation, and there are only countably many computations but uncountably many real values. So if I'm only looking at things in general position, I'm ruling out all the things I could actually ever output.

Yeah. I have a little fragment of where I want to go with dealing with the fact that my system and Pearl's system don't believe in rational probabilities, which is to define something, and this will be formal but wrong: you take a structure that is a factored set together with a group of symmetries on the factored set, which allows you to, say, swap two of the parts within a partition, or swap two of the partitions with each other, or swap two of the factors with each other. So you can have some symmetry rules. For example, considering bit strings of length five again, you could imagine a factored set that separates into the five bit locations, "is this bit zero or one?", but that also has the symmetry that you can swap any of the digits with each other. You could also swap zero with one, but for now I'll just think about swapping digits. Then, subject to the structure that is this factored set together with this group of symmetries, the set of compatible distributions will not just be those in which the bits are independent of each other; it'll be any distribution in which the bits are i.i.d., independent and also identically distributed. So you could say: the thing I'm going to try to infer from my probability distribution is a factored set together with a group of symmetries on that factored set. And I mean,
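The bit-swapping example just given can be checked concretely. This is my own sketch (the function names and the specific probabilities are illustrative assumptions): a distribution over 3-bit strings is compatible with the "swap any digits" symmetry exactly when it is invariant under permuting the bit positions, which an i.i.d. product distribution is and a merely independent one is not.

```python
from itertools import permutations, product
from math import prod

def product_dist(ps):
    """Product distribution over bit strings with P(bit i = 1) = ps[i]."""
    return {w: prod(p if b else 1 - p for b, p in zip(w, ps))
            for w in product([0, 1], repeat=len(ps))}

def respects_bit_swaps(joint, n, tol=1e-12):
    """Is the distribution invariant under permuting the bit positions?"""
    return all(abs(joint[w] - joint[tuple(w[i] for i in perm)]) < tol
               for w in joint for perm in permutations(range(n)))

iid    = product_dist([0.3, 0.3, 0.3])  # independent and identical
skewed = product_dist([0.3, 0.6, 0.9])  # independent but not identical

print(respects_bit_swaps(iid, 3))     # True
print(respects_bit_swaps(skewed, 3))  # False
```

So adding the symmetry group to the structure cuts the compatible distributions down from "independent bits" to "i.i.d. bits", which is the point of the symmetric factored sets proposal.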
you're not going to get the fundamental theorem the same way, at least not naively, and I'm not sure what happens if you try to do inference on something like this. But if you want to allow for things like rational probabilities, maybe something like that would be helpful.

All right, I'd like to ask about the form of the framework. When I think of models of causality or of time, the two prior ones I think of are Pearl's work, with directed acyclic graphs and do-operators and such, where you can just draw a graph, and Einstein's work on special and general relativity, where you have this very geometric, curved space-time with a time direction that is special and different from the spatial directions. Those are all really geometric and come with nice pictures. Finite factored sets do not come with many pictures. Why not? I really liked the pictures.

I think it has something to do with variable non-realism. It feels like the points or nodes in your pictures... if I take a Pearlian DAG with ten nodes, even assuming they're all just binary facts, then I have 1024 different ways the world can be, and then you take the Bell number of 1024 different possible variables that I could define on top of that, which is obnoxiously huge, and there's not as much of a useful interpretation of the arrows that connect them up. It's something to do with variable non-realism: Pearl is starting from a collection of variables, which is a way of factoring the world into some small object, and because I'm not starting with that, my world is a lot larger. Another thing I'd call a theory of time is how people talk about time in terms of entropy. That's another example that
doesn't feel as visual.

Yeah, that's true, and I think that's a lot more variable-free, which is maybe part of why. It's also the case that once you have variables, they have these relations in terms of their histories and such, and you could draw those in a DAG or something.

Yeah. The structure of an underlying finite factored set is very trivial. Pearl has a DAG; if you wanted to draw a finite factored set as a DAG, it would just be a bunch of nodes that are not connected at all, each with its own independent source of randomness. If you wanted, you could draw an arrow from these nodes to all the different things you could compute using them, or if you only wanted the basic variables, it's just the disconnected nodes.

But I guess if you want the structure that lets you talk about variables, without talking about the variables themselves, that's less amenable to pictures, perhaps.

Yeah. Though I don't feel like physics and Pearl have pictures for the exact same reason; I kind of just think graphs got lucky in being easy to visualize.

That might be right. It's also true that simple graphs are easy to visualize, but there are a lot of non-planar graphs that are a pain to draw. So, a related question slash complaint I have: a lot of this work seems like it could be category theory.

Yep, it could be category theory.

Partitions are basically functions from a set to some other set, where the parts are the things that get the same value of the function.

Yeah, a partition is like the information content of a function out of a set, the thing you get by ignoring the target and only looking at the source.

Kind of, yeah. And it seems like there are probably nice categorical definitions of factors and such, and, you know,
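The "partition as the information content of a function" remark can be made concrete. A small sketch of my own (the helper name is hypothetical): two functions with different targets but the same preimage structure induce the identical partition.

```python
def kernel_partition(f, domain):
    """The partition induced by a function: group elements by f-value.
    Forgetting what the values are (the target) leaves just the partition."""
    blocks = {}
    for x in domain:
        blocks.setdefault(f(x), set()).add(x)
    return frozenset(frozenset(b) for b in blocks.values())

dom = range(6)
by_parity = kernel_partition(lambda x: x % 2, dom)
by_name   = kernel_partition(lambda x: "even" if x % 2 == 0 else "odd", dom)

print(by_parity == by_name)  # True: different targets, same information content
```

This is the sense in which a partition is a function "up to relabeling the target", which is where the categorical phrasing would naturally start.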
there's lots of duality in category theory, and category theory has pictures. It would also be a little nice in that, I have to admit, looking at sets of sets of sets of sets can get confusing after a while, in a way that categories could give nice language for. So why isn't everything category theory, even though category theory is objectively great?

It goes further than that: I actually know most of the category theory story, and I've worked out a lot of it, and went with the combinatorialist aesthetic for the presentation anyway. One reason is that I trust my category-theoretic taste less, and I kept changing things in a way where I was not actually getting the product out by working in category theory, so I punted that to the future. Another reason is that the system really doesn't have prerequisites, and by phrasing everything in terms of category theory you're adding artificial prerequisites that maybe make the thing prettier, but as it stands, if you know what a set is, you can mostly go through all the proofs. That's not entirely true, but because I'm working in a system with very few prerequisites, the marginal cost of adding prerequisites is higher. Another reason was that I was really shocked that the sequence counting the number of factorizations doesn't show up on OEIS. If you take an n-element set and count how many factorizations there are of it, you get a sequence, and the Online Encyclopedia of Integer Sequences, which has around 300,000 sequences, does not have this one, in spite of having a bunch of lower-quality sequences. I was very surprised by that, and it feels like a very objective test. I'm
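The counting just mentioned can be brute-forced for tiny n. This is my own enumeration following the definition of a factorization (a set of nontrivial partitions such that choosing one part from each pins down exactly one element); I verified the small values by hand, and the code is a sketch rather than anything from the talk.

```python
from itertools import combinations, product
from math import prod

def set_partitions(s):
    """All partitions of the list s, as lists of blocks."""
    if not s:
        yield []
        return
    first, rest = s[0], s[1:]
    for p in set_partitions(rest):
        for i in range(len(p)):
            yield p[:i] + [p[i] + [first]] + p[i + 1:]
        yield p + [[first]]

def is_factorization(S, B):
    """Every choice of one part per partition in B must intersect in
    exactly one element of S."""
    for choice in product(*B):
        cell = set(S)
        for part in choice:
            cell &= set(part)
        if len(cell) != 1:
            return False
    return True

def count_factorizations(n):
    S = list(range(n))
    nontrivial = [p for p in set_partitions(S) if len(p) >= 2]
    count = 0
    max_size = max(1, n).bit_length() - 1  # each partition has >= 2 parts
    for k in range(0, max_size + 1):
        for B in combinations(nontrivial, k):
            if prod(len(b) for b in B) == n and is_factorization(S, B):
                count += 1
    return count

print([count_factorizations(n) for n in range(1, 5)])  # [1, 1, 1, 4]
```

For a 4-element set the four factorizations are the discrete partition alone plus the three pairs of 2+2 partitions, which matches the hand count; prime sizes give only the trivial factorization, which is why the sequence looks so spiky.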
not a particularly scholarly person; it's hard for me to figure out what people have already done, and I was pretty blown away by the fact that this thing didn't show up on OEIS. So I stuck with the combinatorialist presentation because it had that objective hook, for the purpose of making an initial sell. Those are most of my reasons. I haven't worked out all the category theory, but I think it will end up being pretty nice. In fact, I think even the definition of conditional orthogonality can be made to look relatively nice categorically, via a path that's pretty unclear from the definition I give in the talk and the posts. There's an alternative definition that goes like this: if you want to do orthogonality and you want to condition on some fact about the world, the first thing you do is take your original factored set and take the minimal "flattening" of it such that the thing you want to condition on is rectangular in your factored set, where by flattening I mean you merge some of the factors together. If you take the minimal flattening and then ask whether your partitions are orthogonal in it, that corresponds to conditional orthogonality. So I think there's a nice categorical definition here. I definitely agree with the category theory aesthetic, and I think it's a good direction that I may or may not pursue myself; if somebody were super interested in converting everything to category theory, I could talk to them about it.

Speaking of that, what follow-up work are you excited to see done here? And do you think this kind of development is going to look more like showing nice things within this framework, like making it categorical, or
like showing the decidability of inference in finite factored sets? Or do you think it's going to look more like iterating on some of the definitions and tweaking the framework until it's the right framework?

The category theory thing does fall a bit into "tweaking the framework until it's the right framework", although it's a little different. I have applications I'm excited about in both spaces. If I were to list applications I expect I'm not personally going to do, projects that would be interesting for people to pick up: one would be converting everything to category theory. One would be figuring out all the infinite-case stuff and looking at applications to physics; I think there's a non-trivial chance of some pretty good physics applications coming out of the infinite-case work, because I think factored sets are actually a lot closer than the Pearlian stuff to being able to give you something like continuous time. One would be computational: I have a couple of proofs of concept for how to do temporal inference, and, as you said, showing the decidability of temporal inference is one thing, but really I think somebody should be able to search over a space of a certain flavor of proof and come up with actual examples of temporal inference from this, where you take in some orthogonality data and infer time from it. There's a computational question here, and I might be wrong, but I think one could at least produce some good examples, even if it's not doing temporal inference in practice. I'd be excited about something like that. And I would be excited about somebody trying to extend to symmetric finite factored sets, the thing I was talking about earlier for dealing with rational probabilities. Of the things I listed, the one I'm most likely to work on myself is the symmetric factored sets thing, because I think it could actually have applications to the embedded-agency-type stuff I want to work on. But for the most part I expect to think in terms of applications rather than extending the theory, and the things I just listed were all forms of extending the theory, either by tweaking stuff or by putting stuff on top of it. Mostly putting stuff on top: I don't think there are that many knobs to twiddle on the basic thing. You could have some new orientation on it, but I think it would be basically the same thing. For example, the way I defined it, there's only one factorization of the zero-element set; maybe it would be nicer if there were infinitely many factorizations of the empty set, so the definitions might end up slightly different, but it's the same core thing. I mostly think the baseline I have is correct enough for what I want to do with it that I don't expect a whole new thing; I expect things built on top, at different levels.

So I'd now like to pivot into a more general discussion of your research and your research taste. How do you see the work on finite factored sets as contributing to reducing existential risk from artificial intelligence, if you see it as doing that?

I think a lot of it factors through trying to become less confused about agency and embedded agency. I have opinions in both directions about
the usefulness of this. Sometimes I feel like, yeah, this isn't going to be useful and I should do something else; and sometimes I'm interacting with questions that are a lot more direct, and noticing how a lot of the questions I'm trying to figure out for embedded agency actually feel like bottlenecks to being able to say smart things about the more direct things.

Can you give an example of that?

Evan Hubinger says some things about myopia, which feels a lot more direct: trying to get a system that's optimizing locally and not looking far ahead, things like that. And in wanting to think about what that even means, I notice myself wanting a better notion of time, and better notions of things like the boundary between agent and environment, and all of that. So that's an example of something that feels more direct: myopia feels like something that could be very useful if it could be implemented correctly and understood correctly. When I try to think about things that are more direct than embedded agency, I feel like I hit the same cruxes, and working on embedded agency feels like it's more directed at the cruxes, even though it's less directed at the actual application, in a way that I expect to be useful.

In the myopia example, I think the first-pass solution would be: look, physical time basically exists, and we're just going to say, I want an AI system that cares about what's going to happen in the next ten seconds of physical time, and not about things that don't happen within the next ten seconds. Do you think that's unsatisfactory?

Yeah. I think that you can't really look at a system and figure out whether it's optimizing for the next ten seconds or not. And the answer I gave about myopia was a little off, because I was remembering a thought about myopia that wasn't actually about time; it was more about counterfactuals, and about the boundary of where the agent is, something like that. But I still think the example works. It comes down to this: you want to be able to look at a system and figure out what it's optimizing for, and if you have the ability to do that, you can check whether it's optimizing for the next ten seconds. But in general you don't have the ability to figure out what it's trying to do.

So how do we get at the applications? One thing I think is that, in trying to figure out how a system works, it's useful to try to understand what concepts it's using. I think the strongest case I can make for factored sets is that there's a sense in which factored sets are also a theory of conceptual inference, and I think this can be helpful for looking at systems, for doing oversight of systems, where you want to be able to look at a thing and figure out what it's optimizing for.

In what ways would you say it's a theory of conceptual inference?

One way to look at the diff between factored sets and Pearl is that we're not starting from a world already factored into variables; we're inferring the variables ourselves. So there's a sense in which, if you try to do Pearlian-style analysis on a collection of variables but you messed it up... say I have a number, and it's either
zero or one; it's also either blue or green; and I can also invent this concept "grue", meaning a green zero or a blue one. Instead of thinking in terms of "what's the number?" and "is it blue?", you can think in terms of "what's the number?" and "is it grue?". If you're working in the latter framework, you're using the wrong concepts, and you won't be able to pull out all the useful stuff you could with the right concepts. Factored sets have a proof of concept for being able to distinguish between blue and grue here. The point is that in this situation, if the number is independent of the color and you're working with the concepts "number" and "grueness", you have this weird thing where there looks to be a connection between number and grueness. But it's also the case that if I invent the concept "number XOR grueness", I've invented color, and color lets me factor the situation more, and see that maybe you should think of the number and the color as primitive properties, like we were saying before, and grueness as a derived property. So there's a sense in which earlier things are more primitive, and I think there's more to it than just that, but because I'm not taking my variables or my concepts as given, I'm also doing some inference about which concepts are good.

Yeah. It strikes me that inferring which concepts are good is a related but different problem from inferring which concepts a system is using.

There's stuff that you like to think about that involves having separate neurons as part of it, and I think there's a sense in which we might be confused when we're looking at a neural net, because we're thinking of the neurons as more independent things, when really they could be a transform, similar to
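As an aside, the blue/grue construction just described can be made concrete. The probabilities below are my own arbitrary general-position choices; the point is that the primitive pair (number, color) factors while (number, grueness) does not, and XOR-ing grueness back against the number reinvents color.

```python
from itertools import product

# General-position probabilities for the two primitive factors (illustrative).
P_num   = {0: 0.3, 1: 0.7}            # "what's the number?"
P_color = {"blue": 0.6, "green": 0.4}  # "what's the color?"

joint = {(n, c): P_num[n] * P_color[c]
         for n, c in product(P_num, P_color)}

def grue(n, c):
    # "grue" = a green zero or a blue one
    return (n == 0 and c == "green") or (n == 1 and c == "blue")

def independent(var_a, var_b, tol=1e-12):
    pa, pb, pab = {}, {}, {}
    for (n, c), p in joint.items():
        a, b = var_a(n, c), var_b(n, c)
        pa[a] = pa.get(a, 0) + p
        pb[b] = pb.get(b, 0) + p
        pab[(a, b)] = pab.get((a, b), 0) + p
    return all(abs(pab.get((a, b), 0) - pa[a] * pb[b]) < tol
               for a in pa for b in pb)

number = lambda n, c: n
color  = lambda n, c: c
# "number XOR grueness" reconstructs color:
derived_color = lambda n, c: "green" if (n == 0) == grue(n, c) else "blue"

print(independent(number, color))          # True: primitive concepts factor
print(independent(number, grue))           # False: a spurious "connection"
print(independent(number, derived_color))  # True: inventing color restores it
```

The statistical asymmetry between blue and grue here is exactly what the factored-set machinery is meant to pick up from raw data.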
the blue grue thing from some other thing that is actually happening and being able to have like objective notions of what's going on there like being able to like have a computation and have it like having there be a preferred basis that like causes things to be able to factor more or something feels yeah so i guess i'm concretely painting the picture of like factorization into neurons in the result of a learned system might be similar to grue yeah it's interesting in that like people have definitely thought about this problem but like the work on it seems kind of hacky to me so for instance like so i know um chris olah and collaborators now or formerly at openai have done a lot of stuff on using like non-negative matrix factorization to kind of get out the like you know linear combinations of neurons that they think are important and like the reason they use non-negative matrix factorization i might be getting this wrong but as far as i can tell it's because it kind of gets good results sort of rather than like a theory of like non-negativity or something or like a similar thing is um there's some work about trying to figure out exactly whether like the concepts in neural networks are like on the neurons or whether they're like these linear combinations of neurons but the way they do it which again i don't know i'm gonna sound critical here it's like a good first pass but a lot of this work is like okay we're gonna make a list of all the concepts and now we're going to test if a neuron has like one of the concepts which i've decided really exists and we're going to check random combinations of neurons and see if they have concepts which i've decided exist and you know which does better yeah there's definitely something unsatisfying about this maybe i'm not aware of more satisfying work yeah it does seem like there's some problem there and again i think that like you're not going to be like
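Editor's illustration: a minimal sketch of the non-negative matrix factorization move described above, using Lee-Seung multiplicative updates on fake activation data. The matrix sizes, number of components, and random data are assumptions for illustration only; the real interpretability work factors recorded network activations, and may use a different solver.

```python
import numpy as np

# Fake (samples x neurons) activation matrix; NMF requires non-negativity.
rng = np.random.default_rng(0)
V = np.abs(rng.normal(size=(200, 64)))

# Factor V ~= W @ H with W, H >= 0 via Lee-Seung multiplicative updates.
k = 8  # number of candidate "concept directions" (arbitrary choice)
W = rng.random((200, k)) + 1e-3
H = rng.random((k, 64)) + 1e-3
for _ in range(200):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-9)

# Each row of H is a non-negative mix of neurons; non-negativity pushes
# toward additive, parts-based combinations rather than arbitrary bases.
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print("relative reconstruction error:", round(float(err), 3))
```

The updates preserve non-negativity by construction, which is the structural bias the conversation is gesturing at: the factors can only add neuron activity together, never cancel it.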
directly applying the kind of math that i'm doing but it feels like i kind of have like a proof of concept for how one might be able to think of blueness as a statistical property like blueness versus grueness as like a statistical property is something that you can kind of get from raw data and like i don't know i feel like there's a lot of hope in something like that uh but that's also like not my main motivation that was like a side effect of trying to do like the embedded agency stuff but it's kind of not a side effect because i think that like the fact that i'm like trying to do a bunch of embedded agency stuff and then like i was trying to figure out stuff related to time and related to like decision theory and agents modeling themselves and each other and like i feel like i stumbled into something that might be useful for like identifying good concepts like blue and i think that that stumbling is part of the motivation like i don't know that that stumbling is part of the reason why i'm thinking so abstractly right like that's not a motivation for thinking about embedded agency that's a motivation for thinking like as abstractly as i am because like you might get far reaching consequences out of the abstraction all right so i guess a few other questions to kind of get at this what do you do like what does a day of like scott researching look like uh i mean recently it's been like thinking about presentation of factored set stuff like often involves like thinking in overleaf or something where i'm just like writing some stuff up and then i like have thoughts as a consequence of the writing often it looks like talking to people about different formalisms and different like weird philosophy yeah i don't know so you're thinking about presentation of this work um what are you trying to get right or not get wrong like what are the problems that you're trying to solve in the presentation i mean i think the large part of the
presentation thing is i want to like wrap everything up so that it feels like something that can just be used without thinking about it too much or something like that part of the presentation is like some hope that maybe it can like have large consequences to the way that people think about structure learning but mostly it's like kind of having it be a basic tool that i can then kind of build on like i've kind of like locked in some of the formalism such that i don't have to like think about these details as much and i can think about the things that are built on top of them i don't know i think that like in thinking about this presentation or something is not where like the interesting work is done like i think that like the part that had like a lot of interesting meat in terms of actually how research is done was like a lot of the stuff that i did like late last year which was kind of okay i finally wrapped up cartesian frames what is it missing and it like it largely was i had this orientation that was cartesian frames kind of feel like they're doing the wrong thing similar to or sorry like all right so here's like a story that i can kind of tell which is i was looking at cartesian frames which is some earlier work from last year and part of the thing was you viewed this world as a binary function from like an agent's choice and an environment's choice or like an agent's way of being or an agent's action cross the environment's way of being to like the full world state and a large part of the motivation was around taking some stuff that was kind of treated as primitive and making it more derived in particular i was trying to make time more derived and some other things but i was trying to like make time feel more derived so that i can kind of like do some reductionism or something and at the end of cartesian frames i was unsatisfied because it felt
like the binary function like the function from a cross e to w was itself derived but not treated that way like when i look at a function from a cross e to w i don't want to think of it as a function i want to think of it as like well there's this object a cross e and there's this object w and there's like a relation between them and then that relation kind of satisfies the axioms of a function which is like for each way of choosing an a cross e there exists a w but then i also wanted to say well it's not just a function from a cross e to w where a cross e is a single object there's this other thing which is i have this space a cross e and i'm specifically viewing it as a product of a and e and what was going on there was it felt like in my function from a cross e to w i did not just have it's a function not a relation i also had this like system of kind of interventions where i could imagine tweaking the a bit and tweaking the e bit independently and the product like a cross e as an object a cross e it has like the structure of a product and i was trying to figure out what was going on there in a way that the same way that you can view the function as just a relation that satisfies some extra conditions i wanted to view the product as some extra conditions and those extra conditions were basically what kind of grew into me being really interested in like understanding the combinatorial notion of orthogonality um and so i was like dissatisfied with something being like not quite philosophically right or not quite derived enough or something and i like double clicked on that a bunch okay so another question that i want to ask is so you work at uh miri the machine intelligence research institute and i think among people who are trying to reduce existential risk from ai as a shorthand people often talk about like the miri way of viewing things and you know the thoughts that miri has or something i also work at chai um so chai is the center for human
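Editor's illustration: the "a function is just a relation satisfying extra axioms" move above can be sketched directly. The sets and the particular relation here are hypothetical, chosen only to make the check runnable; they are not the cartesian frames construction itself.

```python
from itertools import product

# A cartesian-frame-style world map: a relation on (A x E) x W is a
# function exactly when every (agent choice, environment choice) pair
# relates to one and only one world state.
A, E, W = [0, 1], [0, 1], [0, 1, 2]
relation = {((a, e), a + e) for a, e in product(A, E)}  # hypothetical example

def is_function(rel):
    # collect the set of worlds each (a, e) pair relates to
    images = {}
    for (ae, w) in rel:
        images.setdefault(ae, set()).add(w)
    # totality: every pair appears; uniqueness: exactly one world each
    return set(images) == set(product(A, E)) and all(len(ws) == 1 for ws in images.values())

assert is_function(relation)
assert not is_function(relation | {((0, 0), 2)})  # two worlds for one pair
```

The extra product structure Scott describes — being able to tweak the a coordinate and the e coordinate independently — is exactly what the plain relation view forgets, which is what the orthogonality work recovers.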
compatible ai and uh sometimes people talk about like the chai way of doing things and that always makes me mad because um i'm an individual damn it if people are modeling you as like just you know one of the miri people like essentially basically identical to um abram demski but with like a different hairstyle uh what do you think people will get wrong yeah so i've been doing a lot of like individual work recently so it's not like i'm working very tightly with a bunch of people but there is still something to be said for like well even if people aren't like working tightly together they have like similar ways of looking at things maybe like in a world where they like really understand abram and the rest of the people yeah i mean i can point at like concrete disagreements or differences in methodology or something i think that everyone i have some disagreements about time like i think that there's a thing where abram i think abram more than like anybody else is like taking logical induction seriously and kind of doing a bunch of work in like the field that is generated by and exemplified by logical induction and i look at logical induction and i'm like you're just like putting all this stuff on top of time and i don't know what time is yet i need to go back and reinvent everything because it's like built on top of something and i don't like what it's built on top of and so i end up being like a lot more disconnected and a lot more pushing towards i don't know like i think that like abram's work will tend to like more on the surface feel directed towards the thing that he's trying to do or something and i will kind of just like keep going backwards into the abstract or something i also think that like there are a lot of similarities between the way of thinking that people at miri share and like also some large subset of people in ai
safety in general that like they're just like a bunch of people that i can kind of predictably expect if i come up with a new insight and i want to communicate it to them it'll go kind of well and quickly not just because they're smart because they're like on the same background like there's less inferential gap but yeah people are individuals it's true yeah so speaking of this desire to like uh make things derived like okay where does time come from and such what do you think you're happy to say is just like primitive i don't think it's a what i think it's something like taking something that you're working with that you think is important and like doing reductionism on it is a useful tool when you have something that is both like critical like you need to understand this thing and there's like actual mutual information between this thing that i'm holding and stuff that i care about and also it feels like this thing that i'm holding has all these mistakes in it or has all these like inconsistencies right like it's like why be interested in something like decision theory well decisions are important and also if you like zoom in at them and look at the edge cases you can kind of see they're built on top of something that feels kind of hacky and then it's like a thing that you can do is you can say well what are they built out of like what's yeah you can try to do some sort of reductionism and so it's more a move for when things aren't like clicking together nicely like it's not like yeah i don't think of reductionism as like get down to the atoms i think of reductionism as like the pieces don't fit together correctly go down one more step and see what's going on or something so a related question that is kind of it might be too direct but suppose a listener wants to develop an inner scott like they want to be able to like know what would scott say or think about such and such topic just restricted to the topic of like
reducing existential risk from ai what do you think the most important opinions and patterns of thought are to get right that you haven't like already explicitly said so it depends on whether they want an inner scott for predicting scott or whether they want an inner scott for just generally giving them useful ideas or something if it's being like a thing to bounce things off of and say i want to like understand x more one question is what would scott say about x it's not actually important that it matches uh and it's more important that it generates useful thoughts which is generally what i do with my models of people i like have inner people and then sometimes i find out that the inner people don't exactly match the outer people and i don't care that much because their main purpose is to give me thoughts so i want to make it better but it's not for prediction it's for ideas so i have an inner scott and my inner scott is kind of a little bit being rewritten now because a large part of my inner scott was kind of identified with logical induction and i actually like do this for a lot of people i like think about their thought patterns as in relation to things that they've developed and so like if i were to tell that story i would say things like well like part of the thing in logical induction is that you don't make the sub agents have full stories right like a large part of what's going on in logical induction is it's a way of like ensembling different opinions where you don't require that each individual opinion can answer all the questions you like allow them to specialize and you allow them to like fail to be able to model things in some domains and just like you want them to be able to like track what they fail to be able to model and so i have a lot of that going on where i just like have kind of like boxed fake frameworks in my head where i'm just like very comfortable like drawing analogies to
all sorts of stuff and i don't know i like for example wrote a blog post on what does the magic the gathering color wheel say about ai safety or something i'm like i do that kind of thing where i'm just like here's a model it's useful for me to be able to think with or something and i'm not trusting it in these ways but i am kind of like trusting it as being generative in certain ways and i keep on working with it as long as it's generative yeah what am i saying i'm saying that i tend to think that if something is fruitful and like creating good ideas but also being wrong in lots of ways i wouldn't say don't mess with it but if messing with it breaks it undo that messing with it and let it be wrong and still be fruitful or something and so i like tend to like work with obviously wrong thoughts or something like that okay another question about like intellectual production is like uh there's this idea of like complements to something where like you know if uh like complements to some production process or things that are not exactly maybe i partially mean inputs or you know inputs to the process or things you know separate from the process that make the process better so in the case of like scott garrabrant doing research what are the like best complements to it i think that isolation's been pretty good actually does that count is that a complement is that the type of complement i just realized that i've been kind of confused about what a complement is but that's it's at least an input what's been good about isolation i don't know i think i've just like largely been thinking by myself for the last year as opposed to like thinking with other people and it felt like it was good for me for this year or something i might want to go back to something else and why i mean i think there is a thing where i have in the past made mistakes of the form trying to like average myself with the people around me in terms of what to think about things and
this is like dampening just because of law of large numbers i guess it's just the heat equation right like everyone averages with everyone else and eventually you know things become uniform right there's a sense in which like working with other people is grounding in that it like keeps on getting feedback on things but like grounding kind of i don't know grounding has good things and there's bad things associated with it and like one of the ways in which it like has bad things associated with it is that like i don't know it's like things can like flow less or something yeah it's funny i don't exactly know what grounding is in the social sense i recently read a good blog post about it but i totally forgot i mean there is one sense in which like um so if i think about literal physical grounding in electronics like uh the point of that is to equalize your like electrical potential with um the electrical potential of the ground so that you don't build up this big potential difference then have someone else like touch it and touch the ground and have some crazy thing happen but like as long as you have the same potential as the ground it means that there's not like a net force for charges to like move from the ground to you or vice versa but it does mean that it's like things can sort of move in both ways and i don't know i think maybe there's an analogy here of like if you're like something about being averaged with a bunch of people i guess it sort of forces you to develop a like communication protocol or like common language or something that like somehow facilitates like flow of ideas or whatever and just directly because like the other people have ideas and you average them into you or you know some kind of average maybe literal averaging is not right and i'm kind of floundering does any of that resonate i'm not sure okay so we can move on um is there anything else that i should
have asked yeah i mean i have like thoughts about i have like a large space of thoughts which i don't even know what exactly i'd say next or something about like how i plan to use factored sets i think because i think it actually does like differ quite a bit from like the use case in the paper slash video slash whatever blog post uh that's one thing that comes to mind let me keep thinking for more yeah i guess that's the main thing that comes to mind okay how do you plan to use it i mean so one piece of my plan is that like i talk a bunch about probability and i don't really plan on working with probability very much in the future like a large part of the thing is i'm kind of pulling out a combinatorial fragment of probability so that i can or sorry a combinatorial fragment of independence so that i can avoid thinking of things with probability or something like that like i don't really talk about probability in cartesian frames and like a lot of the stuff in cartesian frames i hope to like port over to factored sets it's largely like there are lots of places where i'd want to draw a dag but i'd want to never mention probability or maybe i could mention probability maybe i could think of things as sometimes being grounded in probability sometimes not but like i want to draw dags all over the place and i have this suspicion that like well maybe places where i'm drawing dags i could instead like think in terms of factored sets although like i have to admit dags are still useful like i still have this like example where i can infer some time in factored sets that's not the one that i like give in the talk and like when i think about it i have like a graph in my head that's like the pearlian picture which maybe has something to do with the fact that you're saying that like pearl's stuff can be visualized but like it definitely feels like i haven't fully ported my head over to thinking in factored set world which seems like a bad sign but it also doesn't
seem like that strong of a bad sign because it's new yeah i mean graphs do have this nice like somehow if you want to understand dependence like it's just so easy to say like this thing depends on this thing which depends on these three things and like it's very nice to like draw that as a graph you know yeah i'm more thinking in terms of screening off screening off is like a nice picture when you're kind of like imagining like getting in on a path and you kind of like block the path and you kind of condition on something on the path and then like information can't flow across anymore yeah so like a variable screens something off yeah can you just say like what screening off means i mean the way i'm using it here i'm mostly just saying like x screens off y from z if y and z are orthogonal given x where orthogonal could mean many different things it could mean like the thing in graphs it could mean independence it could mean the thing in factored sets yeah yeah i guess like the idea of paths and graphs gives you this kind of nice way of thinking about screening off yeah so i do feel like i can't really picture conditional orthogonality as well as i can picture d-separation even though i can give a definition that's like shorter than d-separation captures a lot of things so yeah speaking of cartesian frames so cartesian frames is some i guess a framework you uh worked on as you said last year and one thing that existed in cartesian frames was it had this like notion of um sub-agency where like it was if you had an agent in an environment you'd kind of talk about what it meant to view it as like somehow a collection or a composition of sub-agents so yeah this question we're just gonna assume that listeners basically get the definition of that um and you can skip ahead to the last question if you want to look that up or don't want to bother with that but i'm wondering like um so these finite factored sets it's kind of easy to see how like you know
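Editor's illustration: the screening-off definition Scott gives here — x screens off y from z if y and z are orthogonal given x — reads, in the probabilistic case, as ordinary conditional independence. A small sketch on an invented y → x → z chain; the 0.9/0.1 numbers are arbitrary choices for the illustration.

```python
from itertools import product

# Joint distribution for a chain Y -> X -> Z over bits:
# Y is a fair coin, X copies Y with prob 0.9, Z copies X with prob 0.9.
def p(y, x, z):
    px_given_y = 0.9 if x == y else 0.1
    pz_given_x = 0.9 if z == x else 0.1
    return 0.5 * px_given_y * pz_given_x

def indep_given_x(x):
    # check P(Y,Z|X=x) == P(Y|X=x) * P(Z|X=x) for every (y, z)
    px = sum(p(y, x, z) for y, z in product([0, 1], repeat=2))
    for y, z in product([0, 1], repeat=2):
        pyz = p(y, x, z) / px
        py = sum(p(y, x, zz) for zz in [0, 1]) / px
        pz = sum(p(yy, x, z) for yy in [0, 1]) / px
        if abs(pyz - py * pz) > 1e-12:
            return False
    return True

assert all(indep_given_x(x) for x in [0, 1])  # X screens off Y from Z

# without conditioning, Y and Z are dependent through X:
p_yz = sum(p(0, x, 0) for x in [0, 1])  # = 0.41
assert abs(p_yz - 0.5 * 0.5) > 0.1      # != P(Y=0) * P(Z=0) = 0.25
```

Conditioning on the middle variable blocks the path, which is the "information can't flow across anymore" picture; the factored-set notion of conditional orthogonality is the combinatorial analogue of this check.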
the world being a product of the agent and the environment that's kind of like this factorization thing i'm wondering like how you think the notion of this sub-agents thing goes into it because that was the thing i was like kind of it was kind of the most interesting part of cartesian frames yeah so i think that i actually do have to like say something about the definition of sub-agent okay to answer this which is like in cartesian frames i gave multiple definitions of sub-agent and one of them was kind of opaque but very short and i kind of like justified like i kind of like mutually justified things like ah this is pointing to something you care about because it agrees with other definitions but it's really carving it at the joints because it's so simple so like it shouldn't be clear why this is a sub agent but in cartesian frames i had a thing that was like c is a sub agent of d if every morphism from c to bot factors through d and when you think about bot there's a sense in which you can kind of think of bot as like the world because bot is the thing where the agent kind of just gets to choose a world and the environment doesn't do anything and so you can view this thing as saying every morphism from c to bot factors through d and i think this translates pretty nicely it's not symmetric in the cartesian frame thing but i think it translates pretty nicely to d screens off c from the world c is a sub agent of d means that d screens off c from the world in the cartesian frame thing it's not a symmetric notion when i convert to factored sets it becomes a symmetric notion maybe there's something lossy there and by screening off you mean this conditional orthogonality i mean conditional orthogonality i'm saying factoring through like saying a function factors through an object is like similar to a screening off notion and the way that i define sub-agency in factored sets looks like this so i can say more about what i say about the world the
world is like maybe some high-level world model that we care about so we have some partition of our finite factored set w which is kind of representing stuff that we care about and we have some partition uh that's maybe like we have some partition d which corresponds to like the super agent and like the choices made by the super agent and we also have some partition c which corresponds to the choices made by our sub agent and so like you could think of like maybe d is like a large team and c is like one subpart of that team and if you imagine that like the large team has this channel through which it interacts with the world and that d kind of represents the output of the large team to the world but then internally it has some like internal discussions but those internal discussions never kind of leave the team's internal discussion platform or whatever then there's a sense in which if the team is like a very tight team and c doesn't really have any interaction with the world besides through the official channels that are d then if i want to know about the world once i know about the output of d learning more about c doesn't really help which is saying that like c is orthogonal to the world given d how is that symmetric because normally like if x is orthogonal to y given z it's not also the case that y is orthogonal to z given x right uh sorry it's symmetric with respect to you're not replacing the given oh okay by symmetric it's symmetric with respect to swapping yeah when i said symmetric obviously i should have meant yeah it's symmetric with respect to swapping c with w and d is in the middle which yeah what does it mean to swap c with w that's kind of strange it's capturing something about being a sub agent means that kind of the interface of the super agent is kind of screening off all of your stuff and like one way to see this is if we weren't working with like i was thinking this in terms of like w as
everything we care about but if we weren't thinking about w as everything we care about if we just took any partition x and any other partition y and we let c be equal to x and we let d be equal to the common refinement of x and y then d screens off c from the world so if you take any two choices any two partitions and you just put them together you get a super agent under this definition and you can kind of combine any partitions that are kind of representing some choices or something maybe and you can combine them you get super agents but that's like super agent with respect to like the whole world and as you take a more restricted world now you can have sub agents that are not just one piece of many pieces but instead maybe the sub agent can have some internal thoughts that don't actually affect the world and are not captured in the super agent which is maybe only capturing like some more external stuff yeah i guess like but one thing that comes to mind is that like yeah so we have this weird thing where your kind of definition of a sub agent um you could swap out the like um sub agent with the rest of the world because we were thinking of an agent as like a partition like probably not all like just the choice of an agent maybe as a partition but probably like not all partitions not all variables should get to count as like being an agent right and i'm wondering if there's like some restrictions you could place on like what counts as an agent at all that would like break that symmetry yeah you could i don't actually have a good reason to want to here i think that part of what i'm trying to build up is to not have to make that choice of what counts as an agent what doesn't like i don't know i can define this sub-agent thing and i can define things like this partition observes this other partition so like an embedded observation which i haven't like explained and i feel like it's useful that i can give these definitions and they
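Editor's illustration: the "any two partitions give a super agent" observation above can be checked mechanically. The 8-element set and the two labelling functions are invented for the sketch; the point is only that the common refinement D of partitions X and Y determines X, so conditioning on D leaves nothing for C = X to add.

```python
from itertools import product

S = set(product([0, 1], repeat=3))  # an 8-element finite set

def partition_by(label):
    # group the elements of S into blocks by a labelling function
    blocks = {}
    for s in S:
        blocks.setdefault(label(s), set()).add(s)
    return {frozenset(b) for b in blocks.values()}

X = partition_by(lambda s: s[0])         # split by the first coordinate
Y = partition_by(lambda s: s[1] ^ s[2])  # split by parity of the rest

def common_refinement(P, Q):
    # blocks of the refinement are the nonempty pairwise intersections
    return {b & c for b in P for c in Q if b & c}

D = common_refinement(X, Y)

# every block of D sits inside a single block of X, i.e. D determines X
assert all(any(d <= b for b in X) for d in D)
print(len(X), len(Y), len(D))  # 2 2 4
```

This is the trivial direction of screening off: once you know which block of D you are in, the block of C is fixed, which is why restricting to a smaller world w is needed before sub-agency says anything non-trivial.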
extend to these other partitions that we don't want to think of as agents either i actually feel a little more confused in the factored set world about how to define agents than i did before the factored set world because if i'm like trying to define agency my kind of go-to thing is like agency is time travel it's this mechanism through which the future affects the past through the agent's modeling and optimization and now i'm like well part of the point of factored sets is i was trying to actually understand the real time such that time travel doesn't make sense as much anymore and one hope that i have for saving this definition and thinking about what is an agent in the factored set world is the factored set world leads to multiple different ways of defining time and so just like we want to say an agent has some sort of internal to the agent notion of time where it feels like the fact that i'm going to eat some food causes me to drive to the store or something so like internal to the agent there's some time and then there's like also the time of physics and so one way you can think of agency is where there's kind of different notions of time that disagree and there's a hope for having a good system of different notions of time in factored sets that comes from the fact that we can just define conditional time the same way we define conditional orthogonality we just imagine taking a factored set taking some condition and now we have a new structure of time in the conditioned object and it might disagree and so you might be able to say something like agents will tend to have like different versions of their time disagree with each other and this might be able to be made formal i don't know if this is a vague hope all right cool maybe that gives people ideas for uh how to extend this um or for work to do so um yeah i guess the final question i would like to ask is um yeah if people have
listened to this and they're interested in following you and your work how should they do so yeah so specifically for finite factored sets everything that i've put out so far is on lesswrong and so you could google some combination of my name scott garrabrant and lesswrong and finite factored sets um and i intend for that to be true in the future i intend to keep posting stuff on lesswrong related to this and probably related to future stuff that i do yeah i tend to have big chunks of output rarely so i think that yeah like i haven't posted much on lesswrong since posting cartesian frames and i'm currently planning on posting a bunch more in the near future related to factored sets and that'll all be on lesswrong all right scott thanks for being on the podcast and listeners i hope you join us again thank you this episode is edited by finn adamson the financial costs of making this episode are covered by a grant from the long-term future fund to read a transcript of this episode or to learn how to support the podcast you can visit axrp.net that's axrp.net finally if you have any feedback about this podcast you can email me at feedback at axrp.net