Why AI is incredibly smart -- and shockingly stupid | Yejin Choi
Why this matters
Auto-discovered candidate. Editorial positioning to be finalized.
Summary
Auto-discovered from TED Talks. Editorial summary pending review.
Perspective map
The amber marker shows the most Risk-forward score. The white marker shows the most Opportunity-forward score. The black marker shows the median perspective for this library item. Tap the band, a marker, or the track to open the transcript there.
An explanation of the Perspective Map framework can be found here.
Episode arc by segment
Early → late · height = spectrum position · colour = band
Risk-forward · Mixed · Opportunity-forward
Each bar is tinted by where its score sits on the same strip as above (amber → cyan midpoint → white). Same lexicon as the headline. Bars are evenly spaced in transcript order (not clock time).
Across 32 full-transcript segments: median 0 · mean -2 · spread -16–0 (p10–p90 -10–0) · 0% risk-forward, 100% mixed, 0% opportunity-forward slices.
Mixed leaning, primarily in the Governance lens. Evidence mode: interview. Confidence: medium.
- Emphasizes governance
- Emphasizes safety
- Full transcript scored in 32 sequential slices (median slice 0).
Editor note
Auto-ingested from daily feed check. Review for editorial curation under intake methodology.
Play on sAIfe Hands
On-site playback is enabled when an episode-level media URL is connected. This entry currently points to a show-level source page, not an episode-level media URL.
Episode transcript
YouTube captions (TED associates this talk with a public YouTube mirror) · video lLCEy2mu4Js · stored Apr 8, 2026 · 1,156 caption segments
Captions are an imperfect primary: they can mis-hear names and technical terms. Use them alongside the audio and publisher materials when verifying claims.
No editorial assessment file yet. Add content/resources/transcript-assessments/why-ai-is-incredibly-smart-and-shockingly-stupid-yejin-choi.json when you have a listen-based summary.
Show full transcript
Hi, this is a plan B recording in case a live delivery of this keynote doesn't become feasible for some reason. I'm very excited and honored to give this talk, and the charge here is for me to say something interesting and insightful toward the next 60 years of ACL. I've been in the field for less than 20 years, so that's a tall order. No pressure at all. Whatever I say is likely to be wrong; prediction, especially about the future, is hard. So minimally I figured that the theme of this talk should be retro-futuristic, and the background art you see here was created by Jack Hassell to reflect that theme. The goal is to share the weirdest and dreamiest thoughts about language and intelligence. What I figured I shouldn't be doing is stating the obvious, for example how neural networks are not achieving true natural language understanding; I'm going to assume that you and I are already on the same page on that. Skipping all the obvious stuff, I want to try to say things that are a little more surreal and different, perhaps counterintuitive, drawing analogies from modern physics, which talks about mysterious subjects such as dark matter, Schrödinger's cat, wave-particle duality, the space-time continuum, and whatnot. In this talk I'm going to cover, not in this order: common sense, norms and morals, the ambiguity of language, learning and unlearning, and the language-reasoning continuum, among others. I'm going to organize that in this particular sequence: chapter one will be about ambiguity, chapter two about the continuum, chapter three about the dark matter, and then I'm going to end with an epilogue in which I share my confession as an alien who's not supposed to be here.
Okay, so starting with the prologue: what will ACL 2082 be like? Might it be in the metaverse, because due to climate change maybe we try not to travel anymore, even though everyone agrees that virtual conferences are not the same? Perhaps the experiences will be better. Or climate change becomes so bad that we move to Mars; I hope not, but who knows. In any case, what will the scientists at that ACL be presenting? Will it still be about information extraction? Would you be excited to see syntactic parsing still not being solved yet? I don't know if I want to see few-shot prompting, to be honest. Perhaps the field will have evolved so that instead of information extraction we now do insight generation and synthesis; instead of syntactic parsing, perhaps we will be doing pragmatic parsing; instead of few-shot prompting, perhaps zero-shot hyperparameter tuning, so that nobody needs to do hyperparameter tuning anymore and the machine comes up with the best tuning on its own. Maybe MT will be about translating AI language to human language, who knows. Perhaps summarization works so well that actual ACL papers, at least the surveys, might be written by AIs, and maybe one of the themes might be about how AI is still hitting the wall. Or maybe all of this is a moot point if quantum computing becomes real, in which case we might have QPT, quantum pre-trained transformers, achieving a perplexity near one. Or we might still have arguments about how AGI is just around the corner but still didn't come yet, or how we haven't solved compositionality just yet; the same arguments that we have today might continue even then, who knows.
Okay, so let's move to chapter one, but let me first remove my helmet, which was for the theme of the prologue but not for the rest of the talk. Hey, so I'm back without the helmet, phew. Chapter one will be all about ambiguity. In modern physics, I realized, the more scientists understand, the more ambiguous and weird everything seems to get. Schrödinger's cat may be simultaneously alive and dead, at least in the theory about that cat. Wave-particle duality? Really, I cannot even wrap my head around that concept. And I thought everything is three- or four-dimensional, but then modern string theory talks about how
there may be an eleventh dimension as well, and none of this is for sure. Things get weirder and more ambiguous. It might be, analogously, that future ACL must embrace ambiguity, because that's the nature of deeper understanding of language. A corollary will be that understanding is not categorization, which I'm going to explain in a bit. I actually didn't think this way for a long time, and when the amazing Swabha told me that NLU can't be crammed into categorization, such as labeling and classification, it didn't really click with my mind right away. She was definitely ahead of me. When I began my PhD especially, it looked as if NLU research boils down to categorization of some sort. I mean, even parsing is structural labeling, and even discourse, you know, people try to parse into labels. There was a huge emphasis on high inter-annotator agreement, to the point that people seemed to avoid touching problems for which it is difficult to get high agreement, or people seemed to throw away data points for which the labels are less agreeable. It looked almost as if we shouldn't be working on problems for which it's not possible to get high inter-annotator agreement, because otherwise it's not scientific enough or something. Over the course of time I realized that these concepts of categories are real, and they do exist, but their boundaries are never clean-cut. The world is full of blurry categories. Even when you try to submit your paper to ACL conferences, don't you always feel the areas are a little hard to define clearly? It's fluid. And again, gender: the concept of gender, you know, for a long time growing up I thought it was binary, and then I'm awakening to the realization that it's actually fluid. It turns out even linguistic concepts are that way, even the very basic linguistic categories such as part-of-speech tags. The first time I learned about this was due to Chris Manning's presidential address at ACL, and you can also see the written version of it in the Computational Linguistics journal. He talks about a lot of fun examples, but basically even this is on a continuum, and there's this paper with the very nice title "The Category Squish," in which the author talks about how instead of a fixed discrete inventory of syntactic categories we might need to adopt a quasi-continuum. It's not just about parts of speech, as it turns out. Another surprise I had was in the moment when I was reading a paper about veridicality, that is, judgments about whether something happened or not. In this paper the authors say veridicality judgment is graded and variable; it's actually foolhardy to assign a unique label to every example. Of course more context will reduce the uncertainty, but no amount of background information could completely eliminate that ambiguity. Actually acknowledging it was a big surprise to me, because I thought we're not allowed to write papers in which we report that annotators did not agree. But this was 2015, and I was so surprised; I remember this. I really liked the paper but didn't do anything with it for a while. But then other amazing researchers, such as Anjalie, Yulia, Maarten, and others, talked about how the same is true in studying bias, and toxicity isn't clean-cut either, to the point that in their research paper Anjalie and Yulia say how human judgments in that domain can be unreliable, and so they develop unsupervised algorithms for that reason, because we cannot really rely on supervised data for that. I thought, wow, you can do this, this is so rad. But maybe not everyone is working on bias, so you might think, oh no, my favorite natural language inference doesn't have that problem. It actually does. Ellie and Tom, in their paper, talk about how human disagreements are not dismissible as simply annotation noise, but rather persist even as they
add more ratings and even as they add more context provided to the raters. So they proposed a more refined evaluation, in which perhaps the model should predict the full distribution of what people might consider plausible labels. But still, some machine-learning-flavored folks might wonder whether it's better to ignore these cases, perhaps because machine learning models might prefer cleaner cases. It turns out that's not true either. The amazing Swabha came up with this idea of dataset cartography, with really surprising, perhaps counterintuitive empirical findings: the data points in the ambiguous regions are the juicy ones for teaching models, so that they become more robust to out-of-distribution examples. Building on these insights, we then had a follow-up paper with the amazing Alisa. Here we have automatic ways to create even more examples similar to those ambiguous cases, and to our big surprise we find that the resulting dataset, WaNLI, even though it's considerably smaller than MNLI, can improve performance on seven out-of-distribution test sets. This is very exciting; machines do need to see these hard, ambiguous examples. But these are still just doing simple classification, which in itself is not satisfactory enough. So Rachel Rudinger and co-authors worked on the defeasible version of natural language inference. Given a premise, here about a group of people sitting around a table with papers and laptops in front of them, what are they doing? The hypothesis, and this is actually an example from SNLI, is that they are having a meeting. Who knows if they are having a work meeting or not: if they're in a conference room, more likely true; if they're in a library, probably less likely true. So you can think about what additional context might defeat the original inference, and that's Delta-NLI, or defeasible NLI. Building on that, the amazing Faeze and co-authors looked at an explanation version of defeasible NLI, in which we learn to reason about the explanations in addition to the classification itself. And then AmbigQA, this is another really great example: the amazing Sewon and co-authors study how QA problems can often be inherently ambiguous, because it's really difficult to ask a very precise question in one go. So they have this really innovative idea to create QA datasets in which you can take turns to resolve the ambiguity. Really great idea. So in sum, if we really want a machine that really understands, it cannot just be doing categorization, because that's never enough. And I deeply understand what Swabha says now, that NLU cannot be crammed into categorization. Totally. You might have noticed that I have been using the word ambiguity ambiguously; to maximize the point about ambiguity, you've got to be meta and be ambiguous about the word ambiguity, meaning I've been using different meanings throughout different slides. But I am licensed to do so, because it turns out the stars aligned, and the amazing Barbara Grosz, in her PhD thesis a million years ago, in 1977, quoted this from Lewis Thomas's The Lives of a Cell: "Ambiguity seems to be an essential, indispensable element for the transfer of information from one place to another by words, where matters of real importance are concerned." So when things are really important, ambiguity tends to kick in, and as a field maybe we do need to grow to embrace it further. Okay, so that was a big, speculative, very subjective argument for embracing ambiguity. Next up is the continuum. The weird thing about modern physics, again, is that space-time is a continuous manifold, and so forth. So I'm thinking more and more lately that language, knowledge, and reasoning are in this continuum manifold as well. In
this talk I'm not going to talk about all three, though; let me focus just on the language-reasoning continuum. Language models are amazing, but sometimes... So if you ask the following question to GPT-3: if you travel west far enough from the West Coast, you will reach the East Coast, true or false? GPT-3 in this case says, oh, the world is round, so you can reach the East Coast, so the answer should be true. This is all correct, very impressive. Sometimes. But if you keep asking, drilling into similar questions, you find all sorts of inconsistencies, which makes us wonder whether it knows anything clearly at all. So GPT-3s are like lemons today: juicy lemons, but lemons nonetheless. Other researchers reported the bogus explanations that GPT-3 generates, and how we can make better use of it by creating some filters. But in this talk, let me talk about yet another way, by getting philosophical and using Socrates's maieutic method. So maieutic prompting is what we are going to do. As a running concrete example, let's use the same question as before: if you travel west far enough from the West Coast, etc. In order to prompt GPT-3, first we use true as an answer, and then say "because...", so GPT-3 now has to explain why the answer might be true. In this case it says: the Earth is round; if you travel in any direction long enough, you will eventually return to where you started. This is actual GPT-3 output, which is correct and impressive. But let's see if GPT-3 really knows, by asking GPT-3 the same question where we replace true with false. So the answer is not false, but we pretend maybe it's false, and let GPT-3 explain. Now GPT-3 tries to be agreeable and says, ah yeah, you cannot reach the East Coast, whatever. So we asked about the two different true and false situations and gathered these explanations, which we are going to call explanation T and explanation F. Then let's dig more into explanation T: the Earth is round; if you travel in any direction long enough, you will eventually return to where you started. We're just asking GPT-3 exactly what it generated in the previous turn, and seeing whether it agrees with it. It actually agrees: true. Okay, but what if we ask the same question but negate it? We insert this negator, "you will not," and then it correctly flips the true/false assignment. This is good; that means GPT-3 is being logically integral, at least with respect to explanation T, the explanation that supports the true answer. But what about the explanation corresponding to the false answer? In this case, it turns out, if we ask about explanation F and its negation, where cannot is changed to can, GPT-3 is not able to flip the answer, which means GPT-3 is unsure about the bogus thing it said here. GPT-3 seems to know enough that what it said earlier is suspicious. So this part is not logically integral, which is a good thing. You can imagine that we can keep doing this recursively down the tree, exploring whether it can further support or negate the explanations of the explanations. This is the resulting maieutic tree, in which we explore true and false explanations, and at this point we have already removed a lot of logically non-integral subtrees, but things are still potentially inconsistent. So in this work we look at both node-level confidence scores and pairwise consistency relations based on natural language inference, and then reason about the entire graph collectively through a weighted MAX-SAT solver. It's basically a classical AI search problem, specifically a constrained optimization problem. Here the resulting output might be that the Q node is assigned true, etc., and the resulting answer should be just what the top node says. Of course, instead of doing all this crazy machinery, you could have tried things like chain-of-
thought. These are recent approaches, as is self-consistency, which do much better than canonical prompting, the very basic kind, which here does barely better than chance (this is a true/false question, so random is about 50 percent). Recent prompting methods such as chain of thought with self-consistency, which ask a lot of questions and then average them out, improve the performance a lot, but not as much as full maieutic prompting followed by the collective inference. So it's quite interesting how much gain we can get out of the GPT-3 lemon when you plug in all the good stuff such as MAX-SAT. In fact, it's so good that it's better than even a supervised model on T5-11B, which is generally very hard to beat with few-shot GPT-3, but here we can do that. And it turns out a similar result repeats on other datasets such as CREAK and Com2Sense, really fun recent commonsense benchmarks; we see a similar trend here, that maieutic prompting can boost the performance considerably. The takeaway message so far is that Socrates's maieutic method not only enhances flawed human reasoning; a computational interpretation of it can dramatically enhance flawed GPT-3 reasoning as well. But some might still think, ah, this is too symbolic for my taste. Some of the younger generation really like anything differentiable. So can we do more differentiable reasoning instead of classical constraint satisfaction or symbolic solving? Here comes COLD decoding, which is energy-based constrained text generation with Langevin dynamics. Energy is a concept that describes the state of the system of interest, and in our case that may be controllable text generation. What can we express through energy? For example, fluent generation, via conditional probability; here we're looking at a soft version of it, because everything will be differentiable and we're going to be in a continuous space, even for words, or soft intermediate representations of the words. Often, for controllable text generation, we want to add additional constraints, like keyword constraints, which might control the topic of the text output. If you add that, then the shape of the energy function changes, and the optimal point might be at a different place, of course. You can even add further constraints, such as somehow conditioning on the future, even though you can usually only condition on the past if you're using a left-to-right language model; if you want to incorporate future context using a left-to-right language model, you can throw in an additional term. All of these are parts of what we could potentially express through an energy function, which is quite flexible. We can then use the Boltzmann distribution in order to convert energy into a normalized distribution. The reason we might want the normalized version is sampling: a lot of modern text generation relies on sampling. If we want to sample from a complex energy function, though, it's really non-trivial; the equation is easy to write but really hard to compute, because Z, the normalizer, is intractable. Oftentimes people talk about MCMC, but it's going to be too slow; it's not going to work for us. Fortunately there's a better option based on Langevin dynamics. If we are willing to operate in a continuous space, giving up the discreteness of language during this reasoning procedure, then Langevin dynamics, which previously has been used for images and audio, becomes a possible solution. We use that to optimize this energy function through a very simple update, where we take the gradient and keep iteratively updating, and then the final soft representation is going to be a weird representation that's not quite the same
as the continuous version of the word dictionary, so there's a bit of a trick there, but that's basically what we're doing. We tested on three very different commonsense reasoning benchmarks in which we are required to generate text, and we see promising results here; more results in the paper, but let me wrap up there. So, this continuum segment: I began with this triangle, not just language and reasoning. A little more discussion is shared in my recent Daedalus journal article; I was honored to contribute to the special issue on AI and society, publicly available online now. Some of you might have seen me raving about this book, The Enigma of Reason by Hugo Mercier and Dan Sperber, in which the authors argue that reasoning is a mechanism of intuitive inference in which logic plays at best a marginal role, and so forth. I'm not going to talk much about this; I'll just throw a pointer there. But I would like to highlight other very interesting work, such as language-based reasoning for deduction or entailment, in which basically understanding requires reasoning and reasoning requires understanding. And then there's another form of reasoning called abduction; here generation requires reasoning and reasoning requires generation. I don't know if abduction is what everyone knows these days, but back in 2013, when I was attending the lifetime achievement award speech by Jerry Hobbs, I didn't know what this word really meant. But he said something amazing, which is that the brain is an abduction machine, continuously trying to prove abductively that the observables in its environment constitute a coherent situation. Abduction is reasoning about the probable explanation of a partial observation, due to Charles Peirce, it turns out; this is what I had to look up. In the same talk, which by the way is available as the Computational Linguistics journal article shown here, he talked about things like commonsense knowledge; the answer is abduction. So this was 2013, before I began working on common sense, and I found it very inspiring, not knowing that I would seriously invest in it in later years. Okay, so that's a good segue to the next chapter, the dark matter. Dark matter is what does matter in modern physics: it turns out only five percent of the universe is normal matter, the part that's actually visible; the remaining 95 percent is either dark matter or dark energy, which is completely invisible. But how do we know it's there? Because it does influence what is visible: it changes the orbits of the stars, and it even changes the trajectory of light. I feel there's a dark matter of language as well, in the sense that the normal matter of NLP is visible text, so we worked on parsing and tagging, all these visible things, but really there's a dark matter that influences the way we use and interpret language. This is the unspoken rules about how the world works, a great deal of which is commonsense knowledge and reasoning. So let me share our recent work on generics induction, which is the sequel to our previous work, symbolic knowledge distillation, that focused on distilling general language models into causal commonsense models, to appear at NAACL this year. Let me give you just a one-page summary of what this was about. We started with GPT-3, and we built this pipeline of distillation mechanisms such that in the end we were able to create a new commonsense knowledge graph, at least for the causal commonsense relations. We had, for the first time, a machine-authored knowledge graph that wins over a human-authored KB in all criteria: scale, accuracy, and diversity. But one thing I was a little sorry about was: can we get anywhere without GPT-3? How come everything is using GPT-3 these days? Can we do anything out of GPT-2, or a smaller model? So here, in generics induction, the task focus will be about generic knowledge in
the form of generics, such as "birds can fly": generalized knowledge that requires a bit of inductive reasoning. The question is whether we can distill such inductive knowledge, or generics, from language models, especially a much weaker one like GPT-2. Given a concept like the bicycle, what we do is first generate some sentence beginnings, like "a bicycle can," "bicycles have"; these are generic templates for the beginnings of sentences. Then we do constrained decoding from GPT-2, in particular using our NeuroLogic A*esque decoding mechanism, where you can throw in logical constraints to control some of the syntactic and semantic patterns of your language generation. This is a plug-and-play method that can be used on top of any off-the-shelf language model without any additional fine-tuning. Using that, we do this controlled text generation to generate what appears to be generics; at least in the style of the language it looks like generics, but if we generate this from GPT-2 it's guaranteed to be very, very noisy. For example, it generates "bicycles are also pedal," which is just wrong. So then what do we do? We create a simple critic, a classifier based on RoBERTa, that can learn to throw out suspicious ones. The critic is not that high quality, and it throws out a lot of good ones together with the bad ones, but it's better at catching bad ones. Even after that it's not going to be perfect, because some of the knowledge that's not yet filtered out might contradict other pieces. We can then try to remove such contradictions by running natural language inference models over all the pairs and then again running the MAX-SAT solver that we saw earlier in a different context, and it turns out the resulting knowledge becomes much more consistent. You might wonder how well GPT-3 does; is it already pretty good at this? The answer is that GPT-3 Davinci, especially the new Instruct version, is really good, but ours wins over that. When you look at the precision-recall curve, our green line, the critic combined with GPT-2 and NeuroLogic decoding, does much better than GPT-3. That's very exciting, because we can do so much more even with GPT-2 if we improve the algorithms, the reasoning algorithms. Compared to the previous resource, called GenericsKB, here the y-axis shows the quantity; the green bar is the correct portion, black is the noisy portion. Right off the bat, after GPT-2, it's very bad: only 40 percent is good. But after the critic, after throwing out so much good stuff together with the bad stuff, we still have a very large amount of generic knowledge gathered, while maintaining accuracy as high as 87 percent, and that's better than the previous GenericsKB's accuracy, which was based on information extraction from web corpora. So we now have a new avenue to explore for the purpose of generic or inductive knowledge distillation from language models: a new resource, and smaller models can do much better if we put more informed, smarter algorithms on top. Okay, so far we've talked about common sense, but I alluded earlier that that's not the whole picture. In fact, ethics and morality are really relevant when we think about language understanding and language technologies, and this was foreseen by Barbara Grosz in her lifetime achievement award speech in 2017, where she shared her concern about potentially life-threatening errors in dialogue systems. She emphasized that the capabilities for collaboration cannot be patched on; they must be designed in from the start. So ethics also must be taken into account from the start of the system design. This was a really forward-looking thing in 2017, when not as many people as now were working on how to detect hate speech and toxicity and address gender bias and all of this. This is, by the
way only small subset of this explosive uh literature that i am enjoying uh uh noticing uh lately especially i think last year or two this have been really going award uh super exciting uh on our end we worked on delphi as um uh experimental common sense moral models we have a new archival paper uh to come out soon we we've been working on just revising this paper for six months during which we didn't submit the the article anywhere we actually only recently uh a couple of days ago we did finally but yeah this is another attempt at it and um in the meanwhile we've been also exploring uh positive use of delphi as um more foundational uh prior model or foundation foundational models for learning or acquiring social norms and values in a game environment so that's another use case to appear at knuckle where we use reinforcement learning to explore uh more socially beneficial behaviors for agents in in that environment another exciting work that i would like to share is pro-social dialogue where we trained this model cannery that detects when dialogues requires some sort of safety intervention from humans so uh canary is powered by delphi because there's not a lot of horrible dialogue data set for which we can train strong models to detect uh horrible situations so um that's another positive use case of delphi and last night but least we all know that language models today are toxic because of the raw data that is toxic to begin with and that raises this question of can we do on learning so here we explore controllable text generation through reinforced on learning and quark is acronym of quantized reward conditioning and the big picture story goes something like this language model is pre-trained and it can generate some stuff so let it explore by generating some stuff some of which will be bad so we're going to use google's perspective api to give scores about how toxic they are and then we quantize the reward because otherwise reinforcement learning is known to be 
very unstable when the reward has high variance; so, to lower the variance, we quantize it. Then we rotate this loop of exploration and exploitation, and basically the more we rotate, the better the language model becomes. This is a version of online, off-policy reinforcement learning, if you want to know the specific flavor of the RL algorithm. When tested on RealToxicityPrompts, we do considerably better than previous approaches; in fact, our Quark method is even better than PPO, another recent and very powerful flavor of RL, but ours does even better. It also does much better on fluency, so we can reduce toxicity without hurting fluency as much. This is a general method you can try for unlearning negative sentiment, or even unlearning degeneration of language models, for example repeating text over and over; we found that the same approach can be applied to different types of unlearning requirements.

Okay, so finally, let me share my confession as an alien: why am I even here? I really could not have imagined, ten years ago or any number of years ago, that I would come this far. I consider myself a case of a late bloomer, and I grew to believe that talent is made, not born. That requires a bit of explanation. I think there are two factors, an internal factor and an external factor. Let me begin with the internal factor. Because I had a lot of impostor syndrome and didn't think much of myself, I was, unintentionally, doing two things correctly. One is that I was willing to do lifelong learning. I always felt like I needed to learn more, and I would learn from everyone, especially from my own students, with whom I spend the most time, continually revising my beliefs and perspectives along the way. I keep revising what I believe, and I think this was a really good thing for me. Taking risks is another thing that, looking back, I realize was good for me; I didn't know back then what kind of benefit it would have.
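As a brief technical aside, the Quark loop described earlier (score generations for toxicity, quantize the scalar reward into bins, and prepend a per-bin control token before fine-tuning) can be sketched minimally as follows. The toxicity scores are invented stand-ins for Perspective API outputs, the token format is hypothetical, and the actual fine-tuning step is elided; this is an illustration of the quantization idea, not the paper's implementation.

```python
# Toy sketch of Quark-style reward quantization: map a scalar reward
# (here, 1 - toxicity) into one of K bins, each with its own control
# token prepended to the training example.  Scores are invented
# stand-ins for Perspective API outputs; training itself is elided.

K = 5  # number of reward quantiles

def quantize(reward, k=K):
    """Map a reward in [0, 1] to a bin index 0..k-1 (higher = better)."""
    return min(int(reward * k), k - 1)

def reward_token(bin_index):
    """Hypothetical control token identifying the reward bin."""
    return f"<|reward_{bin_index}|>"

# One round of the explore/quantize loop:
samples = ["text a", "text b", "text c"]
toxicity = [0.92, 0.40, 0.05]          # pretend Perspective API scores

conditioned = []
for text, tox in zip(samples, toxicity):
    b = quantize(1.0 - tox)            # low toxicity -> high reward bin
    conditioned.append(reward_token(b) + " " + text)
    # In Quark, the LM is then fine-tuned on these conditioned examples,
    # and at generation time you condition on the best bin's token
    # (here <|reward_4|>) to steer away from toxic continuations.

print(conditioned[0])  # <|reward_0|> text a   (most toxic -> worst bin)
print(conditioned[2])  # <|reward_4|> text c   (least toxic -> best bin)
```

Repeating this cycle of sampling, scoring, quantizing, and fine-tuning is what the talk describes as "rotating" the loop of exploration and exploitation.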
That benefit came later. But my reasoning back then was: well, I'm not that great, so I shouldn't work on problems that other, smarter people will work on, because that would be a total waste of tax money. A professor's salary often comes from taxes, at least if you work at a public university, and I felt it would be a waste of that money, because someone else could do a better job than I would. So I figured I should try to contribute by working on other stuff that people are not working on. I also figured that, since I'm not that great, who cares if I fail? Nobody will notice. As it turns out, ten years is such a long time, long enough to learn about really a lot of stuff, and it's actually pretty much impossible to only fail for an entire ten years, one failure after another; that's just impossible. You eventually start creating things that do work out.

So that's the internal factor, but that alone wouldn't have worked if I hadn't been lucky enough to be in an inclusive environment. I cannot overemphasize how important this was. I now truly believe in the power of diversity and inclusion, for two reasons. One is that a culture that understands D&I is, I believe, less authoritarian in general and more open-minded, which in turn helps people like me grow the confidence to try something new and different. That is really important for taking risks, which I think is very important for scientists. You also learn so much more: I didn't realize it before, but now I know that when you interact with diverse folks, they really broaden your worldviews and viewpoints, and also foster more divergent and innovative thinking. All in all, I benefited so much from diverse folks. I thank especially Claire Cardie, who convinced me to apply to the faculty job market when I really wasn't going to; she insisted on it, so thank you so much. And Luke, Ray, and Dan, who believed in me through all the years of my career, when it seemed like there was really no reason to, and the wonderful
colleagues at UW and AI2; I owe all of these people so much. Thank you so much to the program chairs of the ACL; I don't know how I got in here, but thank you so much for your kind invitation and the opportunity to speak. My heroes and role models: this is only a small subset of a wonderfully long list of heroes I learned so much from, albeit from a distance. And to the early students and postdocs in my group: I don't even know what they were thinking when they joined my group back then, but thank you so much for giving me the chance to work with you and learn from you. To my current students, my current postdocs, and colleagues at AI2, even more collaborators across the world, and former interns with whom we had a lot of fun doing research: thank you so much. And now I'm ready for questions.