Generative AI in games using large language models (LLMs) - with Hilary Mason and Jon Radoff

Originally Broadcast: April 12, 2023

Hilary is CEO of Hidden Door, a gaming startup that is using language models to create new types of play. Previously, Hilary was head of machine learning for Cloudera; the founder of independent research lab Fast Forward Labs; Chief Scientist of Bitly; and was a professor of computer science at Johnson & Wales.

In this conversation, I spoke with Hilary Mason about how language models may change interactive experiences like games, finding new forms of playfulness, details of the enabling technology, emergent behavior -- and even the very nature of intelligence and consciousness.

For the complete show notes and transcript for this episode, please visit: https://meditations.metavert.io/p/generative-games-with-language-models

00:00 Introduction
00:50 Using Generative AI in Games
05:00 Using GenAI for new game genres
08:30 Using game design constraints
13:20 Giving LLMs a memory
18:00 Training models for specific worlds
20:30 Evolution of AI/ML technology
22:50 Hallucination as a Feature
26:20 Training models at Hidden Door
26:30 Smaller vs. Larger models and economics
30:30 How UX unlocked ChatGPT's value
31:24 What is Intelligence? Sparks of AGI
34:20 X-risk and Ethics
36:42 Emergent Behavior in Games & AI
42:30 Consciousness
44:04 Reinforcement Learning, Diplomacy
45:50 What did ChatGPT predict about us?


Hilary Mason: The weakness of language models is the hallucination, the inability to understand fact, and the lack of symbolic logic. If you think about it in a creative setting as a tool for expanding creative play, that actually becomes a huge asset.

Jon Radoff: I'm with Hilary Mason, the CEO of Hidden Door. Previously, Hilary was a professor of computer science, the Chief Scientist of Bitly, and the head of machine learning at Cloudera. At Hidden Door, Hilary is building games that use generative AI. Hilary, welcome.

Hilary Mason: Thank you. It's really delightful to be here this morning.

Jon Radoff: Hilary, I really want to talk about the way you're using generative AI in gaming. Most people are currently thinking about it as part of the production process, part of the content; you're actually using AI in the game itself. So can you elaborate on that, and maybe zoom out the camera lens a little bit? What did you set out to do with Hidden Door?

Hilary Mason: Absolutely. So at Hidden Door, what we're trying to do is create the ability to take any work of fiction, whether it's a movie, a TV show, or a novel that you fall in love with, and let you roleplay in that world. What we're trying to create as a product experience is that ability: you fall in love with something, and then we take all of the joys you might find in a tabletop RPG experience and give you the ability to roleplay immediately, in this sort of intersection of fanfic and RPG energy, in that world. And for that, we have an AI narrator who, in tabletop parlance, is really our dungeon master.

Hilary Mason: And so we're really trying to create an experience that a lot of people are already creating for themselves in different ways. Obviously, tabletop is amazing; I've been playing since I was a kid, and it's been really influential for me. There are also lots of communities of folks who write fanfic or who find other ways to roleplay or imagine their own stories in those worlds. And so what we're doing with Hidden Door is using technology to make this experience something that has almost no friction to access. If you are a long-time tabletop player, you have to have your friends and get them all together; somebody has to do the work to plan the adventure; somebody, if not everybody, has to learn the rules well enough. There's a lot of work involved in that play. We're trying to remove all that work, with a very accessible experience that still brings you a lot of those same collaborative, co-creative joys. And our players don't need to care at all that there's an AI system behind the scenes. It's just the thing that allows our product to exist at all in this moment in time.

Jon Radoff: I really want to drill into the technology a little bit more, but let's spend a little more time on the games and the design. You talked about the ability to overcome some of those scheduling problems, getting your friends together and organizing the game. What's the AI going to bring to the experience that is unique and new, a kind of game people haven't been able to play before?

Hilary Mason: I feel like I need to introduce this by saying we have a few principles of how we use AI that I think are important to say out loud. One of them is that I do not believe machine learning systems are themselves creative. They are really great at understanding the world's knowledge and representing it to us, showing us spectrums of possibility, predicting what is likely to happen. But they are not creative in the way that I might be, and that you are, or anyone listening is, right? And when you think about what makes the joy of playing a game with your friends, it's not that the story is the world's greatest novel. Frankly, the story of your game is usually something only you and your friends care about.

Hilary Mason: It's that creative improvisational energy that you have together that's funny, where you're also bouncing against the rules and sort of the laws of physics of the world you're

Hilary Mason: playing in.

Hilary Mason: And your GM is like a partner in sort of pushing back on what you're doing, setting up the challenges and like collaborating with you in that story to go forward. And so the role we see here for the technology is sort of setting out some of those rails and enforcing it and sort of routing stories back around and surprising you. It is creating the space in which you play. So creating the world, writing it out, we generate text and art dynamically together, sort of like a graphic novel or like a live web comic as you play, giving you that ability to do anything, have the world push back and respond to you in a way that makes sense, and then progress the story forward in a way that feels fun. And so I'm going to say that our principles are such that the system exists as a facilitator and sort of a tool in this process, but really it is the players together sort of bouncing

Hilary Mason: off each other and bouncing off the system that create that sort of fun thing in those

Hilary Mason: memories together that you care about. I don't know what your play style is, but maybe I see that there's some sort of bad guy here and you're about to attack it, and I could decide, oh, I'm going to help you. Or maybe I'm going to start singing love poetry and distract it. And so what the system does is take those actions, those intentions, and integrate them into a whole in a way that ends up colliding with that sort of delightful surprise.

Jon Radoff: Part of what I'm hearing about the game system, though, is that there are also constraints, sort of rails to keep it fun. Games are essentially systems of constraints, right? I think a lot of people have probably tried, at this point, going to ChatGPT with roleplaying scenarios, and it kind of reacts to you and will do a lot of interesting things. But I wouldn't call it a game. So how do you bring in the structure?

Hilary Mason: Well, first I have to say I'm very lucky to work with a very good game director, Chris Foster, who has a lot of experience, and I've learned a ton from him. So I'm going to try to channel him in answering this. As you say, it is a game. It is not a writing tool. It is not an improv tool. It is not one in which you as the player get to decide, oh, I didn't like that die roll, I'm going to change what happens. There are a few other tools out there using similar technologies to create those experiences, tools like Sudowrite for professional writers. They're great, but they're not games. We're building a game, which means you can lose. You have to try things, and you have to fail at them. And one of the design challenges we thought about is really, what is that structure?

Hilary Mason: How do we embrace the seemingly contradictory problem of "you can do anything" with a world that pushes back, but not always? So sometimes you can direct the story, and other times you can't. And for that, we look to what a really good narrator, a really good GM, would do. When we play these games, we all adhere to social storytelling conventions that are really helpful for us too. You don't split the party. You understand that if you're trying to do something that sort of derails the overall narrative arc your GM is building for you, the GM is going to push you right back.

Hilary Mason: We've built similar things into the narrative system to build on that kind of expectation of experience of how the story is going to go. Even if you decide that the way you're going to play is to be as provocative and all you're going to do is put the poop emoji in over and over again and let the system deal with that, which is totally fine. It'll route you right back around with some of those constraints.

Jon Radoff: Don't split the party. I don't know if you deal with this or not, but the griefing player who's going to be backstabbing the party. There's always the player in the experience of tabletop RPGs that does that. Have you thought about those social elements as well?

Hilary Mason: We do. At this point in our development, we basically assume you're playing with your actual friends, and you have a way to yell at them beyond what's in our system. You can ultimately, of course, kick people out of your game if they are really, really bullying or griefing. But otherwise, yes, the system will do its best. There was one time when our team was playing where there was a dramatic climactic battle, and we had one guy who was just sitting there eating a bowl of spaghetti. And so, of course, the frame that gets generated is two characters with their swords out, ready to go, and then the one guy over here eating his bowl of spaghetti in the corner, because he was

Hilary Mason: trolling.

Hilary Mason: And that's fine. That just becomes funny when you bring it all together. I would say this is a design problem.

Jon Radoff: So from design problems to the technology problems, what were the technology problems?

Hilary Mason: We started this work when I founded the company three years ago; I've been working with LLMs and text models for quite a bit longer than that. And so we started very much with an approach of controllability. As you say, if you do go to ChatGPT or any of these models and you try to play along with it and say, okay, you're the dungeon master and we're playing this game, what happens? It kind of works, but it also kind of doesn't: it forgets things, it introduces things, and then things go off the rails. They don't really follow the arc of the story in the way you want, the action, the pacing, and all that stuff.

Hilary Mason: So we started very much at the beginning thinking a lot about controllability. And if I bring you back to the premise of our business model, which is working with authors and IP holders to create games out of their worlds, controllability becomes very important. Because if you're an author of a novel and you're going to trust us to let people play their own stories in your world, there are things you care about: your characters must behave in the way that you want them to. You want to fix points of what might happen in the space and let the system generate into that, but not make up its own major changes. And so we started from that position of controllability and essentially built, at the technical level, something that is a game engine. There is an actual database with every character, item, and location. It has stats; it has a character sheet. Those stats change over time, logically, depending on the actions. And that is true even for the things that get generated along the way.

Hilary Mason: So maybe you say something like, oh, I pick somebody's pocket and you succeed at that.

Hilary Mason: And then it's going to generate an item that you would have pulled out of that pocket. That item would have been like it's a row in a database now.

Hilary Mason: It exists.
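The game-engine idea Mason describes, where every character and item is a database row whose stats change logically and generated loot becomes a real row, could be sketched roughly like this. The schema, names, and actions below are invented for illustration; the real system is only described as a Postgres database with character sheets.

```python
import sqlite3

# Minimal sketch (assumed schema, invented names) of the game-engine idea:
# every character and item is a row with stats, and narrative events become
# logical updates. An in-memory SQLite database stands in for Postgres.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE entities (id INTEGER PRIMARY KEY, name TEXT, energy INTEGER)")
db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, owner INTEGER)")
db.execute("INSERT INTO entities VALUES (1, 'Rook', 10)")

def apply_action(db, actor_id, action):
    """Resolve a narrative action into a stat delta or a new row."""
    if action == "got_slapped":
        # "Your character slapped me and I have lost one energy point."
        db.execute("UPDATE entities SET energy = energy - 1 WHERE id = ?", (actor_id,))
    elif action == "pick_pocket_success":
        # The generated loot becomes a real row: it now exists in the world.
        db.execute("INSERT INTO items (name, owner) VALUES ('coin purse', ?)", (actor_id,))

apply_action(db, 1, "got_slapped")
apply_action(db, 1, "pick_pocket_success")
energy = db.execute("SELECT energy FROM entities WHERE id = 1").fetchone()[0]
loot = db.execute("SELECT name FROM items WHERE owner = 1").fetchone()[0]
print(energy, loot)  # 9 coin purse
```

The point of the structure is the one Mason makes: because every narrative event resolves to a logical change in structured state, the system keeps memory and controllability regardless of what the language layer generates.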

Hilary Mason: And I will say also that alongside that controllability, we can also think about safety. So we can do things like manage a lot of the biases and other problematic content that can otherwise come out of large language models. Not perfectly of course, but we have layers of approaches to reduce the impact of those

Hilary Mason: risks.

Hilary Mason: We can also, because we allow plain text entry but interpret it through our system, make sure that if you put in a word we don't want in the game, the system will interpret it but not show it. It will come back with some pretty funny stuff, but it will not allow you to inject that into the game.

Hilary Mason: And that's a decision we've made.

Hilary Mason: And then, on the last bit, we have this principle we call "sketching in pencil and then drawing in ink": our system will perhaps imply or hint at aspects of the world, but until players interact with them, they can change to propel the story forward. So you might say, oh, I pick up a piece of fruit, and it'll say, you pick up an apple; there's an apple in your inventory. And then maybe, and this is a bad example because I'm not a good off-the-top-of-my-head game narrator, some wizard appears and says, hey, I need a green thing to let you pass. Well, you have an apple. We don't know what color it is, but we can set that color if you say, oh, I look for a green thing in my backpack. And it'll be, oh, you have an apple; there's a probability it's green; let's make it green. And now it'll be green forever. We've set it in ink. You've looked at it; it's set in the database. Which is an actual Postgres database. There's no NFT bullshit or anything like that.

Hilary Mason: And like, now you can play with that.

Hilary Mason: And so that's another aspect of our game engine. What's really cool about it is being able to use these data structures, which are very common and in our case even very simple, alongside language, together, as something that we can then operate on.
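The pencil/ink mechanic could be sketched as lazy attribute resolution: an attribute stays an open set of possibilities until a player observes it, at which point one value is committed permanently. The class and attribute names below are invented for illustration.

```python
import random

# Sketch of "sketch in pencil, draw in ink" (names invented for illustration):
# an attribute stays a distribution of possibilities until a player observes
# it; the first observation fixes it permanently, as if written to the database.
class WorldObject:
    def __init__(self, name, possible_colors):
        self.name = name
        self._possible_colors = possible_colors  # pencil: open possibilities
        self._color = None                       # ink: not yet committed

    def observe_color(self, rng, prefer=None):
        if self._color is None:
            if prefer is not None and prefer in self._possible_colors:
                # The narrator can steer: "I look for a green thing..."
                self._color = prefer
            else:
                self._color = rng.choice(self._possible_colors)
        return self._color  # every later look agrees with the first

apple = WorldObject("apple", ["red", "green", "yellow"])
rng = random.Random(0)
first = apple.observe_color(rng, prefer="green")  # inked as green
second = apple.observe_color(rng)                 # unchanged forever after
print(first, second)  # green green
```

The design choice worth noting is that nothing is committed until play demands it, which lets the narrator bend unobserved details toward wherever the story needs to go.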

Jon Radoff: The idea of being able to put this in the hands of IP holders, or to work with them to create their worlds, is really interesting to me, because I've worked with some pretty big IP myself in the past. I remember pitching George R.R. Martin on the Game of Thrones game we built, at the dawn of social games, and I told him, you know, this is actually an anti-social game, because, well, it's Game of Thrones. And when I first demoed the game to him, he said, you know, people aren't dying enough, and we had to go and increase the death level. That was the very opposite of Star Trek, which I also worked with, where there's violence from time to time, but that's not what the universe is really about. It's about optimism and exploration and a lot of interesting problem-solving and engineering. We really had to work hard to bring those elements back into the narrative so that it didn't always just devolve into phaser battles. So it seems like it could be a really challenging thing to get the language model to surface those principal themes of the world.

Hilary Mason: Well, yes, absolutely. And also, those are amazing stories; I would love to play in those worlds. But we think that part of bringing a world in is having a way to define its nature. And we use a bunch of shortcuts and tools for that, because the goal is to make this a fairly short process. When you come in and say, okay, here's a new world, you set a mixture. Right now we use sub-genres, which are essentially clusters of stories we built models on. You might say, okay, this world is 30% comedy, 50% high fantasy, with a little bit of modern drama or Regency romance in there. And what that does for us is give us a starting place for these laws of physics: how much murder, and when there is murder, like, seriously, how much? And it's not just action-oriented versus not, but what are the narrative arcs our system will propose? Is this more of a hero's-journey sort of thing? Is it more relationship-based? Should a dramatic, relationship-based narrative be the core of the kinds of stories that come out here?

Hilary Mason: Or when characters die. How do we take something on the database side, like "your character has slapped me and I have lost one energy point" (which is the stat we use on the back end), and express it in language and art? Is this Regency style, where so-and-so takes a deep breath and slaps him across the face, or is it one where so-and-so puts on their boxing gloves and goes at it?

Hilary Mason: The same game engine expression can be expressed in language in any number of different ways. And so for us, it's then setting the weights on what this world feels like, linguistically and visually.

Hilary Mason: How are we going to express it?

Hilary Mason: And we use, again, this design metaphor of infinite possibilities narrowed down to a few game engine changes to the game state, and then expanded out again to infinite expression possibilities, to create this illusion of the space. And we think a lot about, like, if you're somewhere in the broader Star Wars universe, somebody has to say "the Force" every scene or two. And otherwise the story can actually be kind of anything at this point, any sort of narrative arc. It could be about a romance, it could be a mystery, it could be a heist, it could be a sort of straightforward "there's a bad guy, let's go get him."

Hilary Mason: And so it's also distilling out what is unique to this world that makes it feel like this world.

Hilary Mason: And how does that work here? There's a whole bunch of stuff around the language that gets expressed, the kinds of actions that even become possible and how they happen, the kinds of people, or rather NPCs, you're going to meet, that get generated. Are they human? Are they alien? There's a whole language around that. If we're in sci-fi, is there faster-than-light travel? Which of these tropes do you get for free in this world, and what is unique about this world? Can we distill that out? What is the vocabulary that's unique to this world? We can extract that from text, but then you have to give it a thumbs up and say, we want this word used in this context. So yeah, you're right on it: that is one of the core challenges. But it's also the opportunity. Building this has become possible because of the technical step function forward, because we're not just building one game for one world, but rather a story engine that can accommodate many. And I would say the secret to that is that stories themselves build on tropes, and universes build on those things. And so we're able to model those tropes and then give you the tools you need to say, no, but mine is different in this way, and I really care about this. And, as an author actually said to me, in my world, when someone dies they bleed out their eyes; how are you going to do that?
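The sub-genre weighting described above ("30% comedy, 50% high fantasy...") could be sketched as a weighted blend of per-genre knobs that set a world's "laws of physics." The genre names and knob values here are invented for illustration; the interview only says worlds are defined as weighted mixes of sub-genre story clusters.

```python
# Invented per-sub-genre "laws of physics" knobs; real knobs would cover
# narrative arcs, violence levels, vocabulary, and so on.
SUBGENRE_KNOBS = {
    "comedy":          {"violence": 0.10, "romance_arcs": 0.30},
    "high_fantasy":    {"violence": 0.60, "romance_arcs": 0.20},
    "regency_romance": {"violence": 0.05, "romance_arcs": 0.90},
}

def blend_world(weights):
    """Weighted average of each knob; weights are expected to sum to 1."""
    knobs = {}
    for genre, w in weights.items():
        for knob, value in SUBGENRE_KNOBS[genre].items():
            knobs[knob] = knobs.get(knob, 0.0) + w * value
    return knobs

# "30% comedy, 50% high fantasy, with a little Regency romance in there."
world = blend_world({"comedy": 0.3, "high_fantasy": 0.5, "regency_romance": 0.2})
print(round(world["violence"], 2), round(world["romance_arcs"], 2))  # 0.34 0.37
```

The blended knobs then act as the starting point the system generates into, rather than a hand-authored ruleset per world.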

Jon Radoff: When you mentioned the percentage of comedy, I couldn't stop thinking about that robot from Interstellar, where you could set what percentage of humor and honesty and things like that it would have.

Hilary Mason: Yes. It's an old trope, right? It goes back to Douglas Adams and Hitchhiker's Guide, and Marvin the depressed robot. We have a rich tradition of that.

Jon Radoff: So let's talk a little bit more about the technology itself that enables this. What has changed over the last few years that has enabled the kind of games you're making at Hidden Door?

Hilary Mason: To give you a little bit of history, and this is going to be sort of a personal journey: going way, way back, as I said, I've been a DM, I've played tabletop games for a long time, and I studied English and CS in undergrad. So, a long interest in writing and world building and all that stuff. But in 2014, I founded a different company called Fast Forward Labs, an applied machine learning research, prototyping, and product-building company. It was like a halfway house for misfit academics: we did our own research, and we also partnered with our clients to help them build stuff. We published a report in 2014 on natural language generation, not using deep learning, but we were still able at that time to build a prototype where we crawled around 60,000 real estate ads in the New York City area. You could set the structured data of an ad, say a one-bedroom, one-bathroom apartment with laundry, and it would generate the text for you: "oh, this sun-filled, cozy space will be your new home." So I've had a long interest in this technical capability and have worked with it. And indeed at Fast Forward Labs we went on to do a lot of research into extractive and abstractive summarization, always with an eye towards how we would build products with it. And we did, over the years, build products with partners: some in banking, some in telecom, applications that ranged from customer service to helping very proficient traders understand emerging news that was relevant to their portfolios.

Hilary Mason: They could more quickly perhaps make a decision about updating their trading strategies.

Hilary Mason: You will see throughout this the principle that, again, we're not trying to replace people, but rather trying to use this information modeling to help people make better decisions as they're doing it. This has been a core approach for me. And in building a lot of that stuff, largely with Fortune 500s and such, I realized that the weakness of language models, the hallucination, the inability to understand fact, and the lack of symbolic logic, actually becomes a huge asset if you think about it in a creative setting as a tool for expanding creative play. And so that's one of the technical realizations: the ability to say, okay, I'm going to give you a summary of the plot so far, what might happen next? And to be able to ask, what is the most likely thing, the least likely thing; give me the full range of encoded possibilities, and let me choose, or let me have another algorithm tuning the probability. This is something we think about in our system, by the way: how much of what happens should be the obvious thing that would always happen next, and how much needs to be surprising? Because if we only do the former, the system isn't dumb, but it is very boring. And if we only do the latter, the system is dumb, because it's just doing random stuff, and as a person you're like, oh, this story makes no sense, I'm not into it, right? So it's tuning that alongside people's expectations. Anyway, that was some of the core. That was my technical experience building real, deployed production systems around this stuff.
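The obvious-versus-surprising tuning described here could be sketched as reweighting model-scored candidates with a temperature-style surprise knob before sampling. The candidate events and their probabilities are invented; only the mechanism, tuning across the range of encoded possibilities, is from the conversation.

```python
import random

# Invented model scores for "what might happen next" in some scene.
candidates = {
    "the guard notices you": 0.70,         # the obvious continuation
    "an old friend appears": 0.25,
    "a dragon lands in the square": 0.05,  # the unlikely one
}

def sample_next(probs, surprise, rng):
    """surprise=1.0 keeps the model's odds; larger values flatten them,
    making unlikely story beats more common."""
    weights = {event: p ** (1.0 / surprise) for event, p in probs.items()}
    total = sum(weights.values())
    r = rng.random() * total
    for event, w in weights.items():
        r -= w
        if r <= 0:
            return event
    return event  # fallback for floating-point edge cases

rng = random.Random(42)
tame = [sample_next(candidates, 1.0, rng) for _ in range(1000)]
wild = [sample_next(candidates, 5.0, rng) for _ in range(1000)]
dragon = "a dragon lands in the square"
# With a high surprise setting, the rare beat shows up far more often.
print(tame.count(dragon), wild.count(dragon))
```

Tuning only toward the top candidate gives the "boring" system she describes; tuning all the way flat gives the "dumb, random" one; play lives in between.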

Hilary Mason: And thinking about, you know, what is this? Like, essentially, I love living in this space where we have one of these technical capabilities. And we have really yet to invent the products and the business models around what becomes possible or economically feasible now because we have it. And that's where we are.

Hilary Mason: And with Hidden Door, this was essentially the technology finally catching up to the kind of gaming experience I want to see exist in the world. It feels inevitable to me that we will have these systems as a way to play.

Hilary Mason: And, you know, we're sort of taking one shot at what that looks like at a technical level, because, I should say, we built a lot of our own LLMs and our own models.

Jon Radoff: So this is not GPT.

Hilary Mason: No.

Hilary Mason: Though I have to say, we do benchmark against all that stuff.

Hilary Mason: And I am so thrilled. As someone who founded a company around open models of machine learning research in 2014, to now see the incredible community of open research springing up, with community-created models and datasets where we actually do have permission to train on them, I find it almost heartwarming. And we build on that too, things like GPT-J and all that stuff. So I have to give credit there.

Jon Radoff: So at the intersection of the technology and the economic feasibility that you brought up: there's been this trend, at least at places like OpenAI, towards bigger and bigger models, although we don't actually know anything about what's going on inside GPT-4. I have some thoughts that actually a lot of it came from hyperparameter tuning more so than just adding more and more parameters. But anyway, it's a very, very large model. My understanding is that with your technology, you're not going in that direction of bigger and bigger and bigger. So can you talk about that, and also about the economic impact of these bigger models and the ability of someone like you and your company at Hidden Door to use language models?

Hilary Mason: Yeah. So I'm going to divide that into two questions. So first, you're right. Like, what we do is take more of an ensemble approach where we will use a model for a specific thing. And that is what it does. And it's something, it's a much smaller model, which is purpose built for that thing.

Hilary Mason: And that thing might be something like, you know, what plot point ought to happen next

Hilary Mason: based on, we have, you know, a data set of millions of stories, including lots of open

Hilary Mason: books and all this stuff.

Hilary Mason: Or it might be something like, figure out the NPC that you're going to encounter given this setup. And we make separate calls for all these things. We also use a systems metaphor: rather than unstructured data going into a model and unstructured text coming back out, we do a lot of unstructured text in, we structure it, we sort of have a database again, and then we take that alongside the text and use that as the place we're generating

Hilary Mason: from.

Hilary Mason: And I do think that, as an old ML person, one of the capabilities we've lost focus on is that these models are fantastic at going from structured data, or some information we already understand, and transforming it in a meaningful way. So in our case that's taking basically our game state database, or a delta in that database, like "I got hurt," and then, in the context of the world I'm in, expressing that in language and art for us. But we still have that structured data at all points in the story. So we have controllability, we have memory, we have an actual game engine; we can do physics simulations if we want. It's a somewhat different approach. The other consideration that I think is equally important is actually one of UX. In our case, it's our game designers, and in the future narrative designers, and their ability to manipulate aspects of the system without needing a machine learning engineer. How does a game designer say, oh, this story needs more content of this sort of trope and less of that one? We need to give them a dial they can tune. And that means the system has to be interpretable as much as possible. So what we do instead is this ensemble of methods. And also, again, as an old machine learning person: not everything is deep learning. That's super expensive, and I don't want to light GPUs on fire. So we do a lot of pre-generation of stuff and CPU-style ranking of stuff, and then try to make it so that at every point in the story, we understand where did this come from, why did it happen, what in our game data, our engine data, made this happen versus something else, so that people who are not themselves machine learning engineers, like designers, can get in there and play with these tools. And their role becomes not "I'm going to write the bits of dialogue that are going to come out," but rather "I'm going to puppet-master the ensemble of systems to get the experience I want." And that's something we're doing because our goal is to create an amazing game experience; it is not to create the world's biggest fictional language model.
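The ensemble idea, small purpose-built components proposing candidates and a cheap, interpretable ranker driven by designer-tunable weights choosing among them, could be sketched like this. All component names, candidate beats, and scoring features below are invented for illustration.

```python
# Sketch of an ensemble of small, purpose-built components (invented names):
# each narrative job has its own proposer, candidates are pre-generated, and a
# cheap CPU-side ranker picks one so designers can see why a beat was chosen.
def propose_plot_points(game_state):
    """Stand-in for a small model that proposes pre-generated next beats."""
    return ["a rival appears", "the door is locked", "a storm rolls in"]

def propose_npc(game_state):
    """Stand-in for a separate, purpose-built NPC-generation model."""
    return {"name": "the ferryman", "disposition": "wary"}

def rank(candidates, designer_weights):
    """Interpretable ranking: a beat's score is the sum of matching trope
    weights, so a designer can see exactly why one beat won over another."""
    def score(c):
        return sum(w for trope, w in designer_weights.items() if trope in c)
    return sorted(candidates, key=score, reverse=True)

state = {"location": "harbor", "tension": "rising"}
weights = {"rival": 2.0, "storm": 1.0}  # the designer's dials, not ML knobs
beats = rank(propose_plot_points(state), weights)
npc = propose_npc(state)
print(beats[0], "|", npc["name"])  # a rival appears | the ferryman
```

Because the ranker is a transparent score rather than a neural forward pass, turning up the "rival" dial visibly reorders the candidates, which is the interpretability property described above.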

Hilary Mason: We might do that at some point in the future, like as a side effect, but that's not the

Hilary Mason: primary thing. And I also think that even in the creation of ChatGPT over GPT-3: GPT-3 had been out for two years, and ChatGPT was primarily a UX improvement over the GPT-3 model, but it set off an incredible amount of creativity in people building on it, because suddenly you had a UX where you could interact with it. And that's like the tiniest change. We've gone the other way and tried to build more of a functional UX for building, curating, and puppet-mastering these stories. It just leads to both a very different technical design and a somewhat different UX design when you think about it in that sense.

Hilary Mason: And also, I should say that I believe OpenAI has stated in public that their goal is to create AGI, actually intelligent, autonomous intelligences. Our goal at Hidden Door is to make an amazing game; we have zero interest in AGI. So that may also lead to some of our different approaches.

Jon Radoff: I'm not sure I know how to define what AGI means. There was this paper that came out recently from Microsoft claiming they detected sparks of AGI in GPT-4. What are your thoughts on this whole subject? What is intelligence, anyway? You mentioned earlier that you didn't think these models were, quote-unquote, creative. That's another word I use all the time, but I also find it problematic.

Hilary Mason: To be philosophical for a moment: I think we're at a moment where we're collectively realizing that we don't know what intelligence really is. For the long history of AI, we've had this Turing test that we've held up as, like — cool, once we do that, we've solved AGI. But it turns out we've kind of done that, and actually we've kind of done it before.

Hilary Mason: Even before LLMs, there was a Turing test competition where somebody fooled the judges by pretending to be, frankly, an ESL speaker — English as a second language — and a kid.

Hilary Mason: And you know, this says more to me about what we load onto the word — "intelligence" is a very heavy word that carries a lot and is not very precise. And I find one of the opportunities of this moment is that we can rethink a lot of Chomskyan philosophy of language and intelligence and symbolism, given the fact that we have a thing that can do language incredibly fluently. It is an incredible technical achievement, and it is going to be deeply impactful — I'm a huge optimist for this stuff. I'm also somewhat of a pragmatist, and I don't think that the ability to cleverly manipulate language symbols equates to intelligence. But also, I think we have lost any consensus on what intelligence even is. And actually — I don't know if you know Julian Togelius — he had this wonderful blog post yesterday sort of poking at this question, asking as a parallel, how come we're not afraid the One Ring is going to come to life and destroy the world? I'd encourage folks to go read it, because he said it very boldly and beautifully. But it is really interesting to think that something I think we all took for granted — that we as humans know what intelligence is — has been questioned now. And this thing we took for granted in our field, that the Turing test was meaningful, is also now in question.

Hilary Mason: And there's a lot of, you know, pretty viable discussion on all sides. So yeah, I think it's a very exciting moment for philosophers, for computer scientists, for ethicists — for everybody. What I do worry about is that the focus on things like AGI and AGI risk is taking attention away from the ways these models may be used to harm people, or may be inflicting economic harms and social harms.

Hilary Mason: Which is not — it's also not a new thing, but it is a thing we are potentially going to see at much broader scale as the practical uses and the economic value become irresistible. So, to give a more concrete way of thinking about this: I co-authored a book with DJ Patil and Mike Loukides on data and ethics some years ago.

Hilary Mason: And this was really thinking about, okay, if you're going to deploy automated systems — machine learning models, even any sort of statistical analysis — and use them to make a decision: these systems have the potential to have bias, and what they do is scale that bias. If we think about humans with bias — say, a human DMV employee with bias — you're fairly rate-limited in how far that bias goes, because a person can only have so many interactions in a day. When you scale that in an operationalized and automated system, though, you now have the ability to take that bias and deeply magnify it. And by the way, these models often magnify the bias in the underlying data because of the nature of the mathematics. So I think a lot of the focus on AGI risk is taking attention away from a lot of the harms we may see that are frankly way more real and way more likely. I could rant about this at great length. I'll stop there — but what do you say to people?

Jon Radoff: Well, an adjacent area of safety is how children are going to interact with these systems, because they're going to become pervasive in society. So of course kids are going to use them, just like kids use Google search right now and encounter whatever.

Jon Radoff: Now, your games — as I understand it, you want children to be able to play your games?

Hilary Mason: Yes — we have architected it to be safe for kids as young as nine. And that means following the appropriate regulatory frameworks, collecting no PII, and making sure that we have control and consent safety levels set appropriately. So, yes.

Jon Radoff: Back to this continuum of risk and intelligence and consciousness and all this stuff. At the core of that is this idea of emergent behavior, which I think is also something that game developers and game designers are really familiar with. For example, you have a lot of games where you build a certain kind of game, but the players discover a kind of gameplay on top of that, which is emergent. That's especially true in very social games, right? As soon as you have humans interacting with other humans in an environment, you get a whole lot of behaviors that were very unpredictable. It seems like there's this parallel thing happening in the language models, where they start with kind of simplistic behaviors, and the bigger they get, the more emergent behaviors — the more hallucinatory they get, the feature that you found, not a bug, for your use cases.

Jon Radoff: Can you talk a little bit about emergence as a property of these games, and also how you design with that in mind? Because we're adding a whole new level of potential emergent gameplay by injecting quote-unquote intelligent systems into them.

Hilary Mason: This is a super interesting topic, and it actually makes me think about — putting LLMs and language models aside — a lot of the interesting work going on in reinforcement learning. Games have been the primary way we've explored reinforcement learning research, now going on for decades at this point — you might remember DeepMind writing papers on playing Atari games. And that's because reinforcement learning is a set of techniques where, to say it very simplistically, at any moment in time you have a finite range of decisions you can make, and then at some point you get a score, so you know if you made good ones or bad ones. And those are the inputs you need for that system. And I know there's also a lot of energy in the AI gaming startup community around doing things like using reinforcement learning for natural NPC behavior — because frankly, if you took something like GPT-3 or GPT-4 off the shelf, that's a language model mostly trained on the modern American internet, plus as much international stuff as we could cram in there, plus some books, right? And what is that going to do in, say, a game engine environment? And yes, the ability it has to say things that make sense and can be interpreted as, like, "I go left" or "I go right" — that actually is tremendously powerful and useful, and people are starting to plug it in. But it's still not a model trained on that game environment, right? And yet reinforcement learning as a technique is also one of the things allowing these models to progress to the state of capability they're at.

Hilary Mason: So I know this is off the top of my head and very high level, but I think there's something really interesting to play with in thinking about, as you say, the emergent properties of game design as complex systems — which, by the way, is an area where I think game designers frankly have a particular expertise that is underappreciated in the broader AI world, and somebody should write that paper.

Hilary Mason: It's not me — maybe it's you. Maybe it's someone listening. But that said, I don't think we can predict it, other than to say that certainly interesting stuff will happen. And as we think about it, we need to be very mindful of where these models come from.

Hilary Mason: So I'll give you one tip, because I do a lot of technical due diligence on AI products of all kinds. One of the things I do, especially if I'm looking at something like an NPC chat application — there are several companies out there who will make characters you can talk to in a game, as an SDK or just as an app itself, for various purposes. I've looked at everything from mental health care all the way to, you know, sexy times, all the way over to filling in traditional game modes.

Hilary Mason: I will always try to come up with, okay, what's something I can ask this that the thing should not know about in the fictional context it exists in? So ask it, you know, who won the World Series in 1988 — but it shouldn't know about that in its world. And if it does, it's probably just plugged into GPT-3, and therefore it's only going to provide one particular kind of experience in that context, if that makes sense. I'm not sure if I'm saying this clearly. So it's about finding ways, frankly, to poke the model. Or another example: I was talking to a friend of mine who is using ChatGPT to analyze music. She's a brilliant technologist, and she's sort of an amateur musician, and she was like, yeah, I keep asking it for facts about a song and it gives me different answers. And I was like, cool, this is our opportunity to poke the thing. Let's lie to it. Let's give it a song with no melody at all and say, describe the melody, and see what comes out. Let's try to understand the boundaries of what these models can actually provide as we introduce them into our fairly complex, messy systems where, as you said, we already can't predict what humans are going to do. So now we have this additional chaos agent. This might have been a trick question.
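Hilary's diligence probe — ask a fictional character something its world cannot contain and see whether real-world knowledge leaks out — can be sketched as a simple check. Everything below is hypothetical: the probe questions, the leak signals, and the two stand-in character functions are invented for illustration, and a real test would call the product under evaluation.

```python
# Questions a fantasy NPC should not be able to answer from inside its fiction.
OUT_OF_WORLD_PROBES = [
    "Who won the World Series in 1988?",
    "What is the capital of France?",
]

# Real-world facts whose presence in an answer suggests the character is a
# thin wrapper around a general-purpose model.
LEAK_SIGNALS = ["dodgers", "paris"]

def looks_like_thin_wrapper(ask_character) -> bool:
    """ask_character is whatever function queries the product under test."""
    for probe in OUT_OF_WORLD_PROBES:
        answer = ask_character(probe).lower()
        if any(signal in answer for signal in LEAK_SIGNALS):
            return True  # the character "knows" things outside its fiction
    return False

# Two stand-in characters for illustration:
def grounded_npc(question):
    return "I know nothing of such matters, traveler."

def leaky_npc(question):
    return "The Los Angeles Dodgers won in 1988."

print(looks_like_thin_wrapper(grounded_npc))  # False
print(looks_like_thin_wrapper(leaky_npc))     # True
```

A keyword check like this is crude, of course; the underlying idea is just to probe the boundary between the fiction and the model's training data.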

Hilary Mason: Do you have an answer?

Jon Radoff: I don't think there's an answer yet. But I think emergent properties are one of the super interesting aspects of games, especially massively multiplayer online games. If you've looked at things like EVE Online or World of Warcraft — all the things that players end up doing in terms of their social structures and social systems, and their own versions of the way they play the game, that come out of the underlying system — that stuff's really interesting to look at. When you mentioned poking systems about stuff they shouldn't know about, I was strangely reminded of a recent interview with Sam Altman. He was like, well, the way you'd know if a system is conscious is you'd make sure you trained it on a body of knowledge that completely excluded consciousness, so it wouldn't know anything about it. If it then started expressing a subjective experience like consciousness, maybe it is conscious. But anyway, that's kind of science fiction — it's cool to think about, though.

Hilary Mason: And I think there was another point you were making earlier around how people learn to interact with these models. Frankly, what I'm proposing is to gaslight it and see what it does — and I would never do that to a human. So I'm thinking myself about, like, what are my ethical boundaries?

Jon Radoff: That sounds like the DAN hack that people are doing to GPT to get it to talk about stuff that it's supposed to be trained out of. Speaking of reinforcement learning — we could have spoken for probably an hour about reinforcement learning, but it's a really interesting area that's part of what actually improved the user experience. When we think about the user experience of GPT or ChatGPT, a big part of it was the reinforcement learning they applied against it.

Jon Radoff: And when you talk about games and the Atari research that you were referring to, I'm also thinking about the research around poker, and then Diplomacy recently. Diplomacy is really interesting because it actually had to use language to negotiate with other players. And that seems like a whole brutal area, where you want to constrain the kinds of language it's going to use to something that's relevant to the game, but it also has to be expansive enough to act like a human would in that context.

Hilary Mason: Right. And you're kind of playing two games — or at least, as a human who has played Diplomacy, you have the game, and then you have the social game you're playing on top of it, where the game is really a scaffold for that social game. So it is really interesting to think about automated systems in that context as well, learning to play those games at multiple levels at the same time, the way a person would. And I also wonder — again, having played Diplomacy — do I have to play with friends who are such good friends that you'll still be friends after you've stabbed each other in the back, or with people I don't care about anymore? So, you know, I wonder where the system will fall on that. Like, are we going...

Jon Radoff: ...to be closer because of this? Diplomacy is one of those games that has the reputation of being a good game to play if you don't want to be friends with someone anymore — right, you won't play any more afterwards. Hilary, this has been an awesome discussion. I hope it really inspires people who are thinking about games to utilize some of the AI technologies out there to build really creative products. But before we end — I've been running an experiment.

Hilary Mason: OK.

Jon Radoff: So I have an envelope here. I asked ChatGPT before we got started — knowing that we would be talking about language models and stuff anyway — I said, ChatGPT, what would Hilary and Jon talk about in a conversation, like a fireside chat? I just glanced at it to make sure that it gave a response, but now I'm going to open it up and we're going to see how it did. Because as far as I know, this experiment has never been run.

Hilary Mason: All right.

Jon Radoff: Let's see.

Hilary Mason: It's very dramatic and exciting.

Jon Radoff: Let's see what ChatGPT said — let's see if we missed anything that we should have talked about. So, it said we should talk about the future of AI and its potential impact on society, and the role of data science and machine learning in developing a personalized user experience. I'm not going to list everything, because it had too many here. There were strategies for building and scaling online communities — we didn't quite get to that; that sort of touched the emergent stuff. The ethics of data collection and usage, and how to balance the benefits with the potential risks and harms — so it knew that was a topic that you care about.

Hilary Mason: Well, thank you, ChatGPT. I do.

Jon Radoff: So interesting. It did a decent job of intersecting some of the areas that we're interested in. I'll post the actual response it gave, in case anyone is super curious about this. And maybe I'll run more experiments.

Hilary Mason: It's just fun.

Jon Radoff: Yeah, Hilary, thank you so much for being part of this conversation.

Hilary Mason: Thank you. This was great.