Nataliya Kosmyna
Podcast Appearances
Thanks for having me.
Excited to be here with you.
Some people do want you to believe that.
You know, it's like science has 99 problems and virality ain't one, right?
Yeah, so we run a study that's exactly the title, right?
Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task.
So we did a very specific task that we're going to be talking right now about, which is essay writing.
We invited 50 students from the greater Boston area to come in person to the lab, and we effectively put those headsets you just mentioned on their heads to measure their brain activity while they were writing an essay.
And we divided them in three groups.
We asked one group, as you might already guess where that's heading, to just use ChatGPT.
That's why the paper is called Your Brain on ChatGPT.
It's not because we are really, really singling out ChatGPT.
It's just because we use ChatGPT in the paper, so it's purely scientific.
So we asked one group of students to use only ChatGPT to write those essays, another group to use Google, the search engine, to write those essays, and the third group to use their brain only.
So no tools were allowed.
And we gave them topics which are what we consider high level, right?
For example, what is happiness?
Is there a perfect society?
Should you think before you talk?
And we gave them a very limited time, like 20 minutes, to write those essays.
And we finally, of course, looked into the outputs of those essays, right?
So what they actually wrote, how they used ChatGPT, how they used Google.
And of course, we asked them a couple of questions like, can they give a quote?
Can they tell us why they wrote this essay and what they wrote about?
And then there was one more final fourth session in this study where we swapped the groups.
So for students who were originally in the ChatGPT group, we actually took away the access for this fourth session, and vice versa.
So if you were, for example, you were not our participant, but if you were ever to come to Cambridge and be our participant, and let's say if you were actually... I'm not putting anything on my head.
Come on, it's the future.
It's the future.
I see that happening regardless.
So if you were, for example, a participant in our brain-only group, for this fourth session we would actually give you access to ChatGPT.
And again, we measured the exact same things: brain activity, what the output actually was, and we asked a couple of questions.
And what we found are actually significant differences between those three groups.
So first of all, if you talk about the brain, right, we measured what is called brain functional connectivity.
So, in layperson's terms: it's like the three of us here, talking to each other, or me talking to myself.
So that's what we measured.
Who is talking to who?
Am I talking to Neil or is Neil talking to you?
So directionality, so who talks to who in the brain, and then how much talking is happening.
Is it just, "Hi, hello, my name is Nataliya," or is there actually a lot of talking, a lot of data flow being exchanged?
So that's literally what we actually measured.
And we found significant differences.
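To make "functional connectivity" concrete, here is a minimal sketch. It is not the study's actual pipeline (the paper used EEG with directed connectivity measures); it just estimates how much two EEG-like channels "talk" in a frequency band using ordinary spectral coherence on synthetic signals. The channel names, sampling rate, and bands are illustrative assumptions.

```python
# Minimal illustrative sketch (not the study's pipeline): quantify how much
# two EEG-like channels "talk" using magnitude-squared coherence per band.
import numpy as np
from scipy.signal import coherence

fs = 256                        # assumed sampling rate in Hz
t = np.arange(0, 60, 1 / fs)    # one minute of synthetic data
rng = np.random.default_rng(0)

# Two channels sharing a 10 Hz (alpha) rhythm, plus independent noise.
shared = np.sin(2 * np.pi * 10 * t)
ch_frontal = shared + 0.8 * rng.standard_normal(t.size)
ch_parietal = 0.6 * shared + 0.8 * rng.standard_normal(t.size)

f, coh = coherence(ch_frontal, ch_parietal, fs=fs, nperseg=fs * 2)

alpha = (f >= 8) & (f <= 12)    # band where the shared rhythm lives
beta = (f >= 13) & (f <= 30)    # band with no shared signal
print(f"mean alpha coherence: {coh[alpha].mean():.2f}")  # high: lots of "talking"
print(f"mean beta coherence:  {coh[beta].mean():.2f}")   # low: mostly noise
```

Coherence like this captures only "how much talking" and is undirected; the "who talks to whom" directionality she describes needs directed measures, such as the directed-transfer-function family common in EEG connectivity work.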
And then some of those are ultimately not surprising.
You can think logically.
Let's say you need to do this episode right now, right?
And I'm going to take away all your notes, all of the external help.
And then I'm going to measure your brain activity.
How do you think it's going to turn out?
Your brain is going to be really on fire, so to say, because you'll need to think: okay, what was her name again?
Where was the study?
What is happening, right?
You need to really push through with your brain: you have memory activation, you need to have some structure, and now you don't have notes for the structure of this episode, right?
So: what was the structure?
What did we discuss?
What are we talking about?
You really have nothing to fall back onto.
So, of course, you have this functional connectivity that is significantly higher for brain-only group compared to the two other groups, right?
Then we take search engine group, Google.
And actually, just as prior research: there's a ton of work about Google already.
We as humanity, right, are excellent at creating different tools and then measuring the impact of those tools on our brain.
So there's quite a few papers we are citing in our paper.
For example, there is a paper, spoiler alert, called Your Brain on Google from 2008.
Literally, that's the name of the paper.
So we've actually found something very similar to what they found.
There would be a lot of activations in the back of your head.
This is called visual cortex or occipital cortex.
It's basically a lot of visual information processing.
So right now, for example, someone who's listening to us and maybe they are doing some work in parallel, they would maybe have some different tabs open, right?
They would have, like, one YouTube tab, and in others some other things that they're doing.
So, you know, you're basically jumping between the tabs, looking at some information, maybe looking at the paper while listening to us.
So this is what we're actually seeing.
And there's plenty of papers already showing the same effect.
But then for the LLM group, for the ChatGPT group, we saw the least of these functional connectivity activations.
And that doesn't, again, mean that you became dumb.
Yes, it does.
There's actually quite a few papers specifically having the title laziness, and we can talk about this with other results.
But from brain perspective, from our results, it doesn't show that.
What it actually shows is that, hey, you have been really exposed to one very limited tool, right?
You know, there's not a lot of visual stuff happening.
Brain doesn't really struggle when you actually use this tool.
So you have much less of this functional connectivity.
So that's what we found.
But what is, I think, interesting, and maybe a bit more nefarious, heading back to this point of laziness, are the other results: the ones relevant to the outputs, to the answers themselves.
So first of all, what we found is that the essays were very homogeneous.
The vocabulary that was used was very, very similar within the LLM group.
It was not the case for the search engine and for the brain-only group.
I'm going to give you an example.
And of course, in the paper, we have multiple examples.
I'm going to give you only one.
Topic, happiness.
So we have the LLM users, the ChatGPT users, mentioning heavily the words "career" and "career choice."
And surprise, surprise, these are students.
I literally just mentioned this.
Of course, they're more likely to talk about career and career choices.
And again, who are we ultimately to judge what makes a person happy, right?
No, of course.
But don't forget, the two other groups, they are from the same category.
They are students in the same geographic area, right?
However, for them, these words were completely different.
For the Google group, for the search engine, students actually heavily used vocabulary like "giving" and "giving us."
And then the brain-only group was using vocabulary related to "happiness" and "true happiness."
And this is just one of the examples.
And then finally, to highlight one more result, is responses from the participants themselves, from those students.
So we asked literally 60 seconds after they gave us their essays.
Can you give us a quote?
Any quote, any length of the quote of what you had just written can be short, long, anywhere in your essay, anything.
83% of participants from the LLM group, from the ChatGPT group, could not quote anything.
That was not the case for brain and search engine groups.
Of course, in sessions two and three and four, they improved because, surprise, surprise, they knew what the questions would be.
But the trend remained the same.
It was harder for them to quote.
But I think the most ultimately dangerous result, if I can use this term (though it's not really scientific; it's something that I think requires a lot of further inquiry, almost on a philosophical level, I guess), is the ownership question.
So we did ask them: how much ownership, percentage-wise, do they feel towards those essays?
And 15% of ChatGPT users told us that they do not feel any ownership.
And of course, a lot of people, especially online, mentioned, well, they haven't written this essay.
Of course, they didn't feel any ownership.
But I think that's where it actually gets really tricky.
Because if you do not feel that it's yours, but you just worked on it, does this mean that you do not care?
We do not obviously push it that far in the paper, but I think this is something that definitely might require much further investigation.
Because if you don't care, you don't remember the output, you don't care about the output, then what ultimately is it for?
Why are we even here, right?
Of course, it's not all dark gloom and everything is awful and disastrous.
I mentioned that there's this fourth session.
Not everyone came back for this session.
So actually, sample size is even smaller for this.
Only 18 participants came back.
But what we found is that for those who were ChatGPT users originally and then lost access to ChatGPT, their brain connectivity was significantly lower than that of the brain-only group.
However, for those who were originally in the brain-only group and then gained access to ChatGPT, their brain connectivity was significantly higher than that of the brain-only group.
What it could potentially mean (and I'm saying potentially because, again, many more studies would be required) is that timing might be essential.
Basically, if you make your brain work well first, and then you gain access to the tools, that could be beneficial.
But of course, it doesn't mean that it's one second of work of the brain and then you use the tool, right?
Something like, let's say, you're in school, and maybe in the first semester you learn your base of whatever subject it is without any tools, the old-school way.
You didn't become an expert in one semester of a school year, right, but you at least have some base.
And then, let's say, in the second semester, you gain access to the tool.
So it might prove actually beneficial.
But again, all of this is to be still shown and proven.
We literally have very few data points.
But the tool is now being really pushed on us everywhere.
In our case, we're measuring electrical activity, so there's multiple ways of measuring things.
Yeah, electroencephalography, yes.
Yeah, it's a fair, it's a bit simplified, but kind of fair way.
And this is very important: it's not that that part doesn't work, right?
Or that it atrophied itself, like we saw in some... No, no, no.
It doesn't go down.
Like, listen, I'm going to give you one example, right?
It's like back to this crazy example of 3% of our brain versus 100%.
Like, if you were not using 100% of your brain, literally, we would not be having this conversation right now at all.
So it's very important to understand we use our brain as a whole.
Of course, you have... Of course.
But back to the energy, and I think this is super important: the brain still takes much less energy than even, you know, 10 requests to ChatGPT or to Google.
And this is beautiful, because our body, so imperfect as a lot of people call it, and our brain, so imperfect, which it often is: this old, ancient computer, as some people say, is still the most efficient machine that we all have, right?
And we should not forget that.
People in all of the AI labs right now around the world try to mimic the brain.
They try so hard; all of those preprints that you've seen on arXiv, the service that hosts those papers: how can it be similar?
Can we ensure that this is similar, right?
And so there is something to it, because we are actually very efficient; but we are efficient almost to a fault, taking shortcuts that in a lot of cases make us a bit too efficient, right?
Think about it: your brain really wants to look for these shortcuts, to make things the easiest.
The whole goal of your brain is to keep you alive, not to use ChatGPT or an LLM, not to do anything else.
The only ultimate goal: let's keep this body alive.
And then everything else adds on, right?
And so this is how we are running around here.
We're trying to obviously then figure out how we can make the life of this body as easy as we can.
So, of course, these shortcuts are now, as you can see, used in a lot of social media, which is heavily talked about, and we know some of those dark patterns, as they are known, are heavily used; some of them are designed by neuroscientists, unfortunately, because it feeds back into the needs of the brain.
Constant affirmation, fear of missing out: all of those are original design by nature, right?
And of course, now we can see that LLMs would be, and are, getting designed with those patterns as well.
So definitely, that's an excellent point, right?
So theoretically, to answer your question, we can, right?
The difficulty actually would be on the LLM part, not on our part.
There are a lot of these reports, right, on LLM consumption per token for the prompts.
But what a lot of companies, well, actually almost no companies, are releasing is what it took for training, right?
So for you, it took 30 seconds of thinking, right?
And I hate, hate, hate this word thinking when we use it for LLMs, right?
That's not thinking, right?
But let's keep it for now.
Thinking, that's what you see on the screen.
But ultimately, you do not know, neither you nor myself (there is no public information), how long it took for it to be trained to actually give you that pottery answer.
Most likely, my assumption, this is obviously subjective.
I do not have data, so I need to be very clear here.
But my estimate, from the overall knowledge that is available: going to the library for a week is going to be more beneficial for your brain, because you will talk to other people, get chatting at the library, and process all of that information; your brain will struggle.
Your brain actually does need struggle.
Even if you don't like it, it actually needs it.
You will learn some random cool things in parallel, maybe excluding pottery, and that will still take less energy for your whole body to work, right, than those actual 30 seconds of pottery from ChatGPT.
Again, very important here as a note: we do not have the data from the LLM perspective, so this is just my subjective estimate.
Absolutely.
So just to simplify, right?
So what actually happens is for the different types of cognitive load, right?
Actually, in the paper we have a whole small section on this, so if someone wants to dive into that, that would be great.
There are different types of cognitive load.
And the whole idea is how much effort you would need in order to stay on the task, or to process information in the current task.
For example, if I stop talking the way I'm talking right now and start just giving you very heavy definitions,
Even if you're definitely interested in those, it will be just harder for you to process.
And if I were to put this brain-sensing device on you, right, the EEG cap that I mentioned, you would definitely see that spike, because you would try to follow, and then you'd be like: oh, it's interesting, but it really gets harder and harder if I just throw a ton of terminology at you, right?
So that's basically what happens, and this is just a simplification; definitely check the paper, there's so, so much more to that.
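As an aside, that kind of spike is often operationalized in EEG work as a rise in band power; frontal theta (4 to 8 Hz) power is a commonly used proxy for cognitive load. Here is a minimal sketch on synthetic data, assuming that proxy rather than this paper's specific analysis:

```python
# Sketch of a common cognitive-load proxy (not this paper's exact analysis):
# theta-band (4-8 Hz) power tends to rise as a task gets harder.
import numpy as np
from scipy.signal import welch

def theta_power(eeg, fs=256):
    """Mean 4-8 Hz power from the Welch PSD of one EEG channel."""
    f, psd = welch(eeg, fs=fs, nperseg=fs * 2)
    band = (f >= 4) & (f <= 8)
    return psd[band].mean()

fs = 256
t = np.arange(0, 30, 1 / fs)
noise = np.random.default_rng(1).standard_normal(t.size)

# Synthetic stand-ins: weak vs. strong 6 Hz theta rhythm over the same noise.
casual_listening = 0.2 * np.sin(2 * np.pi * 6 * t) + noise
dense_definitions = 1.0 * np.sin(2 * np.pi * 6 * t) + noise

print(f"theta power, casual: {theta_power(casual_listening, fs):.3f}")
print(f"theta power, heavy:  {theta_power(dense_definitions, fs):.3f}")  # the "spike"
```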
The idea for cognitive load and the brain, though (and all of this was studied before us, not in our paper; we just talk about it, and there are multiple papers, some of which we cite in ours), is that your brain, specifically in learning, but also in other use cases, and we are talking right now about learning,
actually needs cognitive load.
Like, you cannot just deliver information on this, like, platter.
Like, here you go, here's information.
There are studies already pre-LLM, so pre-large language models, pre-ChatGPT, that do tell you that if you just give information as is, a person will get bored real fast.
And they'll be like, yeah, okay, whatever.
There will be less memory, less recall, less of all of these things.
But you actually need to struggle for the information, at the right level; it should not be very, very hard.
So if you are cognitively overloaded, that's also not super good because basically you can give up, right?
There's actually a very beautiful study, from 2011 I believe, measuring pupil dilation: literally how much the pupil dilates when you are given very hard-to-understand words and vocabulary.
And you can literally see how, when the words become longer and harder, the pupil basically kind of shuts down.
It's like giving up.
I'm done here processing all of that.
I'm just going to give up, right?
So you don't want to push a student, or someone who is learning something new, to this give-up point.
Information is already delivered to you within 30 seconds or 3 seconds or 10 seconds, and you haven't really struggled yet.
There is not a lot of this cognitive load, and a lot of people would say: but that's awesome, right?
That's kind of the promise of these LLMs and a lot of these tools.
But we do not want to make it too simple, right?
We do not want to take away this cognitive load.
It sounds like cognitive load.
Don't we want to take it away?
No, you actually do not want to take it away.
It's actually perfect, right?
It was a perfect example right now in this conversation between Chuck and Neil, because Neil is like: I have no idea what you just said.
Maybe it's a nonsense.
Maybe it's actual real stuff.
It's perfect.
If you have no idea, so you are basically novice, right?
So you have no base, right?
You can really be like, what is happening?
You will have confusion.
You will have heightened cognitive load, right?
You would be like, have I heard of anything like that before?
So you will actually try to do a recall.
Like, okay, I haven't heard it.
It's not my area of expertise.
What is happening here?
And obviously, you will now, because you heard all of these words that you have no idea about, and if the topic is of the interest to you overall, you will try to pay attention, make sense out of it, maybe ask questions, etc.
But if you are effectively trained on it, right, so you're a doctor, you are a teacher, you are an expert in the area, we see that there are significant differences.
Well, first of all, because you obviously know what to expect: expectations of vocabulary, of some of the conditions.
When someone is coming into an ER, the doctor who's there has seen it all, or maybe almost all of it, so they have a good rough idea of what to expect, right?
So you're kind of comparing this constantly.
The brain just does it.
And of course, it is more comfortable for them, right?
But it's great that you brought up doctors, actually, because, back to the doctors, there was a paper published a week ago in The Lancet, which is a very prestigious medical journal, actually talking about doctors.
In the UK, yes.
And they apparently pointed out that after four months of using an LLM, there was actually a significant drop in recognition of some of the polyps (or, I don't remember exactly, is it polyps? something else related, maybe, to cancer), and also on x-rays, right?
And also x-rays, when an LLM was used.
So it's back to this point, right?
So we are suggesting doctors use a tool that's supposed to augment their understanding; but then, if they are using it, are we taking the skill away from them?
especially in the case of the current doctors that learned it without this tool, right?
And now what will happen for these doctors, for those kids, for those babies that are born right now with the tool, and will decide to become doctors and save lives?
They will be using the tool from the very beginning.
So what are we going to end up having in the ER, in the operating rooms?
That's a great question here.
So it's definitely this drop, right, in skill set for these doctors in that paper, right?
I am actually doing horseback riding professionally, so I'm going to pretend I haven't heard anything in the past few minutes.
But, I mean, you can definitely talk about skill set and expert level, right, and all of that, and how important it actually is to include the body and the environment; but to your point, right,
effectively, first of all, there are actually two sides to answering your question.
There is right now no proof that there is anything being freed per se.
People say: definitely, it's going to free this, it's going to free that. But what exactly is being freed?
Like, we literally have no data.
Can it free something?
Sure, but we don't know what, for how long, is it useful, how we can rewire it?
We don't have any of this information.
So potentially, yes, but hard to say.
But more importantly, right?
Okay, say you are right now using an LLM, just practically speaking, to, let's say, write a book, right?
You're writing a book, so you're doing some heavy research; you send it off to do, what is it, "deep research," or whatever it's called these days.
Each day there's some new term.
What exactly are you doing?
You still kind of monitor the outputs coming back.
It doesn't really release you.
Maybe you went off to do something else, and you think in your head that you fully offloaded that task.
But your brain doesn't work like that.
Your brain cannot just drop it: oh, I was thinking about this, and now I'm thinking about that.
Your brain actually takes quite some time to truly release from one task to another task.
Even if you think: I just put in, "Explain to me the principles of horseback riding," and went off to do this other task, write this report for my manager, whatever, a completely different thing, and you think you're good.
But you're not, actually: your brain is still processing that. So it's not that there will be a gain, right? But again, we do need more data.
Because, of course, as I mentioned at the very beginning, we as humanity are excellent at creating tools.
And these tools, as we know, they do actually extend our lifespan very nicely.
But I would argue that they are not actually cognitively the most supporting in most cases.
I think that here we have a lot of open questions.
We have studies about, for example, GPS, right?
Everyone uses GPS.
And multiple papers about GPS do specifically show that dosage, so how much you use GPS, has a significant effect on your spatial memory and on your understanding of locations, orientation, and picking up landmarks, the buildings around you, literally.
It's like: what is this?
You just saw something in a tour guide online, and you will not be able to recognize it as the actual building in front of you right away; you need to pull up the photo to compare.
And there are plenty of papers that actually looked into the tools, right?
Maybe we don't need to speculate; we already have one, right?
We have classic GPS, and Uber and all these other services.
And the problem, again, is back to how they are used, because there's also a lot of manipulation in these tools, right?
It's not just "we are making this drive easier for you."
Somehow, when I'm going to a hospital (I'm here to see patients, because I don't only study how we use LLMs, I do a lot of other projects), so when I'm going to that hospital here, Massachusetts General, it takes me one hour, always one hour, in an Uber. If I'm driving, it takes exactly 25 minutes, somehow, right? And again, the question is: why is that? We're not going to go into Uber right now, but this is back to the idea of the algorithms,
and what the algorithms are actually being pushed to do, what they're optimized for.
And I can tell you, not a lot of them are optimized for us, for the user, for the human first.
You didn't know about Dredge Road.
Yes, you do know about Dredge Road.
So, well, first of all, we felt that we, of course... well, we are not the experts; I would not be able to rank those essays on these topics.
So I felt the most important thing was to get experts here who actually understand the task, understand what goes into the task, and understand the students and the challenges of the time.
So we actually got two English teachers who had nothing to do with us: never met in person, not in Boston whatsoever, no idea about the protocols.
The experiment was long done and gone by the time we recruited and hired them.
And we gave them just a minimum of information.
We told them, here are the essays.
We didn't tell them about different groups or anything of the sorts.
We told them, these folks, no one is majoring in any type of literature or anything that would be relevant to language or journalism or things like that.
They only had 20 minutes.
Please rank them, reconcile your rankings, and tell us how you would do that.
We felt it's very, very important to actually include humans, right?
Because this is the task that they know how to rank, how to do.
But back to AI, right?
Why we thought it's interesting to include AI.
Well, first, of course, a lot of people actively push the idea that AI can do this job very well, right?
That, hey, I'm going to just upload this.
They're really great with all of these language outputs.
They will be able to rank things.
And how do you do this? You actually give it a very detailed set of instructions, right?
How should it do the ranking, and what things does it basically need to care about, like the fact that these writers had only 20 minutes, right?
So something very similar to teaching instructions, just in more specific language.
We actually show in the paper exactly how we created this AI judge.
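The paper itself documents the exact judge instructions; purely to illustrate the pattern described here (a detailed rubric plus context such as the 20-minute limit and the writers' backgrounds), this is a hedged sketch. The rubric wording, scoring scale, and model name are placeholders, not the study's.

```python
# Hypothetical sketch of an "AI judge" in the style described above.
# The rubric and model name are illustrative placeholders, not the paper's.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_INSTRUCTIONS = """You are grading short essays.
Context: the writers are students not majoring in literature or journalism,
and each essay was written in only 20 minutes.
Score each essay 1-10 on thesis clarity, structure, and vocabulary variety,
then give one sentence of justification."""

def judge(essay: str) -> str:
    # One scoring call per essay; a real judge would also reconcile rankings.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model, not the one from the study
        messages=[
            {"role": "system", "content": JUDGE_INSTRUCTIONS},
            {"role": "user", "content": essay},
        ],
    )
    return resp.choices[0].message.content

print(judge("Happiness is not a destination but a daily practice..."))
```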
But there were actually differences between the two, right?
So the human teachers, when they came back to us, well, first of all, they called a lot of the essays coming from the LLM group "soulless."
That's a direct quote; I actually put a whole long quote from them in there.
Soulless, I like that.
No, but that's the thing, right?
Teachers, and this is super interesting, because these teachers obviously didn't know these students.
They're, again, not coming from this area whatsoever.
So they actually picked up when it was the same student writing these essays throughout the sessions, right?
For example, Neil, you were a participant, so I'm taking you as an example as a participant here.
So they were like, oh, yeah, this seems like it's the same student.
So they picked up on these micro-linguistic differences.
You know, a teacher knows you.
You can, like, fool around.
They know your work.
They will be able to say, okay, that's yours, and this is copy-pasting from somewhere else or someone else.
And interestingly, they said, did these two students sit next to each other?
We were like, oh, no, no, no.
The setup is like one person in a room at a time.
We didn't even think to give them this information.
We're like, oh, no, no, it is not possible in this use case.
So they literally saw for themselves this copy-paste homogeneity that we found.
They saw it themselves, right?
Interestingly, the AI judge was definitely not able to pick up on the similarity between the students, right?
Picking up that, oh, this is, for example, Neil's writing throughout these sessions.
So just to, again, show you how important... Did you just accuse me of having soulless writing?
No, that's the point.
So the idea here is that human teachers, with their input and their intimate, really, truly intimate understanding (because, again, this is English, so for this specific task we got the professionals, the experts).
They really knew what to look at, what to look for.
And AI, however good it is at this specific task (and we know a lot of people consider LLMs experts at essay writing; why would you even take essay writing, such a useless task in 2025, right?), it still failed in some cases.
This is just to show you that limitations are there.
And some of that you cannot match: even if you think this is an expert, it is still a generic algorithm that cannot pull out this uniqueness.
And this is very important for students in the class, in the real classroom, right?
You want this uniqueness to shine through.
a teacher can specifically highlight that, hey, that's a great job here.
That was like a sloppy job here.
That was pretty soulless.
Who did you copy it from, from an LLM?
They even were able to recognize that.
And this level of expertise, it's unmatched.
And in all of that conversation, segueing a bit sideways, but all this talk of PhD-level intelligence, I'm like: yeah, sure, just hold my glass of wine right here.
I'm French, so: hold my glass of wine here.
So, you know, it's not that.
And we are very far from truly understanding human intent, because if you write for humans, it needs to be read by humans.
Like our paper: it's written by humans, for humans.
And we saw how the LLMs and the LLM summarizations failed miserably all the way; they could not even summarize it properly.
So that's an excellent point.
Yeah, this is the thing, right?
You absolutely can give these instructions, give more soul, give a bit more of personality, all of these things.
But you have a lot of this data contamination, right?
So whatever it's going to output and throw at you, that's old media.
It has already seen it somewhere.
It's already someone else's, right?
And we need new stuff, right?
And I am very open about saying this, even, you know, at institutions like mine, at any school: whenever I'm teaching something, you need uniqueness, right?
I want Neil's soul there.
I don't care about the randomness of those outputs from an algorithm, from all around, from the stolen data, from the planet, right?
I don't care about that.
But, you know, it's back to: what are you scoring?
Are you scoring a human?
Are you trying to improve a human and their ability to have critical thinking, structure, arguments, counter-arguments?
Or are you scoring an AI, an algorithm?
You know, AI doesn't need to have this scoring, right?
An LLM doesn't need that.
Or are you scoring a human who uses an LLM, right?
So this is going back to, I guess, the educational setup, and we'll have a lot of questions we will need to find answers to, right? What are we doing? What are we scoring? What are we doing it for, and for whom? And I just think pure human-to-human is what we really need to focus on. But there will be, and there is, a place for human augmentation; an LLM obviously will be used for augmentation, but there are a lot of questions there, right?
Please don't believe it.
I can give you one better.
I can give you one better.
Surprise, surprise.
Why is that so good, right?
Someone actually sent me something yesterday from Grok, right, another LLM, an interesting LLM, I would say, saying that apparently Nataliya Kosmyna is not an MIT-affiliated scientist.
And I'm like: okay, that's also interesting.
Absolutely.
And I think, first of all, it's unfortunately an even less developed topic, right?
And, you know, it's an awful topic, so we're going to get into it, but I cannot not make this awful joke.
Kind of, hey, Siri, I have problems with relationships.
It's Alexa, right?
It's a joke for a very heavy topic, so I need to preface it immediately that we have even less data and less scientific papers, preprints or peer-reviewed papers about this.
So most of what we have right now is anecdotal: after our paper, we personally received around 300 emails from husbands and wives telling us that their partners now have multiple agents they're talking to in bed.
And I immediately thought about that South Park episode from a couple of years ago, literally about this.
But we have much less scientific information here. What we have, what we know, also coming from our group's research, is that there is definitely an amplification of loneliness. That's what we know from the research, and some other papers are showing up right now.
There is potential, and again, a lot of people who are pro-AI-therapy point out the advantage that it is cheap.
It's $20 a month, compared to therapy hours that can cost up to hundreds of dollars a month, right?
But there is definitely, you know, a lot of drawbacks here.
And the drawback is that, because this is not a regulated space, it still can basically give you suggestions that are not good.
You may have heard about this earlier: a couple of months ago, for example, ChatGPT (I'm going to give you an example on ChatGPT because, again, we are focused on ChatGPT, and its cases are the most actively publicized, at least)
actually suggested, you know, the heights of different bridges in New York if you said that you had lost your job, right? Not smart enough to make the connection that maybe that's not the response you need to give. And then there is this awful recent case where a teenager, 16, 16, so, so young,
unfortunately, you know, died by suicide.
And now ChatGPT, OpenAI, and Sam Altman are being sued.
Apparently, according to a statement from the spokesperson of OpenAI, they originally thought that when a person is talking about suicide, the model should not engage at all: just say, here are the numbers, this is what you need to do, and stop talking.
But then experts told them that, hey, it might be a great idea to try to dig people a bit out.
But it looks like in this case it still failed, because from the conversations that have been reported (we don't know how authentic they are), it looks like it suggested keeping things away from the parents.
But my question is: why, at 16 years old, was he even allowed to use a tool that is so, so, so unstable in its responses, really, that can hallucinate at any time of the day, in any direction?
So I think that's where the danger comes from.
And of course, you know, loneliness.
We know about the pandemic of loneliness, this term that was coined, I believe, in 1987, for the first time at a conference.
That's the whole business, right?
Because think about it.
If you hook someone on an LLM at 13 years old, because the school, the county, decided that they want to use an LLM in the school, then by the age of 18 you have a full-fledged user, right?
A user of an LLM.
And, you know, it's like, you know, again, who calls people users?
Like drug dealers and software developers.
Yeah, but it's true, right?
Well, I mean, we saw some of it, right?
So, well, first of all, we obviously don't know if this was a direct response or not, so we're not going to speculate there whatsoever.
But several weeks, just very few, like three or four weeks, after our paper was released, OpenAI released study mode for ChatGPT.
And I think maybe it should have been released from the beginning, I'm just saying.
But, you know, if you have a button that can immediately pull you back into default mode, who's going to use study mode at all?
Like that.
I don't need to run a study here.
We know some people might, but not everyone.
Because again, back to the brain.
Brain will look for a shortcut.
Shortcut is the responses here.
And I can go do all the other cool stuff.
So who's going to actually use it, right?
We still need studies on that.
That's the first point, right?
Second point, of course, age is important.
Because, again, the brains that are being developed right now are potentially at the highest risk.
Because here we all were born long before this tech existed.
And a lot of AI developers and people who are running these companies are all folks who, again, were all born long before the tech existed.
So they learned the hard way, how to ask questions, art of the deal, going through all of that.
They know how to ask a question.
What about those who are just born with the technology: will they even know how to ask a question?
And back to the point of age: I don't think it's ultimately only about the young, of course.
We do need to look out for older adults too, and also, of course, for young adults.
Everyone is talking about humanity's last test.
I would call it we are on the verge of humanity's last...
And I'm sorry, I know you might need to bleep this term out.
But what I mean here, obviously, intimate relationships for people, right?
With the promise of this.
Yeah, but again, that's crude, but it's back to this point of designing these agents: interestingly appealing ladies and gentlemen and whatnot, in these short skirts, whatever it is.
Who's going to go make those babies who will pay those taxes?
I'm just saying, right?
And again, the very famous expression: no taxation without representation, right?
I do not want my prime minister or secretary of defense
to use a random algorithm to make decisions.
I'm paying my taxes for them to think, not for an algorithm to think for them, right?
So there is a lot of these repercussions.
But back to ultimately the point, actually, is anyone taking this seriously, right?
We just need more human-focused work on AI.
Like, I remember when the paper went viral, right?
We didn't even put out a press release; we literally just uploaded it to arXiv, the service that hosts these papers that haven't gone through peer review yet.
Literally, I didn't post; not a single author posted.
Yeah, it's a pre-print service, right?
And no one, no one, neither the lab, nor any of the authors posted anything on social media.
We just went about our days.
Two days later, it goes viral, and then I'm going on.
Yeah, obviously, right?
And then people used an LLM to summarize it wrong, but that's another story, right?
Like, I'm going on X, and actually, I have an account, but I'm not using it.
A lot of academics switched from X to, like, other platforms that we are using.
But I'm going there, and apparently, I learned that there are people who are called AI influencers.
I didn't know that this was a term.
But apparently, these AI influencers, they post these AI breakthroughs of the week, right?
And I went, our paper, oh my God, made a cut.
It's breakthrough number seven.
And I scrolled through this influencer's feed.
The person has a huge following; real or bots, whatever, I don't know.
I'm scrolling, and I saw like 20 of these posts, for 20 weeks.
All of the posts are about GPU, multi-trillion deal here, multi-billion deal here, more GPUs.
I'm like, what is human here?
Where is human here?
Where are we evaluating the impact of this technology on humans?
Why did only our paper make it, at number seven?
And where are the papers, right?
So that's, I think, something where the focus needs to shift, right?
So if these companies do want to be on the right side of history, right?
Because this is like social media, but on steroids; much worse.
You do not talk to a calculator about your feelings, right?
So people who compare it to calculators, they're so, so, so wrong, right?
But, hey, it's going to get much, much worse with proliferation without any validation, any guardrails, right?
So we do need to look into that heavily, right?
I can tell you, we received 4,000 emails from teachers all around the world.
Every single country in the world sent an email.
They are in distress.
They don't know what to do.
And that's where, first of all, all my love goes to them.
I hope this makes the cut: please, please, please.
So I'm trying to respond to all of those.
But the challenge is that they do not know what to do, right?
There's not really enough guidance.
And a 10-hour workshop sponsored by a company that pushes this partnership on your school does not cut it, right?
There are a lot of comments about how it's actually not supervised, not tested.
And ultimately, do you really need to go with these closed models, right?
We have so much open source; the whole world, all this software, runs on open source.
These LLMs would not exist, nothing would exist, without open source.
So why don't we run an open-source model, meaning it runs offline, right,
on your computer?
And, spoiler alert, you don't need a fancy GPU from Jensen, right?
You can get an off-the-shelf computer and run a model locally with your students, train it over the weekend, come back on Monday, check with the students what happened, learn all the cool pros and cons, laugh at the hallucinations, figure out tons of cool things about it.
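For anyone wanting to try the classroom setup she sketches, here is a minimal local-only example using Hugging Face transformers. The model name is just one small instruct model that runs on an ordinary CPU; any locally downloadable model works the same way, and after the one-time download it runs fully offline.

```python
# Minimal sketch: run a small open-source model entirely on a local machine.
# The model name is an example; swap in any small local model you prefer.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-0.5B-Instruct",  # ~0.5B parameters, CPU-friendly
)

prompt = "Question: In two sentences, what is happiness?\nAnswer:"
out = generator(prompt, max_new_tokens=80, do_sample=False)
print(out[0]["generated_text"])  # inspect the output, hallucinations and all
```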
So why do we need to push these partnerships that we don't even understand?
Like Alpha School, right?
I don't know if you heard about that one.
Apparently an AI-first-run school, right,
where teachers are now "guides"; that's the term they are using.
I literally saw, one hour before our call, that several VCs had posted about this Alpha School.
So cash is flowing there heavily, right?
Yeah, venture capitalists are heavily pushing Alpha School.
But again, as the first comments from the general public asked: do we have proof that that's better?
What are the advantages?
Because there's not going to be a perfectly pure picture; there will be advantages, as with any technology.
No, that's totally correct.
Because, hey, we live with this struggle, right?
Obviously, I'm biased, because this is actually my job as a researcher:
you sit there figuring out answers to those problems, trying to figure out what would be the best way to measure this, to come up with it.
So, of course, there's so, so much more to what we come up with as humans, right?
We designed LLMs ultimately, right?
We came up with these tools.
It doesn't mean that the tool should be fully discarded.
But effectively, of course: why do you need an institution, for example, right?
I was literally explaining to one of my students three days ago how to use a 3D printer, right?
Well, an LLM is not there yet for that kind of explaining, right?
You can give instructions, sure, with images and with video, right?
But if you're like: hey, this old fella here has a 3D printer, let me tell you how to actually figure it out, right?
This level of, again, of expertise, of knowledge, right?
That's what you're striving for, but it also has this human contact, right?
That is what we are now potentially depriving people of, because that's how you get this serendipitous knowledge, right?
And connections: like, hey, I just chatted with someone and I'm like, oh,
I never thought to do this, because I'm in BCIs, and that person is in astrophysics.
Or: oh, I can actually use that.
It's totally not brain research, but I can totally go apply it and try it, right?
And that's the beauty of it, right?
I guess that's one point.
But again, that's back to the point.
"Now I know Kung Fu" didn't mean that he learned it, right?
It got uploaded into his brain.
It doesn't mean that he actually learned it, right?
That's the thing, right?
Because in the movie, which is excellent, I watched it 19 times or more.
That's actually how I started my career.
And besides this, I don't want to do anything else.
I want to do this specific scenario, right?
And we are still not there.
But that's the beauty.
We do not actually know that just uploading would be enough, right?
We have these more tiny, I would say, studies right now, on vocabulary and words and things like that, where we're trying to improve people's language learning; it's a very good example to show.
So there are tiny examples, but we do not know yet. Even if, imagine, we had this magical interface, right, invasive or non-invasive, it doesn't matter; we have it, it's ready to go, perfectly functional, safe, whatever. You have it, and then you upload all of it; we do not know that it would actually work.
Did you upload the knowledge, like all of that, blah, blah, blah, from ChatGPT-75?
Yeah, sure.
But do you actually use it?
Can you actually use it?
Is it really firing and wiring in there?
Yeah, back to the grades, right?
It's an excellent point.
And there is no doubt, no one has any doubt, I think, on the fact that education does need to change.
And it has been long, long overdue, right?
The numbers about, you know, literacy, reading literacy, math literacy: they are decreasing in, I believe, all the countries.
I haven't seen any ups; it's down, down, down in all these recent reports, from multiple countries, right?
But it's back to the point I made earlier about the grades or about scoring, right?
Who are we scoring and what are we scoring?
Are we scoring a pure human, so just the human, the human brain as is, like Nataliya?
Or are we scoring Nataliya with an LLM, right?
So I'm using it, and we know that.
Or are we scoring just an LLM, and then there is Nataliya, who used it, right?
So even that distinction would be important.
But ultimately, the school, of course, is not about that.
As I mentioned, everything you learn is obsolete knowledge by itself.
But it has this base.
You do need to have the base.
You're not going to be a physicist if you don't have it.
Whatever field it is, you know: you're not going to do math, you're not going to be a programmer.
Our next paper is actually about vibe coding.
Spoiler alert, not going to work if you don't have the base, right?
But the idea, back to what we maybe should really look at, is what school is great at. The best thing I actually took from school is this base, definitely super useful, but also my friends:
people on whom I rely in hard situations, with whom we write those grants, with whom we can shout and have fun and cry over funding that is over for a lot of us, right?
All of that stuff, right?
These connections, right?
This is what maybe we should value because we are killing it further and further, right?
And we are just keeping people in these silos of being a user, right?
And it stays there.
And these imaginary three and a half friends from Zuck, from Zuckerberg, right?
The ones he mentioned; thanks to him and his social media, we have three and a half friends, right?
So I think that's why we need to really look into what we want truly from society, from schools, and maybe on a larger scale, what are the guardrails, right?
And how we can actually enhance it, right?
In ways that are safe for us, so we can move forward and evolve further, because of course this will happen.
Yeah, I'm going to give you two examples here real quick.
So first, about AIs and LLMs, obviously: they were not born overnight, but we see how a lot of governments still really struggle and respond very reactively to them, instead of being proactive, right?
And the challenge here is that we do not have the data to say that this is good stuff, that we should really implement it everywhere, in our backyard.
We don't have this data.
Why are we having all this FOMO?
There is nothing yet to have FOMO about, to really run with it.
But we can absolutely create the spaces where this is being actively used, for example, for adults, for discovery, to understand it.
Why do we need to push it everywhere is still very unclear.
We just don't have this data.
But then back to the point of guardrails, right?
What we should be doing, obviously,
and a shameless self-plug here on the BCI work that I'm doing: there are multiple ethics pushes right now for BCI technology. We can agree it's still pretty novel, but it definitely moves forward very fast. So I'm having hope that for this technology, for the next big thing (we agree LLMs are great, but they're not the next big thing; it's robotics, and then we will see BCIs), so for this next big thing,
I'm very hopeful that we will be in time to protect our thoughts, literally.
Because think about what will happen, right?
Before study mode, right, you have censorship mode.
And you know how it is; look at DeepSeek, right?
I'm not going to go far.
So think about a billionaire; I'm not going to even name his name.
A billionaire who has a social media platform, a satellite platform, a neural implant startup, and an AI company.
So he decided two months ago to cleanse history, right, from errors and mistakes.
And tomorrow he will decide to cleanse our thoughts, right?
This is straight out of 1984, right?
No, not really.
And that's where we need to be really, really cautious.
Like, we should definitely look into that use case and not make that happen, right?
And allow people enough agency, because that's the thing, right?
People think, oh, that's great, but there is not a lot of agency.
This freedom of making a choice: the choice is already made for you in a lot of cases.
And so that's something that we should definitely protect as much as we can.
Do not force this stuff on those kids, because they cannot consent and say no; the school forced it on them, and their parents decided that it's the big thing in San Francisco, in the Bay Area, that you should use, right?
So don't do that.
It's not just that, right?
We need to do it on larger scales, for all of the spaces, like the workplace.
We didn't talk about this, because obviously today is heavily about education, but for the workplace we have multiple papers showing that people are not doing that well there either.
For example, programmers estimate that they gain 20% of their time,
but they actually lose 19% of their time on the tasks.
So there is so, so much more to it.
We need to do this on larger scale with all the ages, including older adults.
And then, of course, on different, different, different use cases and different cultural backgrounds.
This is in the U.S.
And, of course, culture elsewhere is very, very different.
I've talked to so many teachers already, right, in Brazil, all over the world.
You have these intricacies you need to account for; it's so, so, so important, because otherwise it's all going to be washed into one Western style, which we already saw happening.
And it is happening.
And a lot of people are actually very worried that their language will literally disappear in five to ten years.
And it's not like an LLM will magically save it, because it will not.
Thanks for having me.