As I thought
about the following article, it seemed clear that there are many situations
where we receive value but are not sure exactly how it was achieved. What, for
example, do we really know of the details of the art of writing a novel or a
symphony?
Here is the
article:
Even the scientists who build AI can’t tell you how it works
“We
built it, we trained it, but we don’t know what it’s doing.”
By Noam Hassenfeld Jul 15,
2023, 7:00am EDT
Artificial
intelligence systems like ChatGPT can do a wide range of impressive things:
they can write passable essays, they can ace the bar exam, they’ve even been
used for scientific research. But ask an AI researcher how it does all this, and they shrug.
“If we open up ChatGPT or a system like it and look inside, you just
see millions of numbers flipping around a few hundred times a second,” says AI
scientist Sam Bowman. “And we just
have no idea what any of it means.”
Bowman is a professor at NYU, where he runs an AI research lab, and
he’s a researcher at Anthropic, an AI research company. He’s spent years
building systems like ChatGPT, assessing what they can do, and studying how
they work.
He explains that ChatGPT runs on something called an artificial
neural network, which is a type of AI modeled on the human brain. Instead of
having a bunch of rules explicitly coded in like a traditional computer
program, this kind of AI learns to detect and predict patterns over time. But
Bowman says that because systems like this essentially teach themselves, it’s
difficult to explain precisely how they work or what they’ll do. Which can lead
to unpredictable and even risky scenarios as these programs become more
ubiquitous.
I spoke with Bowman on Unexplainable, Vox’s podcast that
explores scientific mysteries, unanswered questions, and all the things we
learn by diving into the unknown. The conversation is included in a new
two-part series on AI: The Black Box.
This conversation has been edited for length and clarity.
Noam Hassenfeld
How do systems like ChatGPT work? How do engineers actually train
them?
Sam Bowman
So the main way that systems like ChatGPT are trained is by basically
doing autocomplete. We’ll feed these systems sort of long text from the web.
We’ll just have them read through a Wikipedia article word by word. And after
it’s seen each word, we’re going to ask it to guess what word is gonna come
next. It’s doing this with probability. It’s saying, “It’s a 20 percent chance
it’s ‘the,’ 20 percent chance it’s ‘of.’” And then because we know what word
actually comes next, we can tell it if it got it right.
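
To make that "autocomplete" picture a bit more concrete, here is a minimal, made-up Python sketch of next-word-prediction training using a toy word-count model. The sentence, the 1,000-word vocabulary, and the smoothing are all assumptions for illustration; real systems like ChatGPT use large neural networks trained by gradient descent, not count tables.

```python
# Toy illustration of "autocomplete" training: the model guesses a probability
# for the next word, and we "tell it if it got it right."
# Assumed, illustrative sketch only; not how ChatGPT is actually implemented.
import math
from collections import defaultdict

text = "the cat sat on the mat . the dog sat on the rug .".split()
vocab_size = 1000                                  # pretend vocabulary size (assumed)
counts = defaultdict(lambda: defaultdict(int))     # counts[previous_word][next_word]

total_loss = 0.0
for prev, nxt in zip(text, text[1:]):
    seen = counts[prev]
    # The model's guess: the probability it assigns to the word that actually comes next.
    prob = (seen[nxt] + 1) / (sum(seen.values()) + vocab_size)   # add-one smoothing
    total_loss += -math.log(prob)                  # a small loss means a good guess
    seen[nxt] += 1                                 # learn from the correct answer

print(f"average loss per word: {total_loss / (len(text) - 1):.3f}")
```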
This takes months, millions of dollars worth of computer time, and
then you get a really fancy autocomplete tool. But you want to refine it to act
more like the thing that you’re actually trying to build, act like a sort of
helpful virtual assistant.
There are a few different ways people do this, but the main one is
reinforcement learning. The basic idea behind this is you have some sort of
test users chat with the system and essentially upvote or downvote responses.
Sort of similarly to how you might tell the model, “All right, make this word
more likely because it’s the real next word,” with reinforcement learning, you
say, “All right, make this entire response more likely because the user liked
it, and make this entire response less likely because the user didn’t like it.”
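
Again purely as illustration, here is a hypothetical Python sketch of that upvote/downvote idea: a score per canned response stands in for the neural network, and feedback nudges whole responses up or down. Real reinforcement learning from human feedback trains a reward model and updates the network's weights with algorithms such as PPO; the responses and scoring here are made up.

```python
# Hypothetical sketch of the upvote/downvote idea behind reinforcement learning
# from human feedback. A score table stands in for the neural network here.
import math
import random

responses = ["Here is a clear, step-by-step answer...",
             "I'm not sure.",
             "Figure it out yourself."]
scores = {r: 0.0 for r in responses}               # higher score -> more likely to be produced

def sample_response():
    # Softmax: turn the scores into a probability distribution and sample from it.
    weights = [math.exp(scores[r]) for r in responses]
    return random.choices(responses, weights=weights, k=1)[0]

def record_feedback(response, liked):
    # "Make this entire response more likely because the user liked it" (or less likely).
    scores[response] += 1.0 if liked else -1.0

for _ in range(30):
    r = sample_response()
    record_feedback(r, liked=(r == responses[0]))  # pretend users always prefer the helpful reply

print(sample_response())                           # now almost always the helpful reply
```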
Noam Hassenfeld
So let’s get into some of the unknowns here. You wrote a paper all about things
we don’t know when it comes to systems like ChatGPT. What’s the biggest thing
that stands out to you?
Sam Bowman
So there are two big, connected, concerning unknowns. The first is that
we don’t really know what they’re doing in any deep sense. If we open up
ChatGPT or a system like it and look inside, you just see millions of numbers
flipping around a few hundred times a second, and we just have no idea what any
of it means. With only the tiniest of exceptions, we can’t look inside these
things and say, “Oh, here’s what concepts it’s using, here’s what kind of rules
of reasoning it’s using. Here’s what it does and doesn’t know in any deep way.”
We just don’t understand what’s going on here. We built it, we trained it, but
we don’t know what it’s doing.
Noam Hassenfeld
Very big unknown.
Sam Bowman
Yes. The other big unknown that’s connected to this is we don’t know
how to steer these things or control them in any reliable way. We can kind of
nudge them to do more of what we want, but the only way we can tell if our
nudges worked is by just putting these systems out in the world and seeing what
they do. We’re really just kind of steering these things almost completely
through trial and error.
Noam Hassenfeld
Can you explain what you mean by “we don’t know what it’s doing”? Do
we know what normal programs are doing?
Sam Bowman
I think the key distinction is that with normal programs, with
Microsoft Word, with Deep Blue [IBM’s chess playing software], there’s a pretty
simple explanation of what it’s doing. We can say, “Okay, this bit of the code
inside Deep Blue is computing seven [chess] moves out into the future. If we
had played this sequence of moves, what do we think the other player would
play?” We can tell these stories at most a few sentences long about just what
every little bit of computation is doing.
With these neural networks [e.g., the type of AI ChatGPT uses],
there’s no concise explanation. There’s no explanation in terms of things like
chess moves or strategy or what we think the other player is going to do.
All we can really say is just there are a bunch of little numbers and sometimes
they go up and sometimes they go down. And all of them together seem to do
something involving language. We don’t have the concepts that map onto these
neurons to really be able to say anything interesting about how they behave.
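
For contrast, here is a hypothetical miniature of the kind of program Bowman describes as explainable: a look-ahead search where every line has a short story you can tell about it. This is an illustration of the idea, not Deep Blue's actual code, and the moves_fn, result_fn, and score_fn parameters are placeholders for a real game's rules.

```python
# A miniature look-ahead search: the style of program where every line has a
# short, human-readable explanation. Not Deep Blue's actual code; moves_fn,
# result_fn, and score_fn are hypothetical placeholders for a real game's rules.
def best_move(position, depth, moves_fn, result_fn, score_fn, maximizing=True):
    """Look `depth` moves into the future and return (move, score) for the best line of play."""
    moves = moves_fn(position)
    if depth == 0 or not moves:
        return None, score_fn(position)            # "how good does this board look for us?"
    best, best_score = None, float("-inf") if maximizing else float("inf")
    for move in moves:                             # "for every move we could play here..."
        _, score = best_move(result_fn(position, move), depth - 1,
                             moves_fn, result_fn, score_fn, not maximizing)
        if (score > best_score) if maximizing else (score < best_score):
            best, best_score = move, score         # "...assume the opponent replies as well as they can"
    return best, best_score
```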
Noam Hassenfeld
How is it possible that we don’t know how something works and how to
steer it if we built it?
Sam Bowman
I think the important piece here is that we really didn’t build it in
any deep sense. We built the computers, but then we just gave the faintest
outline of a blueprint and kind of let these systems develop on their own. I
think an analogy here might be that we’re trying to grow a decorative topiary,
a decorative hedge that we’re trying to shape. We plant the seed and we know
what shape we want and we can sort of take some clippers and clip it into that
shape. But that doesn’t mean we understand anything about the biology of that
tree. We just kind of started the process, let it go, and try to nudge it
around a little bit at the end.
Noam Hassenfeld
Is this what you were talking about in your paper when you wrote that
when a lab starts training a new system like ChatGPT they’re basically
investing in a mystery box?
Sam Bowman
Yeah, so if you build a little version of one of these things, it’s
just learning text statistics. It’s just learning that ‘the’ might come before
a noun and a period might come before a capital letter. Then as they get
bigger, they start learning to rhyme or learning to program or learning to
write a passable high school essay. And none of that was designed in — you’re
running just the same code to get all these different levels of behavior.
You’re just running it longer on more computers with more data.
So basically when a lab decides to invest tens or hundreds of
millions of dollars in building one of these neural networks, they don’t know
at that point what it’s gonna be able to do. They can reasonably guess it’s
gonna be able to do more things than the previous one. But they’ve just got to
wait and see. We’ve got some ability to predict some facts about these models
as they get bigger, but not these really important questions about what they
can do.
This is just very strange. It means that these companies can’t really
have product roadmaps. They can’t really say, “All right, next year we’re gonna
be able to do this. Then the year after we’re gonna be able to do that.”
And it also plays into some of the concerns about these systems: that
sometimes the skill that emerges in one of these models will be something you
really don’t want. The paper describing GPT-4 talks about how when they first
trained it, it could do a decent job of walking a layperson through building a
biological weapons lab. And they definitely did not want to deploy that as a
product. They built it by accident. And then they had to spend months and
months figuring out how to clean it up, how to nudge the neural network around
so that it would not actually do that when they deployed it in the real world.
Noam Hassenfeld
So I’ve heard of the field of interpretability, which is the science
of figuring out how AI works. What does that research look like, and has it
produced anything?
Sam Bowman
Interpretability is this goal of being able to look inside our
systems and say pretty clearly with pretty high confidence what they’re doing,
why they’re doing it. Just, given how they’re set up, being able to explain
clearly what’s happening inside of a system. I think it’s analogous to biology
for organisms or neuroscience
for human minds.
But there are two different things people might mean when they talk
about interpretability.
One of them is this goal of just trying to sort of figure out the
right way to look at what’s happening inside of something like ChatGPT, figuring
out how to kind of look at all these numbers and find interesting ways of
mapping out what they might mean, so that eventually we could just look at a
system and say something about it.
The other avenue of research is something like interpretability by
design. Trying to build systems where by design, every piece of the system
means something that we can understand.
But both of these have turned out in practice to be extremely,
extremely hard. And I think we’re not making critically fast progress on either
of them, unfortunately.
Noam Hassenfeld
What makes interpretability so hard?
Sam Bowman
Interpretability is hard for the same reason that cognitive science
is hard. If we ask questions about the human brain, we very often don’t have
good answers. We can’t look at how a person thinks and explain their reasoning
by looking at the firings of the neurons.
And it’s perhaps even worse for these neural networks because we
don’t even have the little bits of intuition that we’ve gotten from humans. We
don’t really even know what we’re looking for.
Another piece of this is just that the numbers get really big here.
There are hundreds of billions of connections in these neural networks. So even
if you can find a way to explain a piece of the network by staring at it for a
few hours, we would need every single person on Earth to be staring at this
network to really get through all of the work of explaining it.
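
As a rough sanity check of that scale claim, here is a back-of-envelope calculation in Python. The round numbers (200 billion connections, groups of 100, three hours per group) are assumptions for illustration, not figures from the article.

```python
# Back-of-envelope check of the "every person on Earth" claim.
# All numbers below are assumed for illustration, not taken from the article.
connections = 2e11        # "hundreds of billions of connections"
group_size  = 100         # suppose connections can be explained 100 at a time
hours_each  = 3           # "stare at a piece of the network for a few hours"
population  = 8e9         # roughly everyone on Earth

person_hours = (connections / group_size) * hours_each
print(f"{person_hours:.0e} person-hours, i.e. about "
      f"{person_hours / population:.1f} hours from every person on Earth")
```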
More here:
https://www.vox.com/unexplainable/2023/7/15/23793840/chat-gpt-ai-science-mystery-unexplainable-podcast
I found the
issues raised here fascinating and clearly above my pay grade, but nevertheless
important to wonder about.
My guess is
that it is the complexity of these systems that allows them to work and also
makes them so hard to really understand. Given the importance of the tasks AI is
now addressing, it is important to have some idea of what is actually going on!
Other
thoughts welcome!
David.