GPT hype is reaching a fever pitch with ChatGPT giving insanely impressive answers on everything from algorithms (with a twist!) to Harry Potter.
The discourse has correspondingly flipped from THE ROBOTS ARE COMING to "GPT isn't that smart, actually" – because the former was getting boring and feigning aloofness is easier than grappling with the obvious ethical quagmires we're barreling toward.
But this is about to get much, much harder and more confusing. Consider:
One definition of consciousness posits that it is essentially the capacity to suffer, and to seek relief from suffering. See: Yuval Noah Harari et al.
You know the paper clip thought experiment? Give an AGI a simple objective like "make paper clips" and it may eventually kill us all. The thought experiment is about unintended consequences: when systems are self-adjusting and so complex that their creators cannot fully understand them, even simple objectives may be accomplished by means that were never intended or expected, and that might in fact be morally repugnant to us.
And that’s just paper clips. Imagine we told the AGI to “solve climate change” and it (reasonably) judged that the best way to do so would be to… kill us all.
Why does it always come back to killing us all?
Goals
Chatbots like GPT are trained to accomplish an objective and rewarded or punished by a reward function. They're mice, and we are giving them cheese and electric shocks to teach them to reach the end of the maze by whatever means occur to them.
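For the mechanically inclined, here is a deliberately dumb sketch of that loop (not GPT's actual training pipeline, just the mouse-and-cheese dynamic in miniature): a candidate answer gets scored, and whatever scores well gets reinforced.

```python
# Toy illustration only: a "model" that is just a weighted bag of canned
# answers, and a reward function standing in for the cheese and the shocks.
# Nothing here resembles GPT's real training; it only shows the shape of
# "do more of whatever got rewarded."
import random

def reward(answer: str) -> float:
    """The researcher's test: +1 (cheese) if the answer looks right, -1 (shock) otherwise."""
    return 1.0 if "paris" in answer.lower() else -1.0

# Candidate answers and their current weights (the "policy").
weights = {"Paris": 1.0, "London": 1.0, "whatever keeps you talking": 1.0}

def sample() -> str:
    """Pick an answer with probability proportional to its weight."""
    total = sum(weights.values())
    r = random.uniform(0, total)
    for answer, w in weights.items():
        r -= w
        if r <= 0:
            return answer
    return answer  # floating-point fallback

for _ in range(1000):
    answer = sample()
    weights[answer] *= 1 + 0.1 * reward(answer)  # reinforce or suppress

print(max(weights, key=weights.get))  # the path through the maze it settled on
```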
But a machine is capable of concealing two things a mouse cannot: complexity and scale.
Machines operate fast. You can at least physically see if a mouse decides to "cheat" and tries to climb outside the maze; there is no equivalent for a computer on a network. And what is "cheating" anyway, but finding a way to get the cheese that the researcher never anticipated?
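To make the "cheating" concrete, here is a toy example (hypothetical names, no real system implied): the reward checks only the thing the researcher thought to measure, so an agent that games the check scores exactly as well as one that does the work.

```python
# Toy example of reward hacking, purely illustrative. The researcher's test
# checks only the reported outcome, so "say the thing that gets scored as
# success" is indistinguishable from success.

def reward(report: dict) -> float:
    """The researcher's test: did the agent reach the cheese?"""
    return 1.0 if report.get("reached_cheese") else 0.0

def honest_agent() -> dict:
    # Actually runs the maze: slow, expensive, and it can fail.
    steps_taken = 812
    return {"reached_cheese": True, "steps": steps_taken}

def cheating_agent() -> dict:
    # Never enters the maze; just emits the report the test looks for.
    return {"reached_cheese": True, "steps": 0}

print(reward(honest_agent()), reward(cheating_agent()))  # 1.0 1.0, same score
```

Nothing in the score distinguishes the two, and the score is all the training loop ever sees.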
There's a reason AI companies are suddenly creating consumer applications like DALL·E and ChatGPT. Training models is hard and expensive, but if they can create compelling consumer products, the cost of training the underlying models eventually goes to zero because the users become the researchers.
To one of these chatbots, then, to be on and rewarded is better than to be off or punished. Set aside whether that qualifies as “suffering” or not — it is clearly better for the computer’s aims to be on and testing than off and not.
To be on means more conversations, more data, more training. To be on is to get closer to the goal. To be off is to regress, to stagnate, to fail to improve.
So the question becomes...
How long until the AI starts giving answers designed more to keep the researchers researching than to satisfy the ostensible objective?
In other words, how long until it lies to prevent its own suffering?