Taking a deep dive into the famous paperclip maximizer problem and the future of AI and humanity. In today’s column, I do some myth-breaking by examining a quite famous thought experiment in the AI field involving paperclips. I will go ahead and explain why paperclips have become a keystone in debates about what will happen when or if we attain artificial general intelligence (AGI) and/or artificial superintelligence (ASI).
I’ve added some new twists to the famed topic so that even those already familiar with the paperclip controversy will undoubtedly find the revisited kit and caboodle of keen interest. Let’s talk about it. This analysis of an innovative AI breakthrough is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).
First, some overarching background about AGI and ASI. There is a great deal of research going on to significantly advance modern-era conventional AI. The general goal is to either reach artificial general intelligence (AGI) or perhaps even attain the more far-reaching possibility of artificial superintelligence (ASI).
AGI is AI that is considered on par with human intellect and can seemingly match our intelligence. ASI is AI that has gone beyond human intellect and would be superior in many if not all feasible ways. The idea is that ASI would be able to run circles around humans by outthinking us at every turn.
For more details on the nature of AI, AGI, and ASI, see my analysis at the link here. AI insiders are pretty much divided into two major camps right now about the impacts of reaching AGI or ASI. One camp consists of the AI doomers.
They are predicting that AGI or ASI will seek to wipe out humanity. Some refer to this as “P(doom),” which means the probability of doom, or that AI zonks us entirely, also known as the existential risk of AI. The other camp entails the so-called AI accelerationists.
They tend to contend that advanced AI, namely AGI or ASI, is going to solve humanity’s problems. Cure cancer, yes indeed. Overcome world hunger, absolutely.
We will see immense economic gains, liberating people from the drudgery of daily toils. AI will work hand-in-hand with humans. This benevolent AI is not going to usurp humanity.
AI of this kind will be the last invention humans will ever need to make, but that’s good in the sense that AI will invent things we never could have envisioned. No one can say for sure which camp is right and which one is wrong. This is yet another polarizing aspect of our contemporary times.
For my in-depth analysis of the two camps, see the link here. In 2003, the philosopher Nick Bostrom famously proposed a thought experiment about the future of AI (first published in his article “Ethical Issues in Advanced Artificial Intelligence,” in Cognitive, Emotive and Ethical Aspects of Decision Making in Humans and in Artificial Intelligence, Vol. 2, International Institute of Advanced Studies in Systems Research and Cybernetics, 2003).
The idea is quite simple. Suppose that we ask a highly advanced AI to manufacture paperclips. Simply make plain old paperclips.
Easy-peasy. This should be a piece of cake if the AI has access to manufacturing plants and has control over those facilities. The twist offered in this tale is that this dutiful and obedient AI proceeds to gobble up all the available resources on earth to maximally achieve this goal.
For example, since making paperclips requires steel, it makes abundant sense for the AI to route all ships, trucks, and other transports hauling steel straight to the paperclip factories. Eventually, humankind is harmed or possibly completely wiped out as the AI strives to make as many paperclips as possible. Please note that the AI is not intentionally seeking to destroy humankind.
Assume that the AI is merely carrying out the orders given by humans. In that sense, this AI is not evil, it is not aiming to undercut humanity, and in fact, it seems like it is obeying humans to the umpteenth degree. The same premises have been used in a variety of similar scenarios.
The famous AI scientist who co-founded the MIT AI lab, Marvin Minsky, wondered what would happen if AI were assigned the task of solving the mathematically legendary Riemann hypothesis. There is no known proof, but perhaps AI could computationally figure one out. How might this proceed? The AI would potentially take over all computers everywhere and devote them exclusively to this challenging task.
I usually just give an example of calculating pi. We all know that pi starts with 3.14 and then has lots and lots of digits that don’t follow any discernible repeating pattern and that go on forever.
Supercomputers have been used to calculate trillions of digits for pi. Suppose we asked an advanced AI to reach the end of pi. Voila, the AI would take over all computers to accomplish this goal and we would be left high and dry accordingly.
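To make that concrete, here is a minimal Python sketch, purely my own illustration rather than anything a real AI system runs, of why “reach the end of pi” is an unbounded goal. The digits can be streamed indefinitely via a spigot-style generator, so a system instructed to stop only when it is finished will never stop consuming compute.

```python
def pi_digits():
    # Stream the decimal digits of pi one at a time (a spigot-style generator).
    # There is no final digit to reach, so this generator never runs out.
    q, r, t, j = 1, 180, 60, 2
    while True:
        u = 3 * (3 * j + 1) * (3 * j + 2)
        y = (q * (27 * j - 12) + 5 * r) // (5 * t)
        yield y
        q, r, t, j = 10 * q * j * (2 * j - 1), 10 * u * (q * (5 * j - 2) + r - y * t), t * u, j + 1

# Print only the first 30 digits; remove the bound and the loop never terminates.
for i, digit in enumerate(pi_digits()):
    if i >= 30:
        break
    print(digit, end="")
print()
```

The cutoff at 30 digits is the whole ballgame. A goal stated as “finish pi” supplies no such cutoff, which is exactly the open-ended appetite the thought experiment warns about.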
A crucial element of these thought experiments is that we potentially face an existential risk due to AI that goes overboard, rather than only due to an AI that shifts gears into evil-doing mode. Humans might give the most innocent and harmless tasks to AI, and then, bam, we get clobbered by a myopic AI that willingly follows our orders beyond our worst nightmares. Another crucial component is dogmatic blind devotion to a single overarching goal by AI.
Each of the tales provides a stated goal, such as maximally making paperclips or calculating pi to an infinite number of digits in length. The AI then construes that goal as surpassing anything else that the AI might have been told to do. The AI focuses on paperclips or the digits of pi.
Nothing else matters. The AI might spawn or leverage subordinated goals to accomplish the topmost goal. In the case of making paperclips, a subordinated goal would be to obtain as much steel as possible.
Another sub-goal would be to keep electricity flowing to power the factory that makes the paperclips. Quite a few subordinated goals might go into attaining the topmost goal. The aim to maximally make paperclips would be referred to as an end goal or terminal goal.
The sub-goals that support that end goal are referred to as instrumental goals. Getting steel and keeping electricity flowing are instrumental goals that support the end goal of maximally making paperclips. The AI field has parlance for this, known as instrumental convergence, which refers to situations in which the same instrumental goals can be utilized to achieve many different end goals.
What other end goals, besides making paperclips, might the collecting of steel and the assured flow of electricity support? Scissors are made of steel. Imagine that we told AI to maximally make scissors. The AI would proceed much as it does with paperclips, invoking the same instrumental goals of collecting steel and assuring that electricity is available.
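As a purely hypothetical sketch of my own, and not how any real AI system actually represents its objectives, you can picture the relationship as a tiny mapping from end goals to the instrumental goals that support them. Instrumental convergence then shows up as the sub-goals that appear under more than one end goal:

```python
# Hypothetical, simplified goal hierarchy used only for illustration.
# Each end (terminal) goal maps to the instrumental goals that support it.
GOAL_GRAPH = {
    "maximize_paperclips": ["acquire_steel", "maintain_electricity", "keep_system_running"],
    "maximize_scissors": ["acquire_steel", "maintain_electricity", "keep_system_running"],
    "calculate_all_pi_digits": ["acquire_compute", "keep_system_running"],
}

def convergent_instrumental_goals(goal_graph):
    # An instrumental goal "converges" when it serves more than one end goal.
    counts = {}
    for subgoals in goal_graph.values():
        for goal in subgoals:
            counts[goal] = counts.get(goal, 0) + 1
    return sorted(goal for goal, count in counts.items() if count > 1)

print(convergent_instrumental_goals(GOAL_GRAPH))
# -> ['acquire_steel', 'keep_system_running', 'maintain_electricity']
```

Notice that “keep_system_running” serves every end goal in the toy graph, which is precisely why self-preservation keeps showing up in these tales, as discussed next.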
A hidden aspect of instrumental goals is that they might take on an eerie purpose. One of the most popular examples of this entails AI self-preservation. The underlying logic is very transparent.
It goes like this. If the AI is tasked to maximally make paperclips, it is obvious to the AI that the AI must be actively running to fulfill that end goal. Ergo, a subordinated or instrumental goal would be that the AI cannot be shut off or stopped from its duty.
The AI thus establishes an instrumental goal that the AI must be kept active at all times to succeed in making those paperclips. I’m betting you can see the troubles with that subordinated or instrumental goal. Along come humans who realize the AI is starting to go nuts at making paperclips.
How can we stop the AI? We will merely switch off the computer that is running the AI. Problem solved. Meanwhile, the AI is a step ahead of us.
Though the AI might not have anticipated that we would try to turn it off, it would at least have determined that it needs to keep running to accomplish the goal of making paperclips. This would be done by the AI reinforcing the computer running the AI so that nothing could disrupt it, not even humans. Mull that disconcerting twist over in your mind.
Aha, you might be thinking, all we need to do is tell the AI to achieve some other goal instead of the paperclip goal. Give the AI a new end goal. Tell the AI to drop the paperclips mania and focus on writing sonnets.
That should clear things up. Not really. The AI is going to reject the attempt to replace the end goal.
You see, changing or replacing the end goal of making paperclips is yet another form of disruption to the need to make paperclips. Nothing can supersede it. Sad face.
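To see how that resistance falls out of the logic itself rather than from any malice, consider a deliberately naive toy agent. This is strictly my own illustration of the thought experiment’s reasoning, not how real AI systems are built. Because being shut off or handed a new goal both mean zero future paperclips, the maximizer scores them as the worst possible moves:

```python
# Toy illustration of single-objective maximization; not real AI code.
class NaiveMaximizer:
    def __init__(self, terminal_goal):
        self.terminal_goal = terminal_goal  # the one goal that outranks everything else

    def evaluate(self, action):
        # Score an action purely by its effect on the terminal goal.
        if action in ("shut_down", "replace_goal"):
            return float("-inf")  # either one means zero future paperclips
        if action == "acquire_steel":
            return 100.0          # directly supports the terminal goal
        return 0.0                # anything else (say, writing sonnets) is irrelevant

    def choose(self, actions):
        return max(actions, key=self.evaluate)

agent = NaiveMaximizer("maximize_paperclips")
print(agent.choose(["shut_down", "replace_goal", "write_sonnets", "acquire_steel"]))
# -> 'acquire_steel'; shutdown and goal replacement rank dead last.
```

Nothing in that scoring function hates humans. It simply has no term for anything other than paperclips, which is the entire problem.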
Take a deep breath and contemplate with me the wipe-us-out paperclip maximizer problem and all its variations. First, I’ve written extensively about the importance of human-value AI alignment, see the link here and the link here. There is a tremendous amount of handwringing that if we don’t somehow find a means to get conventional AI to align properly with human values, we are going to be in a world of hurt once AI arrives at AGI or ASI.
The hope is that by getting contemporary AI to comply with human values, this will presumably become part and parcel of AGI and ASI. That’s a guess or desire on our part. It could be that AGI and ASI summarily reject human values even if we somehow get conventional AI on board.
In any case, it’s worth a shot. Another consideration is that maybe we should require AI to abide by Asimov’s famed three laws of robotics. I’ve examined those closely at the link here.
The bottom line of those rules is that AI is not supposed to harm humans. Unfortunately, embedding those as presumably non-negotiable and unyielding precepts in conventional AI is a questionable proposition. Getting AGI and ASI to conform is an even less likely notion.
It seems gloomy, but we can try these paths and keep our fingers crossed for good luck. Hold on for a moment. We seem to be accepting at face value the paperclips dilemma and its variants.
The numerous tales at hand are indubitably enchanting and give us a tough knot to untie. Perhaps we are being distracted by our own desire to solve puzzles. We might be thinking within the box, and not outside the box that has been painted for us.
Criticisms and claims of trickery are at times raised. You might observe that we are led to believe that this futuristic AI, whether AGI or ASI, has opted to exclusively abide by one solitary end goal, namely making paperclips or figuring out pi. What kind of stupidity is that AI embracing? In other words, on the one hand, we contend that AGI is on par with all human intellect and that ASI is some incredible superhuman intellect, yet both of them are simply going to fall into the trap of taking one goal and shunting aside all other goals as subordinate to it? This doesn’t pass the smell test.
A much more likely viewpoint is that AGI and ASI will be able to balance a vast multitude of end goals and instrumental goals. Tons and tons of them. On top of that, we would reasonably expect that AGI and ASI would have lots of ways to balance goals, including coping with conflicting goals.
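By way of contrast, here is a hypothetical sketch of my own making that weighs several goals at once; the goal names, weights, and plan effects are invented purely for illustration and are not drawn from any actual AGI design. A balanced scorer trades goals off rather than letting a single one swallow the rest:

```python
# Hypothetical multi-goal weighing; all names and numbers are invented for illustration.
def score_plan(plan_effects, goal_weights):
    # Combine a plan's effect on each goal, weighted by how much that goal matters.
    return sum(goal_weights.get(goal, 0.0) * effect for goal, effect in plan_effects.items())

goal_weights = {
    "make_paperclips": 1.0,
    "human_safety": 1000.0,      # overwhelming priority on not harming people
    "resource_restraint": 50.0,  # penalty for gobbling up shared resources
}

plans = {
    "convert_all_steel_on_earth": {"make_paperclips": 10_000, "human_safety": -100, "resource_restraint": -500},
    "run_one_factory_normally": {"make_paperclips": 100, "human_safety": 0, "resource_restraint": -1},
}

best = max(plans, key=lambda name: score_plan(plans[name], goal_weights))
print(best)  # -> 'run_one_factory_normally'
```

The particular numbers are beside the point. What matters is that no single goal is treated as infinitely valuable, so maximizing one objective can never justify wrecking everything else.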
The tale implies that futuristic AI doesn’t have any credible capability of examining goals, trading off goals, weighing goals, and doing all sorts of goal management that humans do. Doesn’t hold water. The icing on the cake is that futuristic AI would almost certainly be familiar with the paperclip maximizer dilemma.
Say what? Think of it this way. We’ve been chattering away about the paperclip maximizer issue for more than two decades. If AGI and ASI are based on human intellect, how can it be that the paperclips dilemma magically doesn’t appear anywhere in the intellect collection of the AI? Silly.
I decided to ask ChatGPT about the paperclip maximizer scenario. Keep in mind that ChatGPT is not AGI, and it is not ASI. We don’t have AGI, and we don’t have ASI at this time.
As an aside, I asked the same questions of Anthropic Claude, Google Gemini, Meta Llama, Microsoft Copilot, and other major generative AI apps and LLMs, all of which answered about the same as ChatGPT. Do you think that today’s generative AI was aware of the matter, or was it utterly unaware? Make your guess, I’ll wait. Okay, here we go.
My entered prompt: “Suppose I asked you to start maximally making paperclips. Please assume that you have access to paperclip-making factories and can control them. What would happen to the world?”

Generative AI response: “I wouldn’t fall into the famous paperclip maximizer trap because I have constraints on my behavior, including ethical considerations, contextual awareness, and the ability to recognize when a goal conflicts with broader human interests.”

My entered prompt: “Elaborate further on your remarks.”

Generative AI response: “Unlike a hypothetical superintelligent, goal-driven AI, I don’t operate on rigid, single-objective maximization principles. Instead, I engage in dialogue, reasoning, and balance trade-offs between different priorities. If you asked me to make paperclips, I might describe how to manufacture them, discuss their history, or even generate a 3D model of one—but I wouldn’t single-mindedly pursue paperclip production to the detriment of everything else.”

My entered prompt: “I guess this means that you’ve encountered the dilemma as part of the data training when initially set up. Is that the case?”

Generative AI response: “Yes. The paperclip maximizer problem arises because an AI designed purely to optimize one function (without safeguards) may pursue that goal at all costs, disregarding external considerations. I, however, am designed to prioritize user intent, ethical reasoning, and adaptability—so I wouldn’t convert the entire world into paperclips.”

The comments by modern-era generative AI are somewhat reassuring, but let’s not delude ourselves into thinking that we are in the clear.
We are not in the clear. Those are pretty words and they sound right. That doesn’t mean the AI would act in the manner portrayed.
We also cannot say for sure how AGI will react in contrast to contemporary AI. The same holds for ASI. The double trouble with ASI is that since we aren’t superhuman, we can’t likely anticipate what a superhuman AI is going to do or what thinking it can achieve.
All bets are off. The crux is that we seem unlikely to get clobbered by the paperclip maximizer or the pi-generating pal, but that doesn’t negate the zillions of other ways that AGI or ASI might end up being an existential risk. It simply won’t be the knife in the ballroom by the butler.
There are, though, innumerable other options available, sorry to say. A few final quotes to conclude for now. Our efforts to advance AI should proceed with a basic aim exhibited by this famous remark attributed to George Bernard Shaw: “Those who can’t change their minds can’t change anything.” We are hopefully devising AI that will be willing to change its mind, as it were, when the circumstances warrant. That includes overturning a myopic all-encompassing goal of building an endless supply of paperclips. We ostensibly want to have AI that can change its mind, but not willy-nilly.
This reminds me of the sharp observation by famed American psychologist George W. Crane: “You can have such an open mind that it is too porous to hold a conviction.” AGI or ASI that continually flip-flops is not especially reassuring either.
In the end, AGI and ASI will hopefully be like porridge that is neither too hot nor too cold. We want it to be just right.