Leading AI expert says we should be acting now to avoid future risks of "rogue AIs." Is he right?

In a new blog post, Yoshua Bengio lays out his rationale for why we should be paying a lot more attention to the existential risks presented by future artificial intelligence

May 25, 2023

Yoshua Bengio is not a fringe scientist or an AI doomsayer. Rather, he’s a highly respected researcher and thinker in artificial intelligence. In addition to his work in deep learning, which helped fuel the current wave of AI innovation (and for which he received the Association for Computing Machinery A.M. Turing Award in 2018), he co-directs the CIFAR Learning in Machines & Brains program, which has long been at the cutting edge of AI research and development.

In other words, he knows his stuff. And so when he writes about the potential risks of “rogue AI” it’s worth paying attention.

Bengio’s recent article on “How Rogue AIs may Arise” is well worth reading. I don’t agree with all of his reasoning, as you’ll see below. But I do respect his thought process, and the urgency with which he’s appealing for research and action here.

Perhaps it’s fortuitous that, just a couple of days after posting his piece, the US government released an update to the National Artificial Intelligence Strategic Research and Development Plan that focuses heavily on understanding and addressing the ethical, legal, and societal implications of AI — including “the existential risk associated with the development of artificial general intelligence through self-modifying AI or other means”. It certainly seems that increasing concern is being expressed around what happens if and when AI emerges with agency and power.

In his article, Bengio proposes two hypotheses:

Hypothesis 1: Human-level intelligence is possible because brains are biological machines.
Hypothesis 2: A computer with human-level learning abilities would generally surpass human intelligence because of additional technological advantages.

Hypothesis 1 is reasonable, although the emphasis on human-level intelligence bothers me. If intelligence is an emergent property that depends solely on compute architectures and capability (irrespective of whether the substrate is biological, silicon, or otherwise) there is no reason to suppose that high-level intelligence is not possible within a constructed machine. But whether this is human-level, human-comparable, or very much non-human, is a tough call to make at this point.

These distinctions are important, and this is one of many reasons why more research and thinking is urgently needed here. But that aside, let’s assume that machines are capable of some form of high-functioning intelligence.

Hypothesis 2 is perhaps trickier as the nature of intelligence is still deeply contested, meaning that it would be hard to say whether a machine surpassed human intelligence or not. That said, it’s reasonable to suppose that an intelligent machine would be able to do some things very much more effectively than humans, simply by nature of the systems it’s plugged into, the information it has access to, and the mechanisms at its disposal to bring about change.

From these hypotheses, Bengio makes three claims that are accompanied by reasoned arguments:

Claim 1: Under hypotheses 1 and 2, an autonomous goal-directed superintelligent AI could be built.
Claim 2: A superintelligent AI system that is autonomous and goal-directed would be a potentially rogue AI if its goals do not strictly include the well-being of humanity and the biosphere, i.e., if it is not sufficiently aligned with human rights and values to guarantee acting in ways that avoid harm to humanity.
Claim 3: Under hypotheses 1 and 2, a potentially rogue AI system could be built, as soon as the required principles for building superintelligence will be known.

Claim 1 seems reasonable, but with a caveat. I’m not a fan of the “superintelligence” hypothesis, and I’ve written about this before — especially in critique of philosopher Nick Bostrom’s ideas.

Central to Bostrom’s thinking is the concept of superintelligence arising from increasingly rapid recursive improvements of AI by AI, leading to systems that far surpass what we consider to be human intelligence.

Bengio doesn’t quite go there. Instead he sticks with the simpler idea that humans could build a superintelligent machine. Maybe an exceptionally powerful autonomous artificial entity capable of drawing on a multitude of resources and mechanisms to achieve a set of goals will be possible based on hypotheses 1 and 2; I’d buy that. But “superintelligence”? I’m not so sure.

Claim 2 is a little more complex, and I must confess that here I get hung up on what is meant by “rogue.” The term usually refers to unexpected, unpredictable, and uncontrolled behavior that is problematic to some group or community. In the context of digital technologies it can refer to algorithms or systems that behave in unintended ways that cause problems.

So far so good. But this sets up advanced AI as something that is expected to “behave” and “follow the rules” — a framing that is reinforced in claim 2 as values-alignment is invoked.

An alternative way of articulating claim 2 is that the emergence of AIs with agency means that they might not do as they are told, or that they may act in ways that some people decide are inappropriate, with the consequence that harm occurs, whether this entails loss of life, health, income, environment, dignity, pride, or something else of value.

I’m being picky here as I suspect that claim 2 will indeed frame conversations around potential risks from advanced AI. But here it is worth asking whose values matter, who decides what’s appropriate, what type of harm we’re concerned with, and whether all this is decided by a relatively small group of people who believe they have the right to determine what constitutes “good” and “bad” behavior, and how this should be enforced.

Then there’s claim 3 — that a potentially rogue AI system could be built as soon as the required principles for building superintelligence are known.

I’m not sure I buy into this claim — not necessarily because I disagree with Bengio’s argument that if stupid things are possible with tech someone will have a go, but because there’s quite a large leap between understanding how something might be done, and actually achieving it.

Part of the challenge here is that technology innovation is iterative and messy — there’s rarely a straight line between theory, principles, and implementation. This messiness also allows for self-correcting iteration, which is part of what makes predicting the future of AI fiendishly hard.

In fleshing out the risks he perceives around claim 3, Bengio focuses on human intent to cause harm (under the framing of “genocidal humans”) and unintended harm (where he spends some time on the concept of “wireheading”).

I worry about the basis of the “genocidal humans” argument — it’s an argument that assumes that, because of a range of adverse social factors, someone will become hell-bent on developing a rogue AI to cause harm — just because they want to lash out and hurt the people and society that hurt them. This is an argument that’s rooted in an assumption that evil thoughts and actions arise from “human suffering, misery, poor education and injustice,” and that the solution to reducing the number of “genocidal humans” lies in fixing these problems.

Sadly, people and society aren’t this simple. I get the sentiment, and addressing basic human rights is fundamental to building a better future together. But I’m convinced neither that failures here will lead to comic book villains that use AI to destroy society, nor that fixing what we assume to be the causes of “bad behavior” will lead to everyone behaving more reasonably (whatever that means).

In contrast to people intent on using AI to cause harm, Bengio’s arguments around “wireheading” focus on perverse incentives leading to harmful behavior in powerful AI systems.

Wireheading is a speculative concept that considers whether short-circuiting reward systems can lead to catastrophic failure. It’s based on a science fiction idea (which in turn is rooted in experiments from the 1950s and 1960s) that if the brain’s evolved pleasure/reward system were bypassed by the ability to stimulate pleasure at any time through an inserted wire (hence “wirehead”), you could effectively destroy a person’s ability to live a healthy life. In effect, their only goal would be pleasure stimulation via the wire, and they would ignore everything else (eating, exercise, sleep, sex, social interactions, pretty much everything that makes up a healthy person in a healthy society) to get it.

This concept has been extended to AI, and the idea that if an artificial intelligence learned how to hack the systems that rewarded progress toward achieving goals, it could abandon those goals in favor of self-gratification.

In such a scenario, a powerful AI could cause mayhem as it escaped the shackles of human control to satisfy itself without doing what was required of it. It’s certainly a dystopian vision of powerful AI gone very wrong. But apart from worrying overtones of power, control, and subservience, it’s also a science fiction vision that may turn out to be very naive.

Bengio goes on to flesh out fears around the manipulation of humans by AI (which is a serious concern, and one I’ve written about previously), the risks of trying to reproduce human traits in machines (including physical embodiment), and the role of evolutionary pressures in the emergence of AI that is adversarial toward humans.

If I’m honest, I’m not sure how plausible I find some of these ideas, and I worry that they are based in part on a naive understanding of power dynamics and complex ecosystems. But there is a “but” here.

Despite my misgivings around many of the arguments and concerns that Bengio puts forward, I do agree with him on one important point: We need to be thinking seriously and creatively around how the world will change, and how we will navigate this transition, if and when AI becomes powerful enough to disrupt the human-centric world we’ve constructed, and especially if this power is accompanied by self-awareness. And because of this, I very much value Bengio beginning to explore and flesh out thinking here in a way that stimulates conversation and new ideas.

This is not to say that nearer-term challenges around developing safe and responsible AI are not important; they absolutely are. But we also need to ensure that we have a diverse group of people across many different domains of expertise and experience who are effectively “red teaming” low-probability but high-consequence possibilities around the emergence of AI systems that could cause widespread and long-lasting harm.

These explorations need to be grounded in a sophisticated understanding of behavior and how it intersects with technologies that have agency. And they need to be mindful of the cognitive traps inherent in human exceptionalism, power dynamics, and control.

After all, human history is rife with examples of how people have gone “rogue” and propagated evil on the back of an absolute certainty that the solution to value alignment is to rob others of their right to hold values that don’t align with theirs.

And maybe this should be our greatest fear around advanced AI — that it will look too much like us.

If this is the case, perhaps we need to be thinking as much about what it means to be an intelligent machine in a future where humans exist as about what it means to be human in a future where powerful AIs exist. And not just because this is a provocative framing, but because beyond fears of rogue AI butting up against human supremacy, we need to be thinking about how we will cooperate with machines with agency if and when they arise, and how we co-create a shared future, rather than one where we are struggling to control and contain a creation that we feel threatened by.

1 Comment

Phil Tanny

Hippy Toons

May 26, 2023

Bengio writes...

"A superintelligent AI system that is autonomous and goal-directed would be a potentially rogue AI if its goals do not strictly include the well-being of humanity and the biosphere, i.e., if it is not sufficiently aligned with human rights and values to guarantee acting in ways that avoid harm to humanity."

Given that the species developing AI routinely ignores the well-being of humanity and the biosphere, and that there is widespread often violent disagreement within that species regarding what human rights and values should be, and....

Given that AI has nowhere to obtain its values other than from this confused violent species, and/or the world of nature at large, both of which are governed by the rules of evolution such as survival of the fittest and the strong dominating the weak....

When referring to rogue AI, why are we still using the word "potentially"??
