Does the “Superintelligent Will” Exist?

This piece was written as a term paper for the course ‘Philosophy and Cognitive Science’ at the Young India Fellowship; I co-authored it with my batchmate Deepain Yadav.

The PDF of ‘The Superintelligent Will’ by Nick Bostrom can be found at this link.

Abstract

In this paper, we first present the arguments made by Nick Bostrom in his paper ‘The Superintelligent Will’, in which he discusses the relationship between motivation and intelligence in artificial agents. He presents two theses, the ‘orthogonality thesis’ and the ‘instrumental convergence thesis’, to argue that a superintelligent AI might cause the demise of human civilization. The orthogonality thesis, in simple terms, holds that the goal an entity pursues is independent of its level of intelligence. The instrumental convergence thesis states that in order to achieve any final goal, a superintelligent (SI) being, or even an ordinarily intelligent one such as a human, must pursue a set of intermediate instrumental values that are necessary to reach that final goal.

After laying down Bostrom’s arguments, we will argue that his insistence on ‘instrumental rationality’ follows a reductionist approach that narrows the definition of intelligence. We will then try to show that the emotional and moral dimensions of human intelligence are necessary components of any definition of ‘intelligence’, and that Bostrom’s own definition limits the possibility of an ‘intelligence explosion’ of superintelligent entities. Finally, we present some counters to the theses themselves.

Introduction

Is there a possibility of artificial intelligence (AI) more intelligent than humans? If yes, will there be an explosion of ever-increasing intelligence, created by these AIs themselves? Many leaders in technology, like Elon Musk, believe that this is one of the most significant threats facing humanity. Musk has claimed that well-intentioned AI researchers could “produce something evil by accident”, including, possibly, “a fleet of artificial intelligence-enhanced robots capable of destroying mankind.” Nick Bostrom is one of the foremost academics supporting this view.

In his paper ‘The Superintelligent Will’, Nick Bostrom presents his arguments for the development of a superintelligent (SI) artificial being which, even while pursuing benign goals, could spell the end of human civilization. A common illustration used in this context is the ‘paperclip maximiser’ (Bostrom, 2003), an AI whose goal is to maximise the number of paperclips in its collection by any means. It could pursue this goal relentlessly and flood the earth with paperclips, leading to the destruction of all life forms on earth. The theoretical basis of such a possibility lies in the orthogonality thesis and the instrumental convergence thesis.

The paper is organised into two broad sections. The first describes the main arguments used by Bostrom to show the possibility of harm to humanity at the hands of AI; the second lists some of the objections that might be raised against his line of reasoning.

Nick Bostrom’s arguments

Orthogonality Thesis

Bostrom argues that the popular notion of intelligence is rather narrow. He believes that intelligence has been anthropomorphised: we have a tendency to look at non-human beings, both animate and inanimate, machines and aliens alike, in terms of human intelligence. An example of this is the tendency to attribute human-like thinking abilities and motivations to machines as simple as a TV. “The TV does not want me to enjoy the show today” is the kind of expression often put to use.

He argues that, in contrast, artificial intelligence can be far less human-like in its motivations. This set of motivations might include a specific goal, something as niche as counting the grains of sand at Boracay or calculating the decimal digits of pi indefinitely. This domain specificity, however, does not make the machine any less intelligent than a human, according to Bostrom. “Skill at prediction, planning and means-end reasoning in general” is how Bostrom chooses to define ‘intelligence’. This means-end approach, in which the ‘motivation’ of a being, animate or inanimate, is to complete the final task assigned to it, is referred to as instrumental rationality.

The idea that intelligence is independent of both the final task for which it is utilized and the motivation for completing that task forms the crux of the orthogonality thesis. Bostrom argues that this is possible within the Humean (after David Hume) framework of motivation, in which action is driven not by beliefs but by desires; a superintelligent being can therefore be made to pursue any given goal if it is given a strong enough desire to do so. Bostrom goes on to say that even if one were not to accept the Humean approach, the thesis could still hold if the system were constructed so as to have no ‘beliefs’ or ‘desires’ analogous to those of humans, yet were a system of arbitrarily high intelligence designed to pursue any given final goal.
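To make the orthogonality claim concrete, here is a minimal illustrative sketch of our own (not from Bostrom’s paper): the planning machinery is written once, and the final goal is handed to it as an arbitrary plug-in utility function. The names `Planner`, `paperclip_utility` and `pi_digit_utility` are hypothetical and exist only for this illustration.

```python
from typing import Callable, Iterable, List

State = List[str]  # a toy "world state": just a list of labels

class Planner:
    """A generic means-end reasoner: it picks whichever action best serves
    the utility function it was handed, whatever that function happens to be."""

    def __init__(self, utility: Callable[[State], float]):
        self.utility = utility  # the 'goal slot', separate from the planning code

    def choose(self, state: State, actions: Iterable[Callable[[State], State]]):
        # Evaluate each candidate action purely by how far it advances the goal.
        return max(actions, key=lambda act: self.utility(act(state)))

# Two very different final goals, same planning machinery:
paperclip_utility = lambda s: s.count("paperclip")  # Bostrom's paperclip example
pi_digit_utility = lambda s: s.count("pi_digit")    # computing digits of pi

make_clip = lambda s: s + ["paperclip"]
compute_digit = lambda s: s + ["pi_digit"]

for utility in (paperclip_utility, pi_digit_utility):
    agent = Planner(utility)
    best_action = agent.choose([], [make_clip, compute_digit])
    print(best_action([]))  # each agent picks the action that serves its own goal
```

The point of the toy is simply that nothing in the planning code constrains which utility it serves: competence and goal are separate parameters, which is what the orthogonality thesis asserts.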

Instrumental Convergence Thesis

Bostrom’s second thesis is the ‘instrumental convergence’ thesis. It states that “several instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realized for a wide range of final goals and a wide range of situations, implying that these instrumental values are likely to be pursued by many intelligent agents.” Put simply, it states that any intelligent being, with intelligence here defined as ‘instrumental rationality’, will necessarily take certain intermediate steps to reach any final goal. This argument has a close analogy in the “good trick” argument of Daniel Dennett. Evolution can be seen as a goal-directed process in which the survival of the species is the ultimate goal; in spite of the enormous variation in environments, there are some paths that are almost inevitably useful, like eyesight. Bostrom argues that the greater an agent’s intelligence, the greater its ability to recognize the instrumental values necessary for the pursuit of a final goal, whatever it may be. While final goals may differ, the ‘instrumental values’ remain the same; these values thus become the point of convergence for superintelligent beings, hence the name. The instrumental values to be pursued by SI beings, as listed by Bostrom, are the following (a toy numerical sketch follows the list):

  • Self-Preservation: If an agent lasts for a longer period of time, the probability that it will be able to complete its task is higher. Thus, an agent is inclined to make efforts to increase its lifespan. In Bostrom’s own words, “Agents with human-like motivational structures often seem to place some final value on their own survival.”
  • Goal-Content Integrity: In the context of SI beings, this means that goal continuity constitutes a key aspect of survival. Bostrom argues that for humans, attaining a final goal is not always the first priority, as we often change our final goals. This, he argues, is because for humans survival (the preservation of our physical bodies) is a final goal in itself, which is not the case for SI beings. Their final goal is the completion of the task they were designed for, and this might even take precedence over survival, since an SI, unlike a human, can always replicate itself.
  • Cognitive Enhancement: With improvements in its rationality and intelligence, an agent will tend to improve its decision-making, making it more likely to achieve its final goals. Cognitive enhancement here thus refers to adaptability in an evolutionary sense: as an agent improves its cognitive abilities, its decisions become better aligned with its final goal.
  • Technological Perfection: With its final goal in mind, a superintelligent being would want to achieve that goal faster and more efficiently; technological perfection helps it do so.
  • Resource Acquisition: As Bostrom puts it quite simply himself, “resource acquisition is another common emergent instrumental goal, for much the same reasons as technological perfection: both technology and resources facilitate physical construction projects.”
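The convergence claim can be illustrated with a toy expected-value calculation; this is our own sketch, not Bostrom’s formalism, and the particular formula is an assumption made purely for illustration. Suppose an agent’s expected degree of goal attainment is the product of its probability of surviving long enough and a saturating function of the resources it controls. Then raising either factor helps regardless of which final goal has been plugged in, and that is the sense in which self-preservation and resource acquisition are ‘convergent’ sub-goals.

```python
def expected_attainment(p_survive: float, resources: float) -> float:
    # Illustrative model only: attainment requires surviving (p_survive) and is
    # an increasing, saturating function of the resources available.
    return p_survive * (resources / (1.0 + resources))

# Three unrelated final goals; the inequalities below hold for all of them,
# because the model never looks at what the goal actually is.
goals = ["maximise paperclips", "count grains of sand", "compute digits of pi"]

for goal in goals:
    base = expected_attainment(p_survive=0.5, resources=1.0)
    more_survival = expected_attainment(p_survive=0.9, resources=1.0)
    more_resources = expected_attainment(p_survive=0.5, resources=4.0)
    print(goal, base < more_survival, base < more_resources)  # True True for every goal
```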

The full picture

Since intelligence is, on this view, independent of goals, it is possible to design an AI which completes mundane tasks but is still superintelligent. In pursuing even mundane tasks, the entity will identify sub-goals it needs to pursue in order to increase the chances of achieving its final goal. Some of these sub-goals, like resource acquisition, might come at the cost of humans and lead to the destruction of our race.

Objections

If intelligence is defined as the ability to achieve end goals in the most efficient way possible, by being skillful at “planning, prediction and means-end reasoning”, then a fully developed autonomous car satisfies this definition. The self-driving car is a machine which can process large amounts of data on hardware with high processing power and can decide which obstacles to avoid. But that is all it is: a high-speed agent in an environment. If we extend this example to model a superintelligent being, we can imagine an entity which can gather every single bit of data that the environment has to offer and has the processing power to make sense of it all. According to Marcus Hutter, there might be a limit to what can be done even if all the information in the world can be captured and processed (Hutter, 2012), and this puts a brake on the progression of intelligence as defined by Bostrom. Thus, a superintelligent being developed as a result of an ‘intelligence explosion’ may not even be possible.

The second objection is to the definition of intelligence as instrumental rationality. Bostrom denies a place to human emotions and to the moral spectrum of rationality that comes naturally to humans. He does this because he believes these are not important in carrying out a specific task with a final goal, i.e. they fall outside the ambit of ‘instrumental rationality’. But these qualities are essential even in the most banal reasoning that humans do on a regular basis. This was most famously illustrated by the neurologist Antonio Damasio in his book Descartes’ Error, where he showed that emotions, including some that do not originate in the limbic system, play a role in human reasoning. Studies have shown that adults with an impaired ability to ‘feel’ have trouble making decisions, indicating that there is no “pure reason”. Damasio contends that emotion ‘biases’ our decisions, since humans could not possibly consider every logical possibility and still make a decision in time (Picard, 2000). Even if we consider a machine with so much processing power that this time constraint does not apply, its increased power will open up an ever-increasing number of possibilities, landing it in the same situation in which humans find themselves. Thus, an unfeeling AI may not be able to take decisions at all.
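The computational side of this point can be put in rough numbers; the branching factor and planning depth below are made-up figures used only to show the shape of the problem. Without some bias that prunes options early, the number of action sequences to evaluate grows exponentially with planning depth, so even a very fast, purely exhaustive reasoner falls behind; a crude ‘bias’ that discards half the options at each step is what keeps the search tractable, which is roughly the role Damasio assigns to emotion.

```python
BRANCHING = 10   # options available at each step (illustrative figure)
DEPTH = 20       # number of steps the agent plans ahead (illustrative figure)

exhaustive = BRANCHING ** DEPTH        # every possibility considered: 10^20 sequences
pruned = (BRANCHING // 2) ** DEPTH     # a bias discards half the options at each step

print(f"exhaustive search: {exhaustive:.2e} sequences")
print(f"with pruning:      {pruned:.2e} sequences")  # about a million times fewer
```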

There are also some objections to the theses themselves. An exception to the orthogonality thesis exists which opens it up to objections and to the possibility of further exceptions. A possible goal for a superintelligent entity might be to reduce its intelligence to that of a human. Since orthogonality allows ‘any’ goal to be combined with any level of intelligence, this is a permissible scenario. But the consequence of such a scenario, the downgrading of the entity’s intelligence, does not sit within the thesis, since an entity that succeeds in this goal will no longer be superintelligent (Häggström, 2019).

Another objection to the theses is that instrumental goals will cease to be relevant if the entity discovers that its final goal is moot. To use Olle Häggström’s example, if an entity designed to help send human souls to heaven discovers through advancements in science that souls do not exist, it will not know what to do, since it will be left without a goal. In such a scenario, the question of instrumental goals, and with it the possibility of human extinction, becomes irrelevant.

Conclusion

Nick Bostrom, through his limited definition of intelligence as instrumental rationality and by using the orthogonality and instrumental convergence theses, tries to prove that AI poses an imminent danger to humanity. But as we have tried to explain, the argument suffers from deficiencies at multiple levels. There is a flaw in the definition of intelligence which not only precludes the advent of the singularity but also does not allow for decision-making in the absence of emotions and feelings. There are also reasons to believe that the two theses themselves may not be completely sound.

References:

Bostrom, Nick. “The superintelligent will: Motivation and instrumental rationality in advanced artificial agents.” Minds and Machines 22.2 (2012): 71-85.

Häggström, Olle. “Challenges to the Omohundro–Bostrom framework for AI motivations.” Foresight 21.1 (2019): 153-166.

Hutter, Marcus. “Can intelligence explode?.” Journal of Consciousness Studies 19.1-2 (2012): 143-166.

Picard, Rosalind W. Affective Computing. MIT Press, 2000.
