A group of leading researchers in AI (Hendrycks et al.) have proposed a definition of artificial general intelligence (AGI) that they hope will make this hitherto “frustratingly nebulous” notion more concrete and even quantitative. Their definition goes as follows:
AGI is an AI that can match or exceed the cognitive versatility and proficiency of a well-educated adult.
Sadly, this definition fails at the first hurdle: it remains nebulous. Why? Because it does not explain what is meant by “match or exceed”, nor by “cognitive versatility and proficiency”. It becomes clear in the paper that “match or exceed” here means “generate outputs that look like those humans might generate”. In other words, a better term here would be “mimic”.
This might seem pedantic to some, but in fact it is crucial. I hope the authors of the paper would agree that one of the big challenges in AI is to avoid attributing human-like qualities where they are not warranted, or where we have no reason to suppose them present. The wording above invites the supposition of human equivalence: that AI is doing what humans do and thinking as humans do. Precisely because the mimicry of AI is often so impressive, it becomes all the more important to be very precise and careful with language. That precision has not been observed here.
What about “cognitive versatility and proficiency”? Here the paper uses the Cattell-Horn-Carroll (CHC) theory of cognitive abilities, “the most empirically validated model of human intelligence.” This theory “breaks down general intelligence into distinct broad abilities and numerous narrow abilities (such as induction, associative memory, or spatial scanning).”
In other words, “cognition” is here (and in Carroll’s theory) defined in terms of some selected attributes of what the human mind does. Notice, though, that the authors of the paper conflate this definition of cognition with “general intelligence” – or indeed with human intelligence. This is misleading. First, even the definition they claim to use is in fact selective: they cite Schneider and McGrew (2018), for example, but Schneider and McGrew suggested adding an “emotional intelligence” domain to Carroll’s scheme, and that domain is not included in the scheme used in the paper. This is rather telling, since we generally suppose emotional intelligence to rely on an ability to empathize. For AI, it would be especially apparent that any attempted measure of “emotional intelligence” could gauge only mimicry of that faculty, not the faculty itself.
But in any event, conflating cognition (within the narrow definition of CHC) with intelligence is not warranted. Wilhelm and Kyllonen (2021), for example, supplement the CHC cognitive traits with “non-cognitive” traits such as motivation, interests, and personality. (Personally I find it kind of funny that those things are not regarded as “cognitive”, but hey, I guess specialists can redefine colloquial words if they so desire.) I’m going to quote from Wilhelm and Kyllonen: “Carroll (1993) specifically restricted his survey to cognitive abilities as measured by performance on educational or psychological tests or tasks and thereby excluded the domains of attitudes, preferences, values, beliefs, and behavior as measured by self or others’ ratings, which is the most common approach for measuring social and emotional skills in psychology.”
The pattern, then, becomes clear pretty quickly. Of all the intelligent and indeed wonderful things that human minds do, this effort to operationalize AGI very quickly converges on attributes that can be “measured by performance on educational or psychological tests or tasks”. (I am not even going to get into the intelligence that goes with certain kinetic and motor-control tasks. If you’ve ever played football, for example, you’ll surely be dismayed to see the extraordinary abilities of good players – not just coordination but also, say, on-pitch vision and awareness – omitted from “human general intelligence”. Such oversights scream: look, you know how early AI researchers seemed to consider chess to be the pinnacle of human intelligence? Well, they haven’t shifted very much since!)
Now let’s take a little look at some of the details of the tests. “General knowledge”, for example, includes “History: Knowledge of past events and objects”, and “Science: Knowledge of the natural and physical sciences.” But what is knowledge? Educators recognize it as something more than an ability to regurgitate facts. Of course, this is not something that school tests tend to measure very well: it is quite possible for students to get good marks without truly acquiring “knowledge”, if they are “taught to the test”. That’s a failure of the exam system. But we do know that deep knowledge exists and can be acquired. And it is something that today’s AI systems do not have.
Some might say: How do you know? Look, sometimes these systems can even explain their reasoning! But again, they are not “explaining their reasoning”. They are mimicking humans explaining their reasoning.
But how do you know??? – again comes the plaintive reply. Well, I can’t be sure, it’s true. But the reason I doubt it is that I don’t believe in magic. I don’t think that, simply by scaling up the size of AI systems, something magical happens whereby what begins as mere statistical mimicry turns into understanding. No one has ever provided a good argument for why it should. Any claims of that sort are faith-based.
Another way to answer the question is simply to rephrase it. How do I know that this system, specifically and exquisitely (and expensively) designed by expert computer scientists to mimic humans, is just – mimicking humans?
Anyway, so it goes on. Reading and Writing Ability? It includes “The ability to understand connected discourse during reading” and “The ability to write with clarity of thought.” AI would have to be assigned scores of zero here, because it does not “understand” or have “thought”. On-The-Spot Reasoning? “Attributing mental states to others and understanding how those states may differ from one’s own.” Another zero. Meaningful Memory? “The ability to remember specific events or experiences.” AI has no experiences. Zero.
Yet in humans these abilities are measured by psychometric tests of performance. And AI can often do pretty well in such tests, right? But for humans, we very reasonably infer that performance on the test reflects the attributes we believe we are testing. Yes, I suppose we can imagine someone somehow writing very clearly even while their head is filled with a jumble of chaotic thoughts. Or with no thoughts at all. But it’s kind of hard to imagine that, right? It would be weird, and surely very exceptional. If it were possible at all, it would be a case in which the test had failed to do its job.
Sorry, but this seems blindingly obvious to me: that in applying this sort of testing to AI, we are not permitted to make the inferences we make for humans, because that’s not what the tests were designed for.
I applaud Hendrycks et al. for trying to bring some clarity and quantitative precision to the notion of AGI. But unless they make clear that they are proposing to give this term a very narrow definition, one that does not bring us one whit closer to a comparison between what a machine can do and what human intelligence actually is, I fear they are likely just to sow more confusion.