SQ12. Does it appear “building in how we think” works as an engineering strategy in the long run?
Every scientific discipline has foundational questions. In human psychology, for example, there is the nature-versus-nurture question. How much of our behavior is due to our genes, and how much to our environment and upbringing?
AI also has its own fundamental nature-versus-nurture-like question. Should we attack new challenges by applying general-purpose problem-solving methods, or is it better to write specialized algorithms, designed by experts, for each particular problem? Roughly, are specific AI solutions better engineered in advance by people (nature) or learned by the machine from data (nurture)?
In a March 2019 blog post,1 Richard Sutton—one of the leading figures in reinforcement learning—articulated the “nurture” perspective. “The biggest lesson that can be read from 70 years of AI research,” he wrote, “is that general methods that leverage computation are ultimately the most effective, and by a large margin.” He backed up this claim with some compelling examples drawn from subfields of AI such as computer games (where no chess- or Go-specific strategies are present in championship-level chess or Go programs), speech recognition (where statistics has steadily replaced linguistics as the engine of success) and computer vision (where human-designed strategies for breaking down the problem have been replaced by data-driven machine-learning methods).
Another leading figure, Rodney Brooks,2 replied with his own blog post, countering that each of AI’s notable successes “have all required substantial amounts of human ingenuity”; general methods alone were not enough. This framing is something of a turnaround for Brooks, one of the founders of behavior-based robotics, as he is well known for trying to build intelligent behaviors from simple methods of interacting with the complex world.
This fundamental debate has dogged the field from the very start. In the 1960s and 1970s, founders of the field, greats like Nobel Prize winner Herbert Simon and Turing Award winner Allen Newell, tried to build general-purpose methods. But such methods were easily surpassed by the specialized hand-coded knowledge poured into expert systems in the 1980s. The pendulum swung back in the 2010s, when the addition of big data and faster processors allowed general-purpose methods like deep learning to outperform specialized hand-tuned methods. But now, in the 2020s, these general methods are running into limits, and many in the field are questioning how best to make continued progress.
One limitation is the end of Moore’s Law. We can no longer expect processing power to double every two years or so, as it has since the beginning of the computer age.3 After all, every exponential trend in the real world must eventually wind down. In this case, we are starting to run into quantum limits and development costs. One reaction to this constraint is to build specialized hardware, optimized to support AI software. Google’s Tensor Processing Units (TPUs) are an example of this specialized approach.4
Another limit is model size. A record was set in May 2020 by GPT-3, a neural network language model with 175 billion parameters. GPT-3 is more than ten times the size of the previous largest language model, Turing NLG, introduced just three months earlier. A team at OpenAI calculated5 that, since 2012, the amount of computation used in the largest AI training runs has been increasing exponentially, with a doubling time of roughly three and a half months. Even if Moore’s Law were to continue, such an accelerated rate of growth in model size is unsupportable.
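To see why that rate is unsupportable, it helps to compare the two doubling times directly. The following is a minimal sketch, not a calculation taken from the OpenAI analysis itself: the function name `growth_factor` and both constants are illustrative, with the doubling times taken from the figures quoted above (about 24 months for Moore’s Law, about 3.5 months for the largest training runs).

```python
def growth_factor(months: float, doubling_time_months: float) -> float:
    """Multiplicative growth over `months` for a quantity that
    doubles every `doubling_time_months`."""
    return 2.0 ** (months / doubling_time_months)


# Assumed doubling times, read off the prose above (illustrative values).
MOORE_DOUBLING_MONTHS = 24.0     # "every two years or so"
TRAINING_DOUBLING_MONTHS = 3.5   # "roughly three and a half months"

for years in (1, 2, 5):
    months = 12 * years
    hardware = growth_factor(months, MOORE_DOUBLING_MONTHS)
    demand = growth_factor(months, TRAINING_DOUBLING_MONTHS)
    print(
        f"{years} yr: hardware x{hardware:,.1f}, "
        f"training compute x{demand:,.1f}, "
        f"gap x{demand / hardware:,.1f}"
    )
```

Under these assumptions, two years of Moore’s Law yields a factor of 2 in processing power, while training compute on the quoted trend grows by a factor of roughly 116 over the same period. The gap has to be closed by spending more on hardware and energy, which cannot continue indefinitely.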
Sustainability constitutes an additional limit. Many within the field are becoming aware of the carbon footprint, and the significant environmental costs, of building such large models. A 2015 report6 cautioned that computing and communications technologies could consume half of the electricity produced globally by 2030 if data centers cannot be made more efficient. Fortunately, companies have been improving the efficiency of data centers faster than the data centers have been growing in size, and they are mostly switching to green energy, which has kept their carbon footprint stable over the past five years. GPT-3 cost millions of dollars to build, but offsetting the CO2 produced to train it would cost only a few thousand dollars. Microsoft, which provided the compute for GPT-3, has been carbon neutral since 2012 and has made commitments to further environmental improvements in the years to come.7
Availability of data holds things back. Deep learning methods often need data sets with tens of thousands, hundreds of thousands, or even millions of examples. There are plenty of problems where we don’t have such large data sets. We might want to build models to predict the success of heart-lung transplants, but there is limited data available to train them—the number of such operations that have been performed worldwide is just a few hundred. In addition, machine-learning methods like deep learning struggle to work on data that falls outside their training distribution.
Existing systems are also quite brittle. Human intelligence often degrades gracefully. But recent adversarial attacks demonstrate that current AI methods are often prone to error when used in new contexts.8 We can change a single pixel in the input to an object-recognition system and it suddenly classifies a bus as a banana. Human vision can, of course, also be easily tricked, but in very different ways from computer vision systems. Clearly, AI is “seeing” the world idiosyncratically compared to human beings. Despite significant research on making systems more robust, adversarial methods continue to succeed and systems remain brittle and unpredictable.
And a final limit is semantic. AI methods tend to be very statistical and “understand” the world in quite different ways from humans. Google Translate will happily use deep learning to translate “the keyboard is dead” and “the keyboard is alive” word by word without pausing for thought, as you might, about why the metaphor works in the former but not the latter.
The limitations above are starting to drive researchers back into designing specialized components of their systems to try to work around them. The recent dominance of deep learning may be coming to an end.
What, then, do we make of this pendulum that has swung back and forth between nature and nurture multiple times? As is very often the case, the answer is likely to be found somewhere in between. Either extreme position is a straw man. Indeed, even at “peak nurture,” we find that learning systems benefit from using the right architecture for the right job: transformers for language and convolutional nets for vision, say. Researchers are constantly using their insight to identify the most effective learning methods for any given problem. So, just as psychologists recognize the role of both nature and nurture in human behavior, AI researchers will likely need to embrace both general- and special-purpose hand-coded methods, as well as ever faster processors and bigger data.
Indeed, the best progress on the long-term goals of replicating intelligent behaviors in machines may be achieved with methods that combine the best of both these worlds.9 The burgeoning area of neurosymbolic AI, which unites classical symbolic approaches to AI with the more data-driven neural approaches, may be where the most progress towards the AI dream is seen over the next decade.
3. It is not well known that Moore’s Law has been officially dead for several years. The International Technology Roadmap for Semiconductors is the industry body that works out the road map to achieve Moore’s Law. In 2014, it declared that doubling every two years would no longer be the industry’s goal. And, if it is no longer part of the plan of the major chip-making companies, then we can be sure it will not happen. Juan-Antonio Carballo, Wei-Ting Jonas Chan, Paolo A. Gargini, Andrew B. Kahng, and Siddhartha Nath, "ITRS 2.0: Toward a re-framing of the Semiconductor Technology Roadmap," 2014 IEEE 32nd International Conference on Computer Design (ICCD), pp. 139-146, 2014. https://ieeexplore.ieee.org/abstract/document/6974673