Honey, I shrunk the AI


In the past year, natural language models have become dramatically better at the cost of getting dramatically bigger. In October of last year, for example, Google released a model called BERT that passed a long-held sentence-completion benchmark in the field. The larger version of the model had 340 million parameters, or characteristics learned from the training data, and training it just once consumed enough electricity to power a US household for 50 days.

Four months later, OpenAI topped it with its model GPT-2. The model demonstrated an impressive knack for constructing convincing prose; it also used 1.5 billion parameters. Now, MegatronLM, the latest and largest model from Nvidia, has 8.3 billion parameters. (Yes, things are getting out of hand.)

AI researchers have grown increasingly worried, and rightly so, about the consequences of this trend. In June, we wrote about a research paper from the University of Massachusetts, Amherst, that showed the climate toll of these large-scale models. Training BERT, the researchers calculated, emitted nearly as much carbon as a round-trip flight between New York and San Francisco; GPT-2 and MegatronLM, by extrapolation, would likely emit a whole lot more.

The trend could also accelerate the concentration of AI research into the hands of a few tech giants. Under-resourced labs in academia or in countries with fewer resources simply don’t have the means to use or develop such computationally expensive models.

In response, many researchers are focused on shrinking existing models without losing their capabilities. Now two new papers, released within a day of one another, have done just that to the smaller version of BERT, which has 100 million parameters.

The first paper, from researchers at Huawei, produces a model called TinyBERT that is 7.5 times smaller and nearly 10 times faster than the original. It also reaches nearly the same language understanding performance as the original. The second, from researchers at Google, produces a model that’s more than 60 times smaller, but its language understanding is slightly worse than the Huawei version’s.
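To put those ratios in concrete terms, here is a rough back-of-envelope sketch. It uses only the rounded figures quoted above (100 million parameters for the smaller BERT, and shrink factors of 7.5 and 60), so the results are approximations, not the exact parameter counts reported in either paper. The function and variable names are made up for illustration.

```python
# Back-of-envelope estimate of the shrunken model sizes, using the
# rounded figures quoted in this article (not the papers' exact counts).

def shrunk_size(original_params: float, times_smaller: float) -> float:
    """Return the approximate parameter count after shrinking."""
    return original_params / times_smaller

BERT_SMALL_PARAMS = 100_000_000  # "100 million parameters" (rounded)

# Huawei's model: "7.5 times smaller"
huawei_params = shrunk_size(BERT_SMALL_PARAMS, 7.5)

# Google's model: "more than 60 times smaller", so this is an upper bound
google_params_upper = shrunk_size(BERT_SMALL_PARAMS, 60)

print(f"Huawei model: roughly {huawei_params / 1e6:.1f} million parameters")
print(f"Google model: under about {google_params_upper / 1e6:.1f} million parameters")
```

By this estimate, the Huawei model lands somewhere around 13 million parameters and the Google model under about 2 million, compared with MegatronLM's 8.3 billion.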


For more on tiny natural language models, try:
· Huawei’s full paper, “TinyBERT: Distilling BERT for Natural Language Understanding”
· Google’s full paper, “Extreme Language Model Compression”
· Our coverage of the environmental impact of large-scale natural language models and a proposal for how to mitigate it
· The original BERT paper and its coverage in the New York Times
· The GLUE benchmark used to evaluate natural language understanding
· A Medium article that includes a chart showing how models have grown over the last year and a half

