Honey, I shrunk the AI
10.04.19
In the past year, natural language models
have become dramatically better at the expense of getting
dramatically bigger. In October of last year, for example, Google
released a model called BERT that passed a long-held
sentence-completion benchmark in the field. The larger version of the model
had 340 million parameters, or characteristics learned from the training data, and training it a single time consumed enough electricity to power a US household for 50 days.
Four months later, OpenAI topped it with GPT-2. The model demonstrated an impressive knack for constructing convincing prose; it also used 1.5 billion parameters. Now MegatronLM, the latest and largest model from Nvidia, weighs in at 8.3 billion parameters. (Yes, things are getting out of hand.)
AI researchers have grown increasingly worried—and rightly so—about the
consequences of this trend. In June, we wrote about a research paper from the University of Massachusetts Amherst that quantified the climate toll of these large-scale models. Training BERT, the
researchers calculated, emitted nearly as much carbon as a roundtrip flight
between New York and San Francisco; GPT-2 and MegatronLM, by extrapolation,
would likely emit a whole lot more.
The trend could also accelerate the concentration of AI research into the
hands of a few tech giants. Under-resourced academic labs and researchers in poorer countries simply don't have the means to use or develop such computationally expensive models.
In response, many researchers are now focused on shrinking the size of
existing models without losing their capabilities. Two new papers, released within a day of one another, have now done exactly that to the smaller version of BERT, which has roughly 100 million parameters.
The first paper, from researchers at Huawei, produces a model called TinyBERT that is 7.5 times smaller and nearly 10 times faster than the original, while reaching nearly the same language-understanding performance. The second, from researchers at Google, produces a model that's more than 60 times smaller, though its language understanding is slightly worse than the Huawei version's. Read more here.
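Both papers build on knowledge distillation, the technique named in the TinyBERT title: a small "student" model is trained to reproduce the outputs of a large, already-trained "teacher." Below is a minimal sketch of that idea in PyTorch, not the exact recipe from either paper; the temperature, the loss weighting, and the way the two terms are combined are illustrative assumptions.

# Minimal knowledge-distillation sketch (illustrative, not the TinyBERT or
# Google recipe). A small "student" learns to match the softened output
# distribution of a large, frozen "teacher", alongside the usual hard-label loss.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft targets: KL divergence between the softened student and teacher
    # distributions. Temperature > 1 exposes the teacher's relative
    # confidences; the T^2 factor is the standard gradient rescaling.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    soft_loss = F.kl_div(soft_student, soft_teacher,
                         reduction="batchmean") * (temperature ** 2)

    # Hard targets: ordinary cross-entropy against the true labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    # alpha balances imitating the teacher against fitting the labels.
    return alpha * soft_loss + (1 - alpha) * hard_loss

# In a training step the teacher is frozen and only the student is updated:
#   with torch.no_grad():
#       teacher_logits = teacher(batch)
#   student_logits = student(batch)
#   loss = distillation_loss(student_logits, teacher_logits, labels)
#   loss.backward(); optimizer.step()

The actual papers distill far more than the final outputs (TinyBERT, for instance, also matches intermediate layers), but the core trade is the same: a much smaller network inherits most of the big model's behavior at a fraction of the compute.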
For more on tiny natural language models, try:
· Huawei's full paper, "TinyBERT: Distilling BERT for Natural Language Understanding"
· Google's full paper, "Extreme Language Model Compression"
· A Medium article that includes a chart showing how models have grown over the last year and a half