An A.I. System Passed an Eighth-Grade Science Test. Can You?
SAN FRANCISCO — Four years ago, more than 700 computer scientists competed in a
contest to build artificial intelligence that could pass an eighth-grade
science test. There was $80,000 in prize money on the line.
They all flunked. Even the most sophisticated
system couldn’t do better than 60 percent on the test. A.I. couldn’t match the
language and logic skills that students are expected to have when they enter
high school.
But on Wednesday, the Allen Institute for
Artificial Intelligence, a prominent lab in Seattle, unveiled a new system that
passed the test with room to spare. It correctly answered more than 90 percent
of the questions on an eighth-grade science test and more than 80 percent on a
12th-grade exam.
The new system, called Aristo, was built solely for multiple-choice
tests. It took standard exams written for students in New York, though the
Allen Institute removed all questions that included pictures and diagrams.
Answering questions like that would have required additional skills that
combine language understanding and logic with so-called computer vision.
Some test questions, like this one from the
eighth-grade exam, required little more than information retrieval:
A group of tissues that work together to perform a specific
function is called:
(1) an organ
(2) an organism
(3) a system
(4) a cell
But others, like this question from the same
exam, required logic:
Which change would most likely cause a decrease in the number of
squirrels living in an area?
(1) a decrease in the number of predators
(2) a decrease in competition between the
squirrels
(3) an increase in available food
(4) an increase in the number of forest fires
Researchers at the Allen Institute started work
on Aristo — they wanted to build a “digital Aristotle” — in 2013, just after
the lab was founded by the Seattle billionaire and Microsoft co-founder Paul
Allen. They saw standardized science tests as a more meaningful alternative to
typical A.I. benchmarks, which relied on games like chess and backgammon or
tasks created solely for machines.
A science test isn’t something that can be
mastered just by learning rules. It requires making connections using logic. An
increase in forest fires, for example, could kill squirrels or decrease the
food supply needed for them to thrive and reproduce.
“We can’t compare this technology to real
human students and their ability to reason,” said Jingjing Liu, a Microsoft
researcher who has been working on many of the same technologies as the Allen
Institute.
But Aristo’s advances could spread to a range
of products and services, from internet search engines to record-keeping
systems at hospitals.
“This has significant business consequences,” said Oren Etzioni, the former
University of Washington professor who oversees the Allen Institute. “What I
can say — with complete confidence — is you are going to see a whole new
generation of products, some from start-ups, some from the big companies.”
The new research could lead to systems that
can carry on a decent conversation. But it could also encourage the spread of false
information.
“We are at the very early stage of this,”
said Jeremy Howard, who oversees Fast.ai, another influential lab, in San
Francisco. “We are so far away from the potential that I cannot say where it
will end up.”
Dr. Etzioni’s excitement, however, was muted. Artificial intelligence was not
nearly as advanced as it might seem, he said, pointing to the Allen Institute’s
earlier competition, which stumped A.I. systems with an eighth-grade science test.
The Allen Institute improved on that earlier effort much more quickly than many
experts — including Dr. Etzioni — expected.
Its work was largely driven by neural networks,
complex mathematical systems that can learn tasks by analyzing vast amounts of
data. By pinpointing patterns in thousands of dog photos, for example, a neural
network can learn to recognize a dog.
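For readers curious what that kind of learning looks like in code, here is a minimal, illustrative sketch. It uses the open-source scikit-learn library and made-up numbers standing in for photos; these choices are assumptions of the example, not a description of the labs’ actual systems.

```python
# A toy sketch of the idea: a small neural network learns a pattern from
# labeled examples instead of being programmed with explicit rules.
# (Illustrative only -- real image recognition uses far larger networks
# trained on real photos, not the synthetic numbers used here.)
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Pretend each "photo" is just 10 numbers; "dog" examples cluster around
# one pattern and "not dog" examples around another.
dogs = rng.normal(loc=1.0, scale=0.5, size=(500, 10))
not_dogs = rng.normal(loc=-1.0, scale=0.5, size=(500, 10))
X = np.vstack([dogs, not_dogs])
y = [1] * 500 + [0] * 500   # 1 = dog, 0 = not a dog

# The network discovers the distinguishing pattern from the data itself.
model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000).fit(X, y)
print("accuracy on its training examples:", model.score(X, y))
```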
In recent months, the world’s leading A.I.
labs have built elaborate neural networks that can learn the vagaries of
language by analyzing articles and books written by humans.
At Google, researchers built a system called
Bert that combed through thousands of Wikipedia articles and a vast digital
library of romance novels, science fiction and other self-published books.
Through analyzing all that text, Bert learned
how to guess the missing word in a sentence. By learning that one skill, Bert
soaked up enormous amounts of information about the fundamental ways language
is constructed. And researchers could apply that knowledge to other tasks.
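For readers who want to see the “guess the missing word” task in action, here is a minimal sketch. It assumes the open-source Hugging Face transformers library and a publicly released BERT model; the article does not describe Google’s internal code, so this is illustrative only.

```python
# A minimal sketch of masked-word prediction, the pretraining task the
# article describes, using a publicly released BERT model.
from transformers import pipeline

# Load a model that was pretrained only to fill in masked words.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Ask the model to guess the hidden word in a science-flavored sentence.
for guess in fill_mask("A group of tissues that work together forms an [MASK]."):
    print(f"{guess['token_str']:>12}  (score: {guess['score']:.3f})")
```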
Not long ago, researchers at the Allen Institute defined the behavior of their
test-taking system one line of software code at a time.
Sometimes they still do that painstaking coding. But now that the system can
learn from digital data on its own, it can improve at a much faster rate.
Systems like Bert — called “language models”
— now drive a wide range of research projects, including conversational systems
and tools designed to identify false news. With more data and more computing
power, researchers believe the technology will continue to improve.
But Dr. Etzioni stressed that the future of
these systems was hard to predict and that language was only one piece of the
puzzle.
Ms. Liu and her fellow Microsoft researchers
have tried to build a system that can pass the Graduate Record Examination, the test
required for admission to graduate school.
The language section was doable, she said,
but building the reasoning skills required for the math section was another
matter. “It was far too challenging.”