Engineers translate brain signals directly into speech
Advance marks critical step toward brain-computer interfaces that hold
immense promise for those with limited or no ability to speak
Summary: In a scientific first, neuroengineers have created a system that
translates thought into intelligible, recognizable speech. This breakthrough,
which harnesses the power of speech synthesizers and artificial intelligence,
could lead to new ways for computers to communicate directly with the brain.
January 29, 2019
The Zuckerman Institute at Columbia University
In a scientific first,
Columbia neuroengineers have created a system that translates thought into
intelligible, recognizable speech. By monitoring someone's brain activity, the
technology can reconstruct the words a person hears with unprecedented clarity.
This breakthrough, which harnesses the power of speech synthesizers and
artificial intelligence, could lead to new ways for computers to communicate
directly with the brain. It also lays the groundwork for helping people who
cannot speak, such as those living with amyotrophic lateral sclerosis (ALS)
or recovering from stroke, regain their ability to communicate with the outside
world.
These
findings were published today in Scientific
Reports.
"Our
voices help connect us to our friends, family and the world around us, which is
why losing the power of one's voice due to injury or disease is so
devastating," said Nima Mesgarani, PhD, the paper's senior author and a
principal investigator at Columbia University's Mortimer B. Zuckerman Mind
Brain Behavior Institute. "With today's study, we have a potential way to
restore that power. We've shown that, with the right technology, these people's
thoughts could be decoded and understood by any listener."
Decades of
research have shown that when people speak -- or even imagine speaking --
telltale patterns of activity appear in their brains. Distinct (but
recognizable) patterns of signals also emerge when we listen to someone speak,
or imagine listening. Experts, trying to record and decode these patterns, see
a future in which thoughts need not remain hidden inside the brain -- but
instead could be translated into verbal speech at will.
But
accomplishing this feat has proven challenging. Early efforts to decode brain
signals by Dr. Mesgarani and others focused on simple computer models that
analyzed spectrograms, which are visual representations of sound frequencies.
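For readers unfamiliar with the term, here is a minimal Python sketch (not drawn from the study's code) of what a spectrogram is: a short audio signal is sliced into frames and its frequency content is measured per frame, producing the time-frequency "picture" those early models tried to predict from brain activity. The sampling rate and the test tone standing in for recorded speech are illustrative assumptions.

```python
# Minimal sketch, not the authors' code: a spectrogram turns a sound wave
# into a time-frequency map, the representation early decoders tried to
# reconstruct from neural signals.
import numpy as np
from scipy.signal import spectrogram

fs = 16000                              # assumed audio sampling rate, Hz
t = np.arange(0, 1.0, 1 / fs)           # one second of audio
audio = np.sin(2 * np.pi * 440 * t)     # a pure tone standing in for a voice recording

freqs, times, power = spectrogram(audio, fs=fs, nperseg=512)
print(power.shape)  # (frequency bins, time frames): the "visual" map of sound frequencies
```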
But because this approach failed to produce anything resembling intelligible speech,
Dr. Mesgarani's team turned instead to a vocoder, a computer algorithm that can
synthesize speech after being trained on recordings of people talking.
"This
is the same technology used by Amazon Echo and Apple Siri to give verbal
responses to our questions," said Dr. Mesgarani, who is also an associate
professor of electrical engineering at Columbia's Fu Foundation School of
Engineering and Applied Science.
To teach
the vocoder to interpret brain activity, Dr. Mesgarani teamed up with Ashesh
Dinesh Mehta, MD, PhD, a neurosurgeon at Northwell Health Physician Partners Neuroscience
Institute and co-author of today's paper. Dr. Mehta treats epilepsy patients,
some of whom must undergo regular surgeries.
"Working
with Dr. Mehta, we asked epilepsy patients already undergoing brain surgery to
listen to sentences spoken by different people, while we measured patterns of
brain activity," said Dr. Mesgarani. "These neural patterns trained
the vocoder."
Next, the
researchers asked those same patients to listen to speakers reciting digits
from 0 to 9, while recording brain signals that could then be run through
the vocoder. The sound produced by the vocoder in response to those signals was
analyzed and cleaned up by neural networks, a type of artificial intelligence
that mimics the structure of neurons in the biological brain.
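The article does not publish code, but the two-stage idea (learn a mapping from brain responses to the parameters of the vocoder using the sentence-listening data, then decode new brain recordings into vocoder parameters for synthesis) can be caricatured in a few lines. Everything below, including the electrode count, the number of vocoder parameters, the random stand-in "recordings," and the use of scikit-learn's MLPRegressor, is an illustrative assumption, not the authors' model.

```python
# Toy sketch of the decoding idea, not the published system: learn a mapping
# from recorded neural activity to per-frame vocoder parameters, then predict
# those parameters for new brain recordings so a vocoder could resynthesize speech.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

n_train, n_test = 500, 50       # time frames of neural data (assumed)
n_electrodes = 128              # invasive electrode channels (assumed)
n_vocoder_params = 32           # vocoder parameters per frame (assumed)

# Stage 1: training data -- brain activity recorded while patients listened to
# spoken sentences, paired with the vocoder parameters of that same audio.
neural_train = rng.normal(size=(n_train, n_electrodes))
vocoder_train = rng.normal(size=(n_train, n_vocoder_params))

decoder = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500)
decoder.fit(neural_train, vocoder_train)

# Stage 2: new brain recordings (patients hearing spoken digits) are pushed
# through the trained network; the predicted parameters would then drive the
# vocoder to produce an audible, robotic-sounding voice.
neural_test = rng.normal(size=(n_test, n_electrodes))
predicted_params = decoder.predict(neural_test)
print(predicted_params.shape)   # (frames, vocoder parameters) ready for synthesis
```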
The end
result was a robotic-sounding voice reciting a sequence of numbers. To test the
accuracy of the recording, Dr. Mesgarani and his team asked individuals to
listen to the recording and report what they heard.
"We
found that people could understand and repeat the sounds about 75% of the time,
which is well above and beyond any previous attempts," said Dr. Mesgarani.
The improvement in intelligibility was especially evident when comparing the
new recordings to the earlier, spectrogram-based attempts. "The sensitive
vocoder and powerful neural networks represented the sounds the patients had
originally listened to with surprising accuracy."
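As a back-of-the-envelope illustration of how such an intelligibility score can be tallied (the digits and listener responses below are invented, not data from the study), the calculation is simply the fraction of digits repeated correctly:

```python
# Hypothetical listening-test scoring: listeners report each digit they hear,
# and intelligibility is the fraction of digits repeated correctly.
presented = [3, 7, 1, 0, 9, 5, 2, 8]   # digits played to a listener (made up)
reported  = [3, 7, 4, 0, 9, 5, 2, 6]   # what the listener said back (made up)

correct = sum(p == r for p, r in zip(presented, reported))
accuracy = correct / len(presented)
print(f"intelligibility: {accuracy:.0%}")  # 75% in this toy case
```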
Dr.
Mesgarani and his team plan to test more complicated words and sentences next,
and they want to run the same tests on brain signals emitted when a person
speaks or imagines speaking. Ultimately, they hope their system could be part
of an implant, similar to those worn by some epilepsy patients, that translates
the wearer's thoughts directly into words.
"In
this scenario, if the wearer thinks 'I need a glass of water,' our system could
take the brain signals generated by that thought, and turn them into
synthesized, verbal speech," said Dr. Mesgarani. "This would be a
game changer. It would give anyone who has lost their ability to speak, whether
through injury or disease, the renewed chance to connect to the world around
them."
Story Source: Materials provided by The Zuckerman Institute at Columbia University. Note: Content may be edited for style and length.
Journal Reference:
1. Hassan Akbari, Bahar Khalighinejad, Jose L. Herrero, Ashesh D. Mehta, Nima Mesgarani. Towards reconstructing intelligible speech from the human auditory cortex. Scientific Reports, 2019; 9(1). DOI: 10.1038/s41598-018-37359-z