MIT researchers can listen to your conversation by watching your potato chip bag
By Rachel Feltman August 4 at 1:59 PM
Imagine someone listening in on your private conversation
by filming the bag of chips sitting on the other side of the room. Oddly
specific, I know, but researchers at MIT did just that: They’ve created an
algorithm that can reconstruct sound (and even intelligible speech) from the
tiny vibrations it causes in objects, as captured on video.
When sound hits an object, it makes the object vibrate in a distinct way.
“There’s this very subtle signal that’s telling you what the sound passing
through is,” said Abe Davis, a graduate student in electrical engineering and
computer science at MIT and first author on the paper. But the movement is tiny
– sometimes as small as thousandths of a pixel on video. It’s only when all of
these signals are averaged, Davis said, that you can extract sound that makes
sense. By observing the entire object, you can filter out the noise.
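To see why averaging helps, here is a minimal sketch in Python. It is not the researchers' algorithm. It simply simulates many noisy "pixel" readings of the same faint tone and averages them. Every name and number in it (the 440 Hz tone, the noise level, the pixel count) is an assumption chosen for illustration.

```python
import math
import random


def simulate_pixel_signals(n_pixels, n_samples, amplitude=0.001,
                           noise_std=0.05, freq_hz=440.0,
                           sample_rate=2000.0, seed=0):
    """Return (clean, pixels): one faint tone plus a noisy copy per pixel.

    All parameters are illustrative assumptions, not values from the study.
    """
    rng = random.Random(seed)
    clean = [amplitude * math.sin(2 * math.pi * freq_hz * t / sample_rate)
             for t in range(n_samples)]
    # Each hypothetical pixel sees the same tiny motion buried in its own noise.
    pixels = [[c + rng.gauss(0.0, noise_std) for c in clean]
              for _ in range(n_pixels)]
    return clean, pixels


def average_signals(pixels):
    """Average the per-pixel signals sample by sample."""
    n = len(pixels)
    return [sum(column) / n for column in zip(*pixels)]


def rms_error(a, b):
    """Root-mean-square difference between two equal-length signals."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


clean, pixels = simulate_pixel_signals(n_pixels=400, n_samples=200)
averaged = average_signals(pixels)
print("error, single pixel:", rms_error(pixels[0], clean))
print("error, 400 pixels averaged:", rms_error(averaged, clean))
```

Because the noise in each simulated pixel is independent, the error of the average shrinks roughly with the square root of the number of pixels, which is why looking at the whole object beats looking at any one spot.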
This particular study grew out of an earlier experiment
at MIT, led by Michael Rubinstein, now a postdoctoral researcher at Microsoft
Research New England. In 2012, Rubinstein amplified tiny variations in video to
detect things like the skin color change caused by the pumping of blood.
Studying the vibrations caused by sound was a logical next step. But getting
intelligible speech out of the analysis was surprising, Davis said.
The results are certainly impressive (and a little
scary). In one example shown in a compilation video, a bag of chips is filmed
from 15 feet away, through soundproof glass. The reconstructed audio of
someone reciting “Mary Had a Little Lamb” in the same room as the chips isn’t
crystal clear, but the words can be deciphered.
In most cases, a high-speed camera is necessary to
accomplish the feat. Still, at 2,000 to 6,000 frames per second, the camera
used by the researchers is nothing compared to the best available on the
market, which can surpass 100,000 frames per second. And the researchers found
that even cheaper cameras could be used.
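The reason frame rate matters comes down to the standard sampling rule: if a camera takes one measurement per frame, the highest frequency it can capture is half its frame rate. A quick sketch of that arithmetic follows. The 60 frames-per-second figure is an assumed stand-in for an ordinary camera, not a number from the study.

```python
def max_recoverable_frequency_hz(frames_per_second):
    """Nyquist limit when the camera takes one sample per frame."""
    return frames_per_second / 2.0


# 60 fps is an assumed figure for an ordinary camera; 2,000 and 6,000 fps
# are the high-speed rates mentioned above.
for fps in (60, 2000, 6000):
    print(f"{fps} fps -> up to {max_recoverable_frequency_hz(fps):.0f} Hz")
# prints:
# 60 fps -> up to 30 Hz
# 2000 fps -> up to 1000 Hz
# 6000 fps -> up to 3000 Hz
```

By this rule, a camera sampling once per frame at ordinary frame rates would miss nearly all of the frequencies in human speech.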
“It’s surprisingly possible to take advantage of a bug
called rolling shutter,” Davis said. “Usually, it creates these artifacts in
the image that people don’t like.” When cameras use rolling shutter to capture
an image, they don’t capture one single point in time. Instead, the camera
scans across the frame in one direction, picking up each row at a slightly
different moment.
By doing so, the camera happens to encode information at
a much higher rate than its actual frame rate. For the researchers, that meant
being able to analyze vibrations that would otherwise happen too quickly to be
captured on video. “It kind of turns a two-dimensional low-speed camera into a
one-dimensional high-speed camera,” Davis explained. “As a result, we can
recover sounds happening at frequencies several times higher than the frame
rate of the camera, which is remarkable when you consider that it’s just a
complete accident of the way we make them.”
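Here is a rough sketch of the timing idea, under an idealized model that is an assumption rather than the researchers' method: rows are read out one after another at even intervals, so each row acts as its own sample in time. The function names, the `readout_fraction` parameter and the 1,080-row, 60 frames-per-second camera are all hypothetical.

```python
def row_capture_times(frame_index, rows_per_frame, frames_per_second,
                      readout_fraction=1.0):
    """Capture time, in seconds, of each row in one frame.

    Idealized assumption: rows are read at evenly spaced moments, and the
    readout fills `readout_fraction` of the frame period (1.0 = no idle gap).
    """
    frame_period = 1.0 / frames_per_second
    row_delay = readout_fraction * frame_period / rows_per_frame
    frame_start = frame_index * frame_period
    return [frame_start + row * row_delay for row in range(rows_per_frame)]


def effective_row_rate_hz(rows_per_frame, frames_per_second,
                          readout_fraction=1.0):
    """Rows sampled per second while the sensor is reading out."""
    return rows_per_frame * frames_per_second / readout_fraction


# Hypothetical camera: 1,080 rows at 60 frames per second.
print(row_capture_times(0, 4, 1.0))    # prints [0.0, 0.25, 0.5, 0.75]
print(effective_row_rate_hz(1080, 60))  # prints 64800.0
```

The row rate in this model is only an upper bound on paper. Each row is a thin, noisy slice of the scene, and real sensors may pause between frames, which fits with Davis describing recovered frequencies as several times the frame rate rather than thousands of times.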
There are definitely limitations to the technology, Davis
said, and it may not make for better sound reconstruction than other methods
already in use. “Big brother won’t be able to hear anything that anyone ever
says all of a sudden,” Davis said. “But it is possible that you could use this
to discover sound in situations where you couldn’t before. It’s just adding one
more tool for those forensic applications.”
Davis and his colleagues care more about applications in
scientific research. “This is a new dimension to how you can image objects,” he
said. “It tells you something about how they respond physically to pressure,
but instead of poking and prodding at them, all you need is to play sound at
them.”