Google Wants to Store Your Genome
For $25 a year, Google will keep a copy of any genome in
the cloud.
By Antonio Regalado on November 6, 2014
Google is approaching hospitals and universities with a
new pitch. Have genomes? Store them with us.
The search giant’s first product for the DNA age is
Google Genomics, a cloud computing service that launched last March but went
mostly unnoticed amid a barrage of high-profile R&D announcements from
Google, like one late last month about a far-fetched plan to battle cancer with
nanoparticles.
Google Genomics could prove more significant than any of
these moonshots. Connecting and comparing genomes by the thousands, and soon by
the millions, is what’s going to propel medical discoveries for the next
decade. The question of who will store the data is already a point of growing
competition between Amazon, Google, IBM, and Microsoft.
Google began work on Google Genomics 18 months ago,
meeting with scientists and building an interface, or API, that lets them move
DNA data into its server farms and do experiments there using the same database
technology that indexes the Web and tracks billions of Internet users.
“We saw biologists moving from studying one genome at a
time to studying millions,” says David Glazer, the software engineer who led
the effort and was previously head of platform engineering for Google+, the
social network. “The opportunity is how to apply breakthroughs in data
technology to help with this transition.”
Some scientists scoff that genome data remains too
complex for Google to help with. But others see a big shift coming. When Atul
Butte, a bioinformatics expert at Stanford, heard Google present its plans this
year, he remarked that he now understood “how travel agents felt when they saw
Expedia.”
The explosion of data is happening as labs adopt new,
even faster equipment for decoding DNA. For instance, the Broad Institute in
Cambridge, Massachusetts, said that during the month of October it decoded the
equivalent of one human genome every 32 minutes. That translated to about 200
terabytes of raw data.
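As a rough check on those figures, the short Python sketch below works out how many genomes a rate of one every 32 minutes yields over the 31 days of October, and what 200 terabytes then implies per genome. It assumes 1 terabyte equals 1,000 gigabytes and that the 200-terabyte figure covers that same month of output; neither assumption comes from Broad.

```python
# Figures quoted in the article.
MINUTES_PER_GENOME = 32   # one genome-equivalent decoded every 32 minutes
RAW_DATA_TB = 200         # approximate raw data produced

# Assumptions for this sketch (not from the article).
MINUTES_IN_OCTOBER = 31 * 24 * 60   # October has 31 days
GB_PER_TB = 1000                    # decimal terabytes

# Number of genome-equivalents decoded during the month.
genomes = MINUTES_IN_OCTOBER / MINUTES_PER_GENOME

# Implied raw data per genome, in gigabytes.
gb_per_genome = RAW_DATA_TB * GB_PER_TB / genomes

print(f"{genomes:.0f} genomes, about {gb_per_genome:.0f} GB of raw data each")
```

Under these assumptions the month works out to roughly 1,400 genomes at a little over 140 gigabytes apiece.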
This flow of data is smaller than what is routinely
handled by large Internet companies (over two months, Broad will produce the equivalent
of what gets uploaded to YouTube in one day) but it exceeds anything biologists
have dealt with. That’s now prompting a wide effort to store and access data at
central locations, often commercial ones. The National Cancer Institute said
last month that it would pay $19 million to move copies of the 2.6-petabyte
Cancer Genome Atlas into the cloud. Copies of the data, from several thousand
cancer patients, will reside both at Google Genomics and in Amazon’s data
centers.
The idea is to create “cancer genome clouds” where
scientists can share information and quickly run virtual experiments as easily
as a Web search, says Sheila Reynolds, a research scientist at the Institute
for Systems Biology in Seattle. “Not everyone has the ability to download a
petabyte of data, or has the computing power to work on it,” she says.
Also speeding the move of DNA data to the cloud has been
a yearlong price war between Google and Amazon. Google says it now charges
about $25 a year to store a genome, and more to do computations on it.
Scientific raw data representing a single person’s genome is about 100
gigabytes in size, although a polished version of a person’s genetic code is
far smaller, less than a gigabyte. That would cost only $0.25 a year.
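The arithmetic behind those two prices can be sketched in a few lines of Python. The sketch assumes the price scales linearly with file size, which is a simplification for illustration and not a description of Google's actual price list; the function name is hypothetical.

```python
# Figures quoted in the article.
PRICE_PER_GENOME_USD = 25.0   # yearly price to store one raw genome
RAW_GENOME_GB = 100.0         # approximate size of one raw genome

def yearly_storage_cost(size_gb: float) -> float:
    """Yearly cost in dollars, ASSUMING price is proportional to size."""
    price_per_gb = PRICE_PER_GENOME_USD / RAW_GENOME_GB   # dollars per GB per year
    return size_gb * price_per_gb

print(yearly_storage_cost(100.0))  # raw data for one genome
print(yearly_storage_cost(1.0))    # polished genome, taking 1 GB as an upper bound
```

Because a polished genome is under a gigabyte, the second figure is a ceiling, not an exact price.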
Cloud storage is giving a boost to startups like Tute
Genomics, DNANexus, Seven Bridges, and NextCode Health. These companies build
“browsers” that hospitals and scientists can use to explore genetic data.
“Google or Amazon is a back end. They are saying, ‘Hey, you can build a
genomics company in our cloud,’” says Deniz Kural, CEO of Seven Bridges, which
stores genome data on behalf of 1,600 researchers in Amazon’s cloud.
The bigger point, he says, is that medicine will soon
rely on a kind of global Internet-of-DNA that doctors will be able to search.
“Our bird’s eye view is that if I were to get lung cancer in the future,
doctors are going to sequence my genome and my tumor’s genome, and then query
them against a database of 50 million other genomes,” he says. “The result will
be ‘Hey, here’s the drug that will work best for you.’ ”
At Google, Glazer says he began working on Google
Genomics as it became clear that biology was going to move from “artisanal to
factory-scale data production.” He started by teaching himself genetics, taking
an online class, Introduction to Biology, taught by Broad’s chief, Eric Lander.
He also got his genome sequenced and put it on Google’s cloud.
Glazer wouldn’t say how large Google Genomics is or how
many customers it has now, but at least 3,500 genomes from public projects are
already stored on Google’s servers. He also says there’s no link, as of yet,
between Google’s cloud and its more speculative efforts in health care, like
the company Google started last year, called Calico, to investigate how to
extend human lifespans. “What connects them is just a growing realization that
technology can advance the state of the art in life sciences,” says Glazer.
Somalee Datta, a physicist who manages Stanford
University’s largest computer cluster for genetics data, says that because of
recent price cuts, it now costs about the same to store genomes with Google or
Amazon as in her own data center. “Prices are finally becoming reasonable, and
we think they will keep dropping,” she says.
Datta says some Stanford scientists have started using a
Google database system, BigQuery, that Glazer’s team made compatible with
genome data. It was developed to analyze large databases of spam, web
documents, or consumer purchases. But it can also quickly perform the very
large experiments comparing thousands, or tens of thousands, of people’s
genomes that researchers want to try. “Sometimes they want to do crazy things,
and you need scale to do that,” says Datta. “It can handle the scale genetics can
bring, so it’s the right technology for a new problem.”