IBM's 120 petabyte drive could help better predict weather

Massive drive would store up to 1 trillion files (video below)

Lucas Mearian

August 30, 2011 (Computerworld)
The development of the world's largest single-file name data repository could help predict weather and prevent overhyping of hurricanes like Irene.
Forecasters had predicted Irene could devastate cities such as Washington and New York, but some of the most severe damage instead occurred much farther inland, in states such as Vermont, which was drenched by tropical-storm downpours.
Several post-Hurricane Irene reports pointed to inaccurate forecasts as problematic. As the UK publication, The Guardian, wrote: The "storm surge that could have swamped [Manhattan] failed to materialize." And many New Yorkers were unhappy about having prepared for the worst only to experience little to no damage.
Enter IBM's Data Storage Group at Almaden, Calif., which has proved it can build a 120PB data system by using 200,000 SAS (serial SCSI) drives -- all configured as if they were a single drive under one name. That's roughly 30 times larger than the biggest single data repository on record, according to IBM. The system could store up to 1 trillion files. Even the Wayback Machine, a massive data time capsule created by The Internet Archive to store everything on the Web since 1996, holds only 2PB of data.
IBM said it chose high-performance SAS drives over high-capacity SATA drives because the system has high bandwidth requirements. The drives are connected via a backbone that also uses the SAS protocol, but the storage is connected to compute nodes via a proprietary fabric whose details IBM would not disclose.
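The headline figures imply a few back-of-the-envelope numbers, worked out in the short sketch below. It assumes decimal units (1PB = 10^15 bytes) and treats 120PB as raw capacity spread evenly across the drives; the article states neither, so the results are rough illustrations rather than IBM specifications.

```python
# Back-of-the-envelope arithmetic from the figures quoted in the article.
# Assumption: decimal units, and 120PB is raw capacity split evenly per drive.
PB = 10**15
GB = 10**9
KB = 10**3

total_bytes = 120 * PB
drive_count = 200_000
max_files = 10**12          # "up to 1 trillion files"

per_drive_gb = total_bytes // drive_count // GB   # implied average drive size
avg_file_kb = total_bytes // max_files // KB      # average file size if both limits are hit
previous_record_pb = 120 // 30                    # "roughly 30 times larger"
wayback_multiple = 120 // 2                       # versus the Wayback Machine's 2PB

print(per_drive_gb, avg_file_kb, previous_record_pb, wayback_multiple)
```

Under those assumptions the drives average about 600GB each, the prior record would be roughly 4PB, and the system would hold about 60 times as much data as the Wayback Machine.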
The technology for IBM's massive data store, which the company plans to begin installing in several customer sites later this year, would be ideal for creating more powerful high-performance computing systems that perform tasks such as climate modeling.
To be sure, Hurricane Irene packed plenty of punch. At least 21 people in eight states died as a result of the storm. And early estimates for damage top $7 billion. But most models showed the storm hitting the East Coast with far more force than it did.
"As with any of these high-performance computing simulations ... the more variables you can look at, the more granular you can be, the better the models. Hopefully, the better the model, the better the prediction," said Bruce Hillsberg, director of Storage Systems Research at IBM. While IBM used the weather simulation as an example, it would not say who its customers were for the data store.
IBM's 120PB data store has yet to be built. The company will be assembling it in the data centers of several customers over the next year, but the base technology to build the systems has been around for many years. The technology, IBM's General Parallel File System (GPFS), is already used in a number of IBM products, including its scale-out NAS (SONAS) array, which IBM brought to market last year, and can scale to 14PB of capacity. IBM also uses GPFS in its strategic archive product called the IBM Information Archive, as well as its cloud storage service offerings.
IBM has been using GPFS to build massive data stores since 1998. Back then, the largest single virtual drive was 43TB, a capacity that's easily achieved in a single data center rack today.
In fact, IBM's GPFS technology was the data store behind IBM's Watson supercomputer, which earlier this year demonstrated its processing prowess by handily beating champions of the game show Jeopardy. That system boasted a 21.6TB data store.
It was the massive growth in customer data storage requirements that led IBM to develop its latest GPFS storage system.
"We really think that cloud computing and cloud storage could get to these capacity points in coming years. So this research allows us to be ready for that when the market needs it," Hillsberg said.

Challenges with scale-out

While GPFS has been around for years, building a 120PB drive had its challenges, the greatest of which was data integrity, Hillsberg said.
"With 200,000 drives, there are going to be drives failing all the time. So you have to think about it not in terms of trying to improve the failure rates of individual drives, but look at the system as a whole and mean time to data loss," he said, referring to how long the data store will last before it might begin losing information. "So how do you keep the system up and running when you have lots and lots of individual components failing?"
Hillsberg and his team looked at current technologies, such as RAID 6, or dual-drive parity, which offered a mean time to data loss of about 56 years, but that still left too high a probability of losing data.
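For a rough sense of why per-array protection runs out of headroom at this scale, the sketch below uses a common textbook approximation for the mean time to data loss of a dual-parity (RAID 6) group and divides it across many independent groups. This is not IBM's model: the function name and every input (drive MTTF, rebuild time, group size) are illustrative assumptions, and it makes no attempt to reproduce the 56-year figure.

```python
HOURS_PER_YEAR = 24 * 365

def mttdl_raid6_years(mttf_hours, rebuild_hours, drives_per_group, groups=1):
    """Approximate mean time to data loss, in years, for dual-parity groups.

    Textbook approximation: a group loses data when a third drive fails
    while two earlier failures are still being rebuilt. Assumes independent
    failures and identical groups; real systems are messier.
    """
    n = drives_per_group
    per_group_hours = mttf_hours ** 3 / (n * (n - 1) * (n - 2) * rebuild_hours ** 2)
    # With many independent groups, the first loss anywhere arrives sooner.
    return per_group_hours / groups / HOURS_PER_YEAR

# Assumed inputs: 1,000,000-hour drive MTTF, 10-drive groups.
one_group = mttdl_raid6_years(1_000_000, 24, 10)
many_groups = mttdl_raid6_years(1_000_000, 24, 10, groups=20_000)  # 200,000 drives
faster_rebuild = mttdl_raid6_years(1_000_000, 12, 10, groups=20_000)
```

In this model, going from one group to 20,000 groups cuts the system-wide figure by a factor of 20,000, while halving the rebuild time quadruples it, which is one reason recovery speed matters as much as redundancy.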
Without giving specifics on the "secret sauce," Hillsberg said his team was able to come up with another scheme that offered up to 1 million years between data loss events.
"It has to do with keeping more copies of data than you would in traditional RAID systems as well as algorithms to recover it. We also have a lot of optimization in there to deal with the rate of recovery to keep it efficient," he said.
In supercomputers, systems typically are limited by the rate at which they can pull data from the storage subsystem. A RAID rebuild can use an enormous amount of CPU capacity, thereby affecting the overall performance of the system.
IBM essentially created an algorithm that examines disk failure rates and rebuilds data at different speeds depending on how many drives have failed and how many copies of the data are still available. For example, the system would rebuild more slowly, and use fewer CPU cycles, after a single disk failure than after multiple failures.
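IBM has not published the algorithm, but the general idea of throttling rebuild work by urgency can be sketched as follows. Everything here is hypothetical: the function names, the share values and the three-level cutoff are assumptions chosen only to illustrate the policy, not IBM's implementation.

```python
# Hypothetical throttle levels: the share of the I/O and CPU budget given to
# rebuild work, keyed by how many drives have failed in one redundancy group.
REBUILD_SHARE = {1: 0.10, 2: 0.50}
FULL_SPEED = 1.00  # used once a group has three or more failures

def rebuild_share(failed_in_group):
    """Return the fraction of resources to spend rebuilding one group."""
    if failed_in_group <= 0:
        return 0.0  # healthy group, nothing to rebuild
    return REBUILD_SHARE.get(failed_in_group, FULL_SPEED)

def rebuild_order(failures_by_group):
    """List degraded groups, most-degraded first, so the riskiest data is repaired first."""
    degraded = [g for g, failed in failures_by_group.items() if failed > 0]
    return sorted(degraded, key=lambda g: failures_by_group[g], reverse=True)

state = {"group-a": 1, "group-b": 0, "group-c": 3}
for group in rebuild_order(state):
    print(group, rebuild_share(state[group]))
```

The design choice is a trade-off: a slow rebuild leaves more bandwidth for applications while plenty of redundancy remains, and the rebuild only claims the full budget when another failure could mean data loss.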
"If you're seeing one failure with one set of data, you can do that rebuild relatively slowly because we have the data redundancy," he explained. "And, if you have two failures in the same data space, you go faster. If you have three failures, then you go really fast," he said.
Another data center issue that GPFS and the data resiliency technology address is one affecting the network-attached storage market as a whole: one NAS file server is easy to manage, but 100 NAS arrays aren't.
"We've learned how to build systems that scale out in terms of performance and capacity, but we've been able to keep the management costs flat," Hillsberg said. "We do that through a single name space and single point of management."
Lucas Mearian covers storage, disaster recovery and business continuity, financial services infrastructure and health care IT for Computerworld. Follow Lucas on Twitter at @lucasmearian or subscribe to Lucas's RSS feed.

