Solving the Storage Bottleneck

Supercomputers bring together the computing capacity and speed of tens of thousands of computing nodes. Each of these computing nodes needs to access stored data to function. Reading data from storage, an inherently slower process than computing, can be a serious bottleneck inhibiting the efficiency of a supercomputer. Samer Al-Kiswany is a Ph.D. student working with Matei Ripeanu in the Computer and Software Engineering Research Group. Samer designs large-scale storage systems that significantly increase the efficiency of storage without altering the existing hardware platform or software interfaces.

Easing the storage bottleneck is particularly important for research areas that generate large datasets. In areas such as high-energy physics or bioinformatics, datasets can be measured in terabytes. Samer has tested his work in the field of bioinformatics, increasing the speed of applications that manipulate huge datasets of DNA and protein sequences.

Samer, along with other members of the research team at UBC, developed Mosaic Storage (MosaStore), an aggregate storage system that combines many small storage spaces into one large storage space. MosaStore improves the efficiency of an existing storage system in two ways. First, it aggregates the storage space that already exists on the compute nodes, making more efficient use of resources that are already part of the infrastructure. The storage system scavenges unused storage from network-connected machines. Second, MosaStore is designed so that the storage system can be optimized for a particular application. By analyzing the storage needs of an application, storage can be configured to best serve the required purpose. For example, an application may produce data in a file that, later in the sequence, will be used by many compute nodes. If this file needs to be read by 100 computers, it becomes a hot spot. In this instance MosaStore would replicate the data file so that each computing node can access it quickly. The configuration of the storage is designed to be straightforward and intuitive. Someone who is familiar with using an application could complete the configuration in a matter of minutes.
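The hot-spot example can be illustrated with a small sketch. This is not MosaStore code or its interface; the names StorageNode, place_file, expected_readers and readers_per_replica are hypothetical, and the rule of one replica per 25 expected readers is an assumption chosen only to make the idea concrete. The sketch places a file on the nodes with the most unused space and makes more copies when more readers are expected.

```python
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """A compute node that donates some of its unused disk space (hypothetical)."""
    name: str
    free_bytes: int
    files: set = field(default_factory=set)


def place_file(nodes, filename, size, expected_readers=1, readers_per_replica=25):
    """Store a file on one or more nodes; return the names of the nodes used.

    expected_readers is the application's hint. The replica count grows with
    it (assumed rule: one replica per readers_per_replica readers, rounded up).
    """
    wanted = max(1, -(-expected_readers // readers_per_replica))  # ceiling division
    # Only nodes with enough scavenged space qualify; prefer the emptiest ones.
    candidates = sorted(
        (n for n in nodes if n.free_bytes >= size),
        key=lambda n: n.free_bytes,
        reverse=True,
    )
    chosen = candidates[:wanted]  # may be fewer than wanted if space is short
    if not chosen:
        raise RuntimeError(f"no node has {size} bytes free for {filename}")
    for node in chosen:
        node.free_bytes -= size
        node.files.add(filename)
    return [node.name for node in chosen]


if __name__ == "__main__":
    nodes = [StorageNode(f"node{i}", 10_000) for i in range(8)]
    # A file that 100 compute nodes will read later is replicated.
    print(place_file(nodes, "hot.dat", 1_000, expected_readers=100))
    # A file with no hint gets a single copy.
    print(place_file(nodes, "cold.dat", 1_000))
```

In this sketch the only thing the user supplies is the expected number of readers; the placement decision follows from that hint, which mirrors the idea of configuring storage from a description of what the application will do.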

The prototype of MosaStore has been tested at Argonne National Laboratory with a bioinformatics application running on a computer using 90,000 computing nodes. The scientists at Argonne found that the prototype outperformed their current system. Argonne continues to work with the prototype, collaborating with UBC to optimize the storage system. Through this collaboration Samer gets valuable feedback to advance his work.

MosaStore does not require any changes to the software application, the infrastructure or the hardware, making it an attractive solution for companies and research institutes that have already made huge investments in both their applications and hardware. It was clear from the very beginning of the project that institutes working in a large-scale data environment would not be willing to change their software applications and would be reluctant to change hardware as well. Samer had to respond to these constraints as the design for MosaStore developed. With this solution, all that is required is for the user to give a hint about what the application will do; the system then configures itself.

As computing functions get faster and faster, there will continue to be storage challenges. Disks are mechanical devices that need to rotate, and mechanical devices are always orders of magnitude slower than circuitry. There may always be a gap between the speed of reading data and the speed of processing, and that gap will continue to drive research in this area.