Development of an evaluation platform capable of validating wide-area distributed systems

Training for scenarios involving complex, simultaneous disasters and failures becomes possible

Nov 20, 2015

A Japanese research project has developed "DESTCloud," an evaluation platform that validates the disaster tolerance and fault tolerance of wide-area distributed systems, that is, systems consisting of multiple computers connected over a network. The platform runs on “distcloud,” a wide-area virtualized environment spanning multiple research institutes in Japan and overseas. It validates the disaster tolerance and fault tolerance of systems operating in that environment by intentionally causing failures in the network that interconnects the participating organizations.

After the Great East Japan Earthquake, attention in Japan has turned to information systems capable of "disaster recovery," that is, recovering from the loss of information caused by a large-scale disaster, and to "business continuity plans" that allow services to resume quickly in the aftermath. However, although facilities have been strengthened and procedures formulated, surveys show that disaster exercises that follow these plans and use the resulting disaster recovery systems are rarely carried out.

This evaluation platform assumes that multiple disasters occur simultaneously, for example an earthquake together with communication failures caused by damaged communication infrastructure, which makes it possible to conduct training for a variety of disaster scenarios. Through this training, it becomes possible to evaluate in advance whether wide-area distributed systems can continue providing their services during a large-scale disaster. The functions provided by the platform also make it possible to quantitatively evaluate the disaster tolerance and fault tolerance of wide-area distributed systems such as the Internet.

These research results were presented at the international conference SC15 in Austin, Texas, USA, on November 15, 2015.

This project, led by KASHIWAZAKI Hiroki (Assistant Professor, Osaka University), NAKAGAWA Ikuo (Visiting Associate Professor, Osaka University), KITAGUCHI Yoshiaki (Assistant Professor, Kanazawa University), ICHIKAWA Kohei (Associate Professor, Nara Institute of Science and Technology), KONDO Tohru (Associate Professor, Hiroshima University), and KIKUCHI Yutaka (Professor, Kochi University of Technology), has pushed forward with research to apply parallel distributed storage technology, which handles storage units provided by multiple computers as a single storage unit, to disaster recovery methods.

In this research, the wide-area virtualized environment “distcloud” was constructed, with hubs located at Osaka University, Tohoku University, National Institute of Informatics, Kanazawa University, Kyoto University, Nara Institute of Science and Technology, Hiroshima University, Kochi University of Technology, and the University of California, San Diego, as well as at data centers in Sapporo and Okinawa. The aim of this research and development is to improve wide-area systems by emulating, in a virtualized environment, the failure states that systems experience during a large-scale disaster. Doing so makes it possible to verify the validity of business continuity plans and to quantitatively evaluate how far the quality of these systems deteriorates in such a disaster.

Efforts have already been made to validate fault tolerance against the loss of virtual machines within a single organization, but after a large-scale disaster, various kinds of failure occur simultaneously over a wide area. To make disaster training more realistic, this research project focused on “Software-Defined Networking (SDN),” a technology that makes network settings programmable, that is, controllable through software, and applied it as a “technology to destroy the network.” The motivation for developing the evaluation platform “DESTCloud” is to emulate more complex failures by using SDN to disrupt the networks that connect organizations and to change the network topology of wide-area systems.

Benchmark tests will be carried out until February 2016 in a wide variety of disaster environments, in collaboration with corporations developing software for the project. Experimental results will be fed back to these corporations, which is expected to contribute to improved product quality. The project will continue its research and development toward more practical technology by promoting international standardization of the SDN control methods used by the platform, establishing a consortium to raise public awareness of network disaster prevention, and increasing the number of hubs.
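The idea of disrupting inter-organization links and then measuring what still works can be illustrated with a small sketch. This is not DESTCloud code and does not use any SDN controller API. It only models the topology as a set of links, removes links and sites to represent a combined disaster, and reports the fraction of surviving sites that remain reachable. The link layout, the site names used as identifiers, and the functions `inject_faults`, `reachable_sites`, and `reachability_ratio` are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical inter-site links; the actual distcloud topology is not
# described in this article.
LINKS = {
    ("osaka", "kyoto"),
    ("osaka", "hiroshima"),
    ("osaka", "tohoku"),
    ("kyoto", "kanazawa"),
    ("kanazawa", "tohoku"),
    ("tohoku", "sapporo"),
    ("hiroshima", "okinawa"),
}
SITES = {site for link in LINKS for site in link}


def inject_faults(links, failed_links=(), failed_sites=()):
    """Return the links that survive after the given links and sites fail."""
    cut = {frozenset(link) for link in failed_links}
    down = set(failed_sites)
    return {
        (a, b)
        for a, b in links
        if frozenset((a, b)) not in cut and a not in down and b not in down
    }


def reachable_sites(links, start):
    """Breadth-first search over undirected links, starting from `start`."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        site = queue.popleft()
        for nxt in neighbours.get(site, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reachability_ratio(links, sites, start, failed_sites=()):
    """Fraction of surviving sites that `start` can still reach."""
    alive = set(sites) - set(failed_sites)
    if start not in alive:
        return 0.0
    return len(reachable_sites(links, start) & alive) / len(alive)


# Combined scenario: one site is lost and one link is cut at the same time.
damaged = inject_faults(
    LINKS,
    failed_links=[("osaka", "kyoto")],
    failed_sites=["tohoku"],
)
print(reachability_ratio(damaged, SITES, "osaka", failed_sites=["tohoku"]))
```

In this hypothetical layout, losing one site and one link together leaves half of the surviving sites unreachable from the starting hub, whereas either fault alone would leave every surviving site reachable. This is the kind of combined effect that single-organization tests cannot reveal.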

Abstract

The ICT (Information and Communication Technologies) environment is growing in importance and has become necessary for our daily lives and our businesses. It is important to ensure the toughness and robustness of ICT systems, and several solutions exist, such as redundant configurations and wide-area distribution that avoids any SPoF (single point of failure). In Japan today, especially in the western areas, we must design our ICT systems to endure a disaster beyond our assumptions caused by a southeastern sea (Tounankai) earthquake. The DESTCloud research group designed and evaluated a platform that evaluates the disaster tolerance of ICT systems by emulating records of past disasters.
To contribute to more practical countermeasures against disaster and disorder, we propose a classification from the viewpoints of topology and risk evaluation.
