7. Benchmark Platform for Failure Detectors | NYU Shanghai Undergraduate Research Symposium

Audio - Ruhao Xin & Juncheng Dong.mp3

Project Description

Failure detection is one of the most fundamental problems in distributed systems. It seems simple, but in an asynchronous distributed system it is impossible to determine the state of a process deterministically, because two states cannot be told apart: a process that has crashed and a process that is merely very slow. To address this, Chandra and Toueg introduced the model of unreliable failure detectors and proved that consensus can be solved even with unreliable failure detectors that make an infinite number of mistakes. They also defined two properties for evaluating a failure detector: completeness and accuracy. Building on this work, a growing number of papers have proposed failure detector implementations. For example, Bertier, Marin and Sens proposed an adaptable failure detector implementation that can support scalable applications. They later integrated it with a hierarchical network structure to present a hierarchical failure detector, which reduces both the number of messages and the processor load. Another paper improved on the initial failure detector models by providing a probabilistic evaluation of suspicion rather than a deterministic one.
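To make the crashed-versus-slow ambiguity concrete, the following is a minimal sketch of a fixed-timeout heartbeat detector. It is an illustration only, not the implementation of any of the papers mentioned above or of our platform; the class name `TimeoutFailureDetector`, the 2-second timeout and the process names are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TimeoutFailureDetector:
    """Hypothetical fixed-timeout heartbeat detector (illustrative only)."""

    timeout: float  # seconds of silence tolerated before suspecting a process
    last_heartbeat: dict = field(default_factory=dict)

    def heartbeat(self, process_id: str, now: float) -> None:
        # Record the arrival time of the most recent heartbeat.
        self.last_heartbeat[process_id] = now

    def suspects(self, process_id: str, now: float) -> bool:
        # Suspect a process that was never heard from, or that has been
        # silent for longer than the timeout.
        last = self.last_heartbeat.get(process_id)
        return last is None or (now - last) > self.timeout


def demo() -> dict:
    detector = TimeoutFailureDetector(timeout=2.0)  # assumed timeout
    detector.heartbeat("crashed", now=0.0)
    detector.heartbeat("slow", now=0.0)

    # At t=3.0 both processes have been silent for 3 s, so the detector
    # cannot tell them apart and suspects both.
    at_3 = (detector.suspects("crashed", 3.0), detector.suspects("slow", 3.0))

    # The slow process's delayed heartbeat finally arrives at t=3.5.
    detector.heartbeat("slow", now=3.5)
    at_4 = (detector.suspects("crashed", 4.0), detector.suspects("slow", 4.0))
    return {"at_3": at_3, "at_4": at_4}


if __name__ == "__main__":
    print(demo())
```

In this sketch the crashed process stays suspected, which relates to completeness, while the slow process is wrongly suspected at t=3.0 and cleared once its heartbeat arrives, which is a mistake in terms of accuracy. An adaptive detector would adjust the timeout from observed arrival times instead of fixing it in advance.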

However, designing failure detectors still poses some problems. First, when researchers evaluate the performance of their failure detectors, they have to use real networks (or simulate them). We think it is unnecessary for researchers to build a complete and complex network structure for evaluation, because for failure detectors the behaviour of processes matters much more than the overall network flow. Second, current comparisons between failure detectors are neither efficient nor fully objective, because different failure detectors are designed and evaluated under different network environments. It would therefore be more convenient for researchers to build, test and compare failure detectors if a platform specialized in failure detectors existed. This is why we plan to design and implement a benchmark platform for failure detectors in this paper.

GitHub project link: https://github.com/BenchamrkingPlatformforFailuredetector/Benchmark_Platform_for_Failure_Detector