Design and Implementation of a Perfect Failure Detection Service

Ely W. de Oliveira, Andrey E. M. Brito, Francisco V. Brasileiro

An unreliable failure detector is an important abstraction for implementing fault-tolerant protocols in asynchronous distributed systems. Several classes of failure detectors, with varying semantics, have been proposed; the class of perfect failure detectors provides the strongest semantics. In this paper we present the design and implementation of a failure detection service with perfect semantics. The service is implemented at the operating system level by adding system calls to a standard Linux kernel. We have also implemented C and Java APIs that give applications access to the service. To support the service, we built special low-cost wireless communication devices. These devices are attached to the machines that run the service in a local area network, and they form an ad hoc network used solely to carry the messages of the failure detection service.

