Implementing Checkpointing-Based Rollback Recovery in Asynchronous Distributed Systems

Clairton Buligon, Sérgio Cechin, Ingrid Jansch-Pôrto

Rollback recovery from previous checkpoints is widely used as a fault-tolerance technique. The complexity of distributed system models has motivated the development of different algorithms, each aiming to offer a simpler and more efficient solution than its predecessors. Our Fault Tolerance Group recently proposed one such algorithm: it targets asynchronous distributed systems based on message passing, uses coordinated non-blocking checkpointing, and handles orphan and lost messages. This paper describes the challenges of implementing the algorithm, the design decisions taken, and the results obtained to date.

