Distributed and Reliable Fault Management Based on Agent Clusters

Aldri L. dos Santos, Elias P. Duarte Jr., Glenn Mansfield

Most network monitoring systems only allow the examination of managed objects of fault-free agents. However, it is often useful to examine the MIB objects of a crashed or unreachable network element in order to determine why it is faulty. This work presents a new clustering architecture for SNMP agents that supports semi-active replication of managed objects. A cluster of agents provides fault-tolerant access to managed objects: the replicated managed objects of a faulty agent in a given cluster can be accessed through a peer cluster. Furthermore, the cluster acts as a cache of managed objects, which reduces the impact of monitoring on network performance. The proposed architecture is structured in three layers. The lower layer corresponds to the managed objects at the network elements. The middle layer contains management entities, called clusters, that monitor and replicate managed objects. The upper layer allows the definition of management clusters and of the relationships between clusters. A practical tool was implemented and is presented; as an application example, we show how it was used to detect TCP SYN-Flooding attacks. The impact of replication on network performance is evaluated, and a probabilistic analysis of the consistency of replicated objects is presented.
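The replication and peer-access idea described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the authors' tool: SNMP polling is simulated by passing values directly to `update`, the names `Cluster`, `Replica` and `half_open_connections` are invented for this sketch, and instance keys such as `tcpConnState.1` are simplified placeholders (real `tcpConnTable` rows are indexed by local and remote addresses and ports). The sketch assumes that each cluster forwards every polled value to its peer clusters, so a peer can still answer queries about an agent that has crashed, and that queries are served from the stored replicas rather than by polling the agent again (the cache role). The only standard value used is `synReceived(4)`, the MIB-II `tcpConnState` code for a half-open connection, which a SYN-flooding check could count.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple

# MIB-II (RFC 1213) tcpConnState value for a half-open connection.
SYN_RECEIVED = 4


@dataclass
class Replica:
    value: object
    timestamp: float  # time of the (simulated) poll that produced this value


@dataclass
class Cluster:
    """Hypothetical cluster: replicates managed objects of its member agents
    and mirrors each polled value to its peer clusters."""
    name: str
    members: List[str]
    peers: List["Cluster"] = field(default_factory=list)
    # (agent, object key) -> Replica; also holds replicas pushed by peers
    store: Dict[Tuple[str, str], Replica] = field(default_factory=dict)

    def update(self, agent: str, values: Dict[str, object], now: float) -> None:
        """Record one polling round for a member agent and forward it to peers."""
        for key, value in values.items():
            self.store[(agent, key)] = Replica(value, now)
            for peer in self.peers:
                peer.store[(agent, key)] = Replica(value, now)

    def get(self, agent: str, key: str) -> Optional[Replica]:
        """Answer from the replica (cache); works even if the agent is down."""
        return self.store.get((agent, key))


def half_open_connections(cluster: Cluster, agent: str, keys: Iterable[str]) -> int:
    """Count replicated connection entries in the synReceived state."""
    count = 0
    for key in keys:
        replica = cluster.get(agent, key)
        if replica is not None and replica.value == SYN_RECEIVED:
            count += 1
    return count


# Usage: router1 is monitored by cluster A, which has cluster B as a peer.
a = Cluster("A", members=["router1"])
b = Cluster("B", members=["router2"])
a.peers.append(b)

# Simplified, hypothetical instance keys for three half-open connections.
states = {f"tcpConnState.{i}": SYN_RECEIVED for i in range(3)}
states["tcpConnState.99"] = 5  # established(5)
a.update("router1", states, now=10.0)

# Even if router1 and cluster A become unreachable, B still holds the replicas.
print(half_open_connections(b, "router1", states))  # → 3
```

Storing a timestamp with each replica is one way to let a manager judge how stale a replicated value may be, which is the kind of question a consistency analysis of replicated objects addresses.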

