Improvements in the Partitions Selection Strategy for Set of Clustering Solutions

**Sakata, T.C., Faceli, K., de Souto, M.C.P., de Carvalho, A.C.P.L.F.**

No clustering algorithm is guaranteed to find the actual groups in every dataset. Thus, selecting the most suitable clustering algorithm for a given dataset is not easy. To deal with this problem, one can apply various clustering algorithms to the dataset, generating a set of partitions (solutions). Next, one can choose the best partition generated according to a given validation measure, although such measures are usually biased towards one or more clustering algorithms. However, in many cases it is interesting to have more than one solution. In previous work, we proposed a selection strategy able to reduce the number of solutions obtained from Pareto-based multi-objective genetic algorithms. This selection strategy uses the corrected Rand index to select a subset of the most different partitions. The size of the solution set is controlled by a threshold on the value of this index, given as an external parameter. Reducing the threshold value decreases the number of solutions. Since the choice of such a threshold value is not intuitive, this paper describes a modification of the original selection algorithm that automatically adjusts this threshold and guarantees the selection of the most evident partitions, which were simultaneously obtained with distinct clustering criteria. The new version does not require any user settings, yields a more suitable number of solutions, and maintains the diversity of the partitions in the reduced set.
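
The abstract does not give the selection algorithm itself, so the following is only a minimal sketch of the general idea of threshold-based filtering with the corrected (adjusted) Rand index, not the authors' method. The function names `adjusted_rand_index` and `select_diverse`, the greedy first-come ordering, and the toy partitions are assumptions made for illustration. The automatic threshold adjustment described in the paper is not reproduced here.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(a, b):
    """Corrected-for-chance (adjusted) Rand index between two label lists."""
    if len(a) != len(b):
        raise ValueError("partitions must label the same objects")
    # Pair counts from the contingency table and from its marginals.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(b).values())
    total = comb(len(a), 2)
    if total == 0:
        return 1.0
    expected = sum_rows * sum_cols / total
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case, e.g. both partitions are a single cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def select_diverse(partitions, threshold):
    """Hypothetical greedy filter: keep a partition only if its index
    with every already-kept partition is below `threshold`."""
    selected = []
    for p in partitions:
        if all(adjusted_rand_index(p, q) < threshold for q in selected):
            selected.append(p)
    return selected


# Toy partitions of six objects (illustrative data, not from the paper).
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same grouping as the first, relabeled
    [0, 0, 1, 1, 2, 2],
    [0, 1, 0, 1, 0, 1],
]

for threshold in (0.5, 0.0):
    kept = select_diverse(partitions, threshold)
    print(threshold, len(kept))
```

In this sketch a lower threshold demands that kept partitions be more dissimilar, so fewer survive, which matches the behaviour the abstract describes for the original strategy. The result of a greedy filter depends on the order in which partitions are visited; that ordering is a design choice of the sketch, not something stated in the source.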

http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=5715212

Biblioteca Digital Brasileira de Computação - Contact: bdbcomp@lbd.dcc.ufmg.br