Rafael Leão Brazão, Pedro A. Barbetta, Dalton F. Andrade.
In this paper we compare the TwoStep Cluster (TSC) algorithm with other clustering algorithms for large databases. The comparisons use simulated data whose parameters were varied according to the Design of Experiments methodology. The results show that TSC was more accurate when the clusters had different variances; however, it was slower than the traditional K-means algorithm. We also propose an improvement to the log-likelihood measure used in the algorithm, which allows it to incorporate information about the correlations between the variables.
http://www.lbd.dcc.ufmg.br:8080/colecoes/waamd/2007/002.pdf