TwoStep Cluster: A Comparative Analysis of the Algorithm and a Proposed Improvement to the Likelihood Measure

Rafael Leão Brazão, Pedro A. Barbetta, Dalton F. Andrade

In this paper we compare the TwoStep Cluster (TSC) algorithm with other clustering algorithms for large databases. The comparisons used simulated data, with the simulation parameters varied according to the Design of Experiments methodology. The results show that TSC was more accurate when the clusters had different variances; however, it was slower than the traditional K-means algorithm. We also propose an improvement to the log-likelihood measure used in the algorithm, which allows it to incorporate information about the correlations between the variables.

