A Heuristic-based Hierarchical Clustering Method for Author Name Disambiguation in Digital Libraries

Ricardo G. Cota, Marcos André Gonçalves, Alberto H. F. Laender

In this paper, we propose a heuristic-based hierarchical clustering (HHC) method to deal with the author name disambiguation problem. The method successively fuses clusters of citations of compatible authors based on several heuristics and similarity measures applied to the components of the citations (e.g., co-authors, title of the work, publication venue). In each phase, the information of the fused clusters is aggregated, providing more evidence for the next round of fusion. Experiments with a dataset taken from the DBLP collection show gains of up to 12% against a previous method that did not consider hierarchical clustering, up to 21% against a supervised baseline (i.e., SVM), and 15.5% against an unsupervised one (i.e., K-Means), which use the same evidence as the proposed method.
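The fusion process described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the name-compatibility test (same last name and same first initial), the two-phase split (shared co-author first, then title or venue similarity), the Jaccard measure, and the 0.3 threshold are all assumptions chosen for illustration, as are the names Cluster, run_phase, and hhc_sketch. What it does try to show is the aggregation step: a fused cluster carries the union of its members' evidence into the next round.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    names: set          # author name variants seen in this cluster
    coauthors: set      # aggregated co-author names
    title_words: set    # aggregated title vocabulary
    venue_words: set    # aggregated venue vocabulary
    members: list = field(default_factory=list)  # citation ids


def compatible(a, b):
    # Assumed heuristic: same last name and same first initial.
    pa, pb = a.lower().split(), b.lower().split()
    return pa[-1] == pb[-1] and pa[0][0] == pb[0][0]


def names_compatible(c1, c2):
    return all(compatible(a, b) for a in c1.names for b in c2.names)


def jaccard(x, y):
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def fuse(c1, c2):
    # Aggregate the evidence of both clusters for later rounds.
    return Cluster(c1.names | c2.names, c1.coauthors | c2.coauthors,
                   c1.title_words | c2.title_words,
                   c1.venue_words | c2.venue_words,
                   c1.members + c2.members)


def run_phase(clusters, should_fuse):
    # Repeat pairwise fusion until no compatible pair satisfies the rule.
    clusters = list(clusters)
    changed = True
    while changed:
        changed = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                if names_compatible(a, b) and should_fuse(a, b):
                    rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
                    clusters = rest + [fuse(a, b)]
                    changed = True
                    break
            if changed:
                break
    return clusters


def hhc_sketch(citations, threshold=0.3):
    # Start with one cluster per citation.
    clusters = [Cluster({c["author"]}, set(c["coauthors"]),
                        set(c["title"].lower().split()),
                        set(c["venue"].lower().split()), [c["id"]])
                for c in citations]
    # Assumed phase 1: fuse compatible clusters sharing a co-author.
    clusters = run_phase(clusters, lambda a, b: bool(a.coauthors & b.coauthors))
    # Assumed phase 2: fuse on title or venue similarity over aggregated words.
    clusters = run_phase(clusters, lambda a, b: max(
        jaccard(a.title_words, b.title_words),
        jaccard(a.venue_words, b.venue_words)) >= threshold)
    return clusters


# Made-up citation records, for illustration only.
example = [
    {"id": 1, "author": "A. Gupta", "coauthors": ["J. Smith"],
     "title": "Query optimization in databases", "venue": "VLDB"},
    {"id": 2, "author": "Anil Gupta", "coauthors": ["J. Smith"],
     "title": "Wireless sensor networks", "venue": "INFOCOM"},
    {"id": 3, "author": "A. Gupta", "coauthors": ["L. Chen"],
     "title": "Routing in sensor networks", "venue": "INFOCOM"},
    {"id": 4, "author": "A. Gupta", "coauthors": ["M. Ray"],
     "title": "Protein folding simulation", "venue": "Bioinformatics"},
]
```

Calling hhc_sketch(example) returns the final list of Cluster objects, and the members field of each lists the citation ids attributed to one author. Running the co-author rule before the content-based rule is a design choice of this sketch: it places the stricter evidence first, so that the looser title and venue comparison operates on clusters that already hold aggregated vocabulary.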

