Removing Ambiguities in the Authorship Identification of Bibliographic Objects

Jean W. A. Oliveira, Alberto H. F. Laender, Marcos André Gonçalves

Digital libraries bring together digital content and metadata, frequently obtained from several disparate sources. The lack of standardization across these sources causes problems such as ambiguous metadata fields. In this paper, we present a strategy for name authority disambiguation in digital libraries. Our strategy combines pattern matching functions and information retrieval techniques with a clustering algorithm, which allows the creation of unified indexes that register the several variants of an author name appearing in the collection. We demonstrate the effectiveness of our strategy through exhaustive experiments on two test collections with distinctive features, derived from two digital libraries: BDBComp (Biblioteca Digital Brasileira de Computação) and DBLP (Digital Bibliography & Library Project). For the collection derived from BDBComp, the average of the measure of the quality of the generated clusters and the measure of their fragmentation was higher than 95%, while for the collection derived from DBLP that average was higher than 66%.
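
To make the idea of a unified index of name variants concrete, the following is a minimal Python sketch, not the method evaluated in this paper. It assumes a simple pattern-matching rule (same last name, given names compatible up to initials) and a greedy one-pass clustering; the function names `normalize`, `names_compatible`, `cluster_names`, and `build_index` are hypothetical, and the information retrieval component (for example, similarity over coauthors or titles) is omitted.

```python
import unicodedata


def normalize(name: str) -> list[str]:
    """Lowercase, strip accents and punctuation, and split into tokens."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents.lower())
    return cleaned.split()


def tokens_match(a: str, b: str) -> bool:
    """Two tokens match if they are equal or one is the initial of the other."""
    if a == b:
        return True
    short, long_ = sorted((a, b), key=len)
    return len(short) == 1 and long_.startswith(short)


def names_compatible(x: str, y: str) -> bool:
    """Assumed rule: identical last name, and the given-name tokens of the
    shorter name match, in order, the leading tokens of the longer name."""
    tx, ty = normalize(x), normalize(y)
    if not tx or not ty or tx[-1] != ty[-1]:
        return False
    # zip stops at the shorter list, so extra middle names are tolerated.
    return all(tokens_match(a, b) for a, b in zip(tx[:-1], ty[:-1]))


def cluster_names(names: list[str]) -> list[list[str]]:
    """Greedy clustering: a name joins the first cluster in which it is
    compatible with every member; otherwise it starts a new cluster."""
    clusters: list[list[str]] = []
    for name in names:
        for cluster in clusters:
            if all(names_compatible(name, member) for member in cluster):
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def build_index(clusters: list[list[str]]) -> dict[str, list[str]]:
    """Use the longest variant of each cluster as its index entry."""
    return {max(c, key=len): sorted(c) for c in clusters}


names = [
    "Alberto H. F. Laender",
    "A. H. F. Laender",
    "Marcos André Gonçalves",
    "M. A. Gonçalves",
    "A. Laender",
    "Jean W. A. Oliveira",
]
index = build_index(cluster_names(names))
```

A rule based on names alone merges distinct authors who share a surname and initials, which is one reason a fuller strategy would bring in additional evidence before clustering.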

