Detecção de Sítios Replicados Utilizando Conteúdo e Estrutura (Detecting Replicated Sites Using Content and Structure)

André Luiz da Costa Carvalho, Allan José de Souza Bezerra, Edleno Silva de Moura, Altigran Soares da Silva, Patrícia Silva Peres

Identifying replicated sites is an important task for search engines. It can reduce data storage costs, improve query processing time, and remove noise that might affect the quality of the final answer given to the user. This paper introduces a new approach to detecting replicated sites in search engine databases, using the structure of the websites and the content of their pages as replication evidence. The paper also presents the results of experiments performed on a real search engine database. Our approach found that 8.43% of the web pages stored in the database belonged to replicated web sites, with 94.4% precision, a result more accurate than those reported in previous work.
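The abstract's core idea is to combine two kinds of replication evidence: site structure and page content. The sketch below is a minimal, illustrative take on that idea, not the authors' actual algorithm: each site is reduced to a set of (URL path, content hash) pairs, and two hosts are flagged as candidate replicas when the Jaccard similarity of their sets exceeds a threshold. The function names, the MD5-based fingerprinting, the whitespace normalization, and the 0.8 threshold are all assumptions made for the example.

```python
import hashlib
from itertools import combinations

def fingerprint(text: str) -> str:
    """Hash normalized page content so trivially different copies collide."""
    normalized = " ".join(text.split()).lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

def site_signature(pages: dict[str, str]) -> set[tuple[str, str]]:
    """Represent a site by its structure (URL paths) plus content hashes."""
    return {(path, fingerprint(content)) for path, content in pages.items()}

def jaccard(a: set, b: set) -> float:
    """Set overlap in [0, 1]; 0 for two empty signatures."""
    return len(a & b) / len(a | b) if a | b else 0.0

def find_replicas(sites: dict[str, dict[str, str]], threshold: float = 0.8):
    """Yield host pairs whose combined structure+content signatures overlap."""
    sigs = {host: site_signature(pages) for host, pages in sites.items()}
    for (h1, s1), (h2, s2) in combinations(sigs.items(), 2):
        score = jaccard(s1, s2)
        if score >= threshold:
            yield h1, h2, score

# Toy usage: two mirrors and one unrelated site.
sites = {
    "mirror-a.example": {"/index.html": "Welcome!", "/docs/a.html": "Page A"},
    "mirror-b.example": {"/index.html": "Welcome!", "/docs/a.html": "Page A"},
    "other.example":    {"/index.html": "Something else entirely"},
}
for h1, h2, score in find_replicas(sites):
    print(f"{h1} ~ {h2} (similarity {score:.2f})")
```

Requiring both the path and the content hash to match is what makes the signature joint evidence: two sites that merely share a template (same paths, different content) or reuse the same page under different paths score lower than true mirrors.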

http://www.lbd.dcc.ufmg.br:8080/colecoes/sbbd/2005/002.pdf

