Towards a Distributed Search Engine

Ricardo Baeza-Yates

Distributed search engines are often more complex to implement compared to centralized engines. Distributing a search engine across multiple sites, however, has severaladvantages. In particular, it enables the utilization of less computer resources and the exploitation of data and user locality. In this presentation we show the feasibility of distributed Web search engines, by proposing a model for assessing the total cost of a distributed Web-search engine that includes the computational costs as well as the communication cost among all distributed sites. Using examples, we show that a distributed Web search engine can be more cost effective than a centralized one, if there is a large percentage of local queries, which is usually the case. We then present a query-processing algorithm that maximizes the amount of queries answered locally, without sacrificing the quality of the results, by using caching and partial replication. We simulate our algorithm on real document collections and real query workloads to measure the actual parameters needed for our cost model, and we show that a distributed search engine can be competitive compared to a centralized architecture with respect to cost.

Caso o link acima esteja inválido, faça uma busca pelo texto completo na Web: Buscar na Web

Biblioteca Digital Brasileira de Computação - Contato:
     Mantida por: