Activist data mining for computational science: tools and applications

Dennis Shasha

Classical data mining involves waiting for data to appear and then mining it. Activist data mining involves proposing experiments based on algorithmic and application-specific considerations, evaluating the results, proposing new experiments, evaluating, proposing, and so on. Activist data mining is thus a fundamentally interventionist and iterative endeavour, and it entails close collaboration with application specialists. The techniques required include combinatorial design to support a disciplined experimental design, a variety of analog circuit-building techniques, and hypothesis generation. The talk and this paper discuss these tools in the context of a series of case-study collaborations with biologists and physicists. The necessary scientific background will be presented to make the discussion self-contained. The talk is meant to appeal to researchers and practitioners in data mining as well as any visiting natural scientists. The data sizes range from 30,000 items for microarrays to trillions of items in gamma-ray experiments. My intent in this paper is to convey the philosophy of my approach. You can find the technicalities on my web page or on the conference site. I will concentrate on biology because that is where I do the most work.
