Rodrigo R. Ormonde, Marcelo Ladeira.
The machine learning approach to website classification belongs to the class of multilabel problems, i.e., a single document can be labeled with more than one category, a class that is harder and less studied than single-label classification. This article proposes a new algorithm, based on the Minimum Description Length principle and on adaptive Huffman coding, that can perform multilabel classification of textual documents in general, with or without the closed-world assumption. Documents can therefore be labeled with one, several, or no categories. The results show the potential of this novel algorithm.
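The general idea of description-length-based multilabel classification can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes an adaptive add-one word-frequency model as a stand-in for adaptive Huffman coding, and it assumes a simple acceptance rule in which a category is assigned when its model encodes the document in fewer bits than an untrained model does. The names `code_length_bits` and `MdlMultilabelClassifier`, the `closed_world` flag, and the toy training data are all hypothetical.

```python
import math
from collections import Counter


def code_length_bits(words, counts, total, alphabet_size):
    """Ideal code length, in bits, of `words` under an adaptive add-one model.

    The model starts from `counts`/`total` and is updated after each word,
    as an adaptive coder would be. The caller's counts are not modified.
    Assumption: a frequency model replaces adaptive Huffman coding here.
    """
    counts = Counter(counts)  # work on a copy
    bits = 0.0
    for w in words:
        bits -= math.log2((counts[w] + 1) / (total + alphabet_size))
        counts[w] += 1
        total += 1
    return bits


class MdlMultilabelClassifier:
    """Hypothetical sketch: one word-frequency model per category."""

    def __init__(self):
        self.counts = {}   # category -> Counter of training words
        self.vocab = set()

    def train(self, text, categories):
        words = text.lower().split()
        self.vocab.update(words)
        for category in categories:
            self.counts.setdefault(category, Counter()).update(words)

    def classify(self, text, closed_world=False):
        words = text.lower().split()
        # Assumed alphabet size: training vocabulary plus one slot for unknowns.
        size = len(self.vocab) + 1
        # Cost of the document under a model that knows no category.
        null_bits = code_length_bits(words, Counter(), 0, size)
        costs = {
            c: code_length_bits(words, cnt, sum(cnt.values()), size)
            for c, cnt in self.counts.items()
        }
        # Assumed rule: keep every category that beats the untrained model.
        labels = {c for c, bits in costs.items() if bits < null_bits}
        if closed_world and not labels and costs:
            # Closed world: the document must receive at least one label.
            labels = {min(costs, key=costs.get)}
        return labels


clf = MdlMultilabelClassifier()
clf.train("goal match team score", ["sports"])
clf.train("team player match", ["sports"])
clf.train("stock market price", ["finance"])
clf.train("bank market stock", ["finance"])

print(sorted(clf.classify("team match score")))
print(sorted(clf.classify("team match stock market")))
print(sorted(clf.classify("weather rain cloud")))
print(sorted(clf.classify("weather rain cloud", closed_world=True)))
```

Under this assumed rule, a document can receive one label, several labels, or none; setting `closed_world=True` forces at least one label by falling back to the category with the shortest code length.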
http://www.lbd.dcc.ufmg.br:8080/colecoes/waamd/2009/009.pdf