Marco Túlio C. Ribeiro, Leonardo Vilela Teixeira, Pedro H. Calais Guerra, Adriano Veloso, Wagner Meira Jr., Dorgival Guedes, Cristine Hoepers, Klaus Steding-Jessen, Marcelo H. P. C. Chaves.
In this paper we propose a spam classification strategy that exploits the content of the Web pages linked by e-mail messages. We describe a methodology for extracting pages linked by spam and characterize the relationship between those pages and the spam messages. We then use a machine learning algorithm to extract features from the Web pages that are relevant to spam detection. We demonstrate that using information from linked pages can significantly outperform current spam classification techniques, as represented by SpamAssassin. Our study shows that the pages linked by spam are a very promising battleground, where spammers do not hide their identity, and that this battleground has not yet been used by spam filters.
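The abstract does not detail the extraction pipeline. The following is a minimal Python sketch, under our own assumptions, of the first two steps it names: collecting the URLs a message links to, and turning an already-fetched linked page into word-count features that a learning algorithm could consume. The function names `extract_linked_urls` and `page_features` are hypothetical, and page fetching and the classifier itself are deliberately left out; this is not the authors' implementation.

```python
import email
import re
from collections import Counter
from email import policy
from html.parser import HTMLParser

# Simple http(s) URL pattern (an assumption; real spam uses many obfuscations).
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def extract_linked_urls(raw_message: str) -> list[str]:
    """Return unique http(s) URLs from the text/plain and text/html parts, in order."""
    msg = email.message_from_string(raw_message, policy=policy.default)
    urls: list[str] = []
    for part in msg.walk():
        if part.get_content_type() in ("text/plain", "text/html"):
            for url in URL_RE.findall(part.get_content()):
                if url not in urls:  # keep first occurrence only
                    urls.append(url)
    return urls


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def page_features(html: str) -> Counter:
    """Bag-of-words counts over the lowercased visible text of a linked page."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    return Counter(re.findall(r"[a-z0-9]+", text))


if __name__ == "__main__":
    raw = (
        "From: a@example.com\n"
        "Subject: offer\n"
        "Content-Type: text/plain\n\n"
        "Buy now at http://example.com/deal and http://example.com/deal\n"
    )
    print(extract_linked_urls(raw))  # ['http://example.com/deal']
    html = "<html><body><h1>Cheap pills</h1><script>var x=1;</script></body></html>"
    print(page_features(html))  # Counter({'cheap': 1, 'pills': 1})
```

Keeping the page-feature step separate from URL extraction mirrors the abstract's split between collecting linked pages and learning from their content; any classifier (e.g., a naive Bayes model over these counts) could be plugged in afterwards.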
http://www.lbd.dcc.ufmg.br/colecoes/sbrc/2011/0032.pdf