Maria Lucía del Rosario Castro Jorge, Verônica Agostini, Thiago Alexandre Salgueiro Pardo.
Multi-document summarization consists of automatically producing a single informative summary from a collection of texts on the same topic. In this paper we model multi-document summarization as a machine learning classification problem in which each sentence from the source texts is classified as belonging or not belonging to the summary. To this end, we combine superficial features (e.g., sentence position in the text) with deep linguistic features (e.g., semantic relations across documents). In particular, the linguistic features are given by CST (Cross-document Structure Theory). We conduct our experiments on a CST-annotated corpus of news texts. Results show that the linguistic features help to produce a better classification model, yielding state-of-the-art results.
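To make the classification framing concrete, the following is a minimal sketch, not the authors' implementation. It assumes each sentence is described by one superficial feature (relative position in its source text) and one deep feature (a count of CST relations linking it to sentences in other documents). The abstract does not name the learning algorithm, so a simple perceptron is used here purely as a stand-in. The `Sentence` fields, the feature definitions, and the toy data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    text: str
    position: int       # 0-based index within its source document
    doc_length: int     # number of sentences in that document
    cst_relations: int  # hypothetical: CST relations to other documents


def features(s):
    """Map a sentence to [relative position, CST relation count]."""
    # Superficial feature: 1.0 for the first sentence, 0.0 for the last.
    rel_pos = 1.0 - s.position / max(s.doc_length - 1, 1)
    # Deep feature: how strongly the sentence is linked across documents.
    return [rel_pos, float(s.cst_relations)]


def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_perceptron(X, y, epochs=20, lr=0.1):
    """Stand-in binary classifier (assumption: the paper may use another)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            pred = 1 if score(w, b, x) > 0 else 0
            err = target - pred
            if err:
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
    return w, b


def in_summary(w, b, s):
    """Return 1 if the sentence is classified as belonging to the summary."""
    return 1 if score(w, b, features(s)) > 0 else 0


# Toy training data (invented for illustration): label 1 = in summary.
train = [
    (Sentence("Lead sentence repeated across sources.", 0, 4, 3), 1),
    (Sentence("Closing remark found in one source.", 3, 4, 0), 0),
    (Sentence("Key detail confirmed by another text.", 1, 4, 2), 1),
    (Sentence("Background aside.", 2, 4, 0), 0),
]
X = [features(s) for s, _ in train]
y = [label for _, label in train]
w, b = train_perceptron(X, y)

# Classify the training sentences and assemble an extractive summary.
predictions = [in_summary(w, b, s) for s, _ in train]
summary = [s.text for s, _ in train if in_summary(w, b, s)]
```

In this sketch the summary is simply the set of sentences labeled positive. A real system would also need to order the selected sentences and respect a length limit, which the abstract does not describe.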
http://www.lbd.dcc.ufmg.br/colecoes/enia/2011/007.pdf