Multi-Document Discourse Parsing Using Traditional and Hierarchical Machine Learning

Erick Galani Maziero, Thiago Alexandre Salgueiro Pardo

Multi-document handling is essential today, when many documents on the same topic are produced, especially on the Web. Both readers and computer applications can benefit from a discourse analysis of this multi-document content, since it makes explicit the relations among portions of these documents. This work aims to identify such relations automatically using traditional and hierarchical machine learning techniques. In particular, it focuses on identifying the relations predicted by the Cross-document Structure Theory (CST). The obtained results improve the state of the art.

