Quotation Extraction for Portuguese

William Paulo Ducca Fernandes, Eduardo Motta, Ruy Luiz Milidiú

Quotation extraction consists of identifying quotations and their authors. In this work, we present a Quotation Extraction system for Portuguese based on Entropy Guided Transformation Learning, a supervised Machine Learning algorithm. This is the first Quotation Extraction system for Portuguese that uses a Machine Learning approach. To train and evaluate the proposed system, we build the GLOBOQUOTES corpus, with news extracted from the GLOBO.COM portal. Our system obtains a score of 79.02% for the subtask of associating a quotation with its author. For the whole Quotation Extraction task, the observed score is 66.03%. These findings indicate that the overall extraction quality is highly dependent on the quotation identification subtask.

