Portuguese Corpus-Based Learning Using ETL

Ruy Luiz Milidiú, Cícero Nogueira dos Santos, Julio Cesar Duarte

We present Entropy Guided Transformation Learning (ETL) models for three Portuguese Language Processing tasks: Part-of-Speech Tagging, Noun Phrase Chunking and Named Entity Recognition. For Part-of-Speech Tagging, we separately use the Mac-Morpho Corpus and the Tycho Brahe Corpus. For Noun Phrase Chunking, we use the SNR-CLIC Corpus. For Named Entity Recognition, we separately use three corpora: HAREM, MiniHAREM and LearnNEC06. For each of the tasks, the ETL modeling phase is quick and simple. ETL requires only the training set and no handcrafted templates. ETL also simplifies the incorporation of new input features, such as capitalization information, which are successfully used in the ETL-based systems. Using the ETL approach, we obtain performance competitive with the state of the art in all six corpus-based tasks. These results indicate that ETL is a suitable approach for the construction of Portuguese corpus-based systems.

