Part-of-Speech Tagging of Portuguese Using Hidden Markov Models with Character Language Model Emissions

Marcelo Rodrigues de Holanda Maia, Geraldo Bonorino Xexéo

This paper presents a probabilistic approach to part-of-speech (POS) tagging that combines hidden Markov models (HMMs) with character language models, applied to Portuguese texts. In this approach, the emission probabilities of each hidden state in an HMM are estimated by that state's own character language model. The tagger was trained and tested on Bosque, a subset of the Floresta Sintá(c)tica treebank, reaching 96.2% accuracy with a 39-tag tagset and 92.0% with a 257-tag tagset extended with inflection information.
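The idea of replacing an HMM's word-emission table with one character language model per tag can be illustrated with a minimal sketch. The abstract does not state the n-gram order, the smoothing method, or the decoding details, so everything below is an assumption made for illustration: character bigrams, add-one smoothing, tag-bigram transitions, Viterbi decoding, and a three-sentence toy corpus with invented tags. It is not the authors' implementation.

```python
import math
from collections import defaultdict

BOS, EOS = "<s>", "</s>"   # tag-sequence boundary markers (assumed)
PAD, END = "^", "$"        # word boundary characters (assumed absent from the text)

def train(tagged_sentences):
    """Collect tag-bigram counts and per-tag character-bigram counts."""
    trans = defaultdict(lambda: defaultdict(int))
    chars = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    alphabet = {END}
    for sentence in tagged_sentences:
        prev = BOS
        for word, tag in sentence:
            trans[prev][tag] += 1
            prev = tag
            padded = PAD + word + END
            for a, b in zip(padded, padded[1:]):
                chars[tag][a][b] += 1
                alphabet.add(b)
        trans[prev][EOS] += 1
    return trans, chars, sorted(chars), alphabet

def log_transition(trans, tags, prev, tag):
    """Add-one smoothed log P(tag | prev); EOS counts as one extra outcome."""
    total = sum(trans[prev].values())
    return math.log((trans[prev][tag] + 1) / (total + len(tags) + 1))

def log_emission(chars, alphabet, tag, word):
    """Log P(word | tag) under the tag's character-bigram model."""
    padded = PAD + word + END
    vocab = len(alphabet) + 1  # +1 reserves mass for unseen characters
    logp = 0.0
    for a, b in zip(padded, padded[1:]):
        context = chars[tag][a]
        logp += math.log((context[b] + 1) / (sum(context.values()) + vocab))
    return logp

def viterbi(words, model):
    """Return the most probable tag sequence for a list of words."""
    trans, chars, tags, alphabet = model
    best = {BOS: (0.0, [])}  # last tag -> (log score, best path ending there)
    for word in words:
        step = {}
        for tag in tags:
            score, path = max(
                ((s + log_transition(trans, tags, p, tag), pth)
                 for p, (s, pth) in best.items()),
                key=lambda item: item[0],
            )
            emit = log_emission(chars, alphabet, tag, word)
            step[tag] = (score + emit, path + [tag])
        best = step
    _, path = max(
        ((s + log_transition(trans, tags, p, EOS), pth)
         for p, (s, pth) in best.items()),
        key=lambda item: item[0],
    )
    return path

# Toy corpus invented for this sketch; the tags are not the Bosque tagset.
TOY = [
    [("o", "DET"), ("gato", "N"), ("dorme", "V")],
    [("a", "DET"), ("casa", "N"), ("cai", "V")],
    [("o", "DET"), ("menino", "N"), ("corre", "V")],
]
model = train(TOY)
```

Calling `viterbi(["o", "gatos", "correm"], model)` returns one tag per word even though "gatos" and "correm" never occur in the toy corpus: the character models still assign them a nonzero probability under every tag. Handling unseen words without a separate mechanism is the usual motivation for character-level emissions.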
