BDBComp
Partnership: SBC
Fully Automatic Compilation of Portuguese-English and Portuguese-Spanish Parallel Corpora

Wilker Aziz, Lucia Specia

This paper reports the fully automatic compilation of parallel corpora for Brazilian Portuguese. Scientific news texts available in Brazilian Portuguese, English and Spanish are automatically crawled from a multilingual Brazilian magazine. The texts are then automatically aligned at the document and sentence levels. The resulting Portuguese-English and Portuguese-Spanish corpora each contain about 2,700 parallel documents totaling over 150,000 aligned sentences. The quality and usefulness of the corpora are evaluated in a machine translation experiment.

http://www.lbd.dcc.ufmg.br/colecoes/stil/2011/0033.pdf


Biblioteca Digital Brasileira de Computação (Brazilian Digital Library of Computing) - Contact: bdbcomp@lbd.dcc.ufmg.br
Maintained by: LBD