This paper reports the fully automatic compilation of parallel corpora for Brazilian Portuguese. Scientific news texts available in Brazilian Portuguese, English and Spanish are automatically crawled from a multilingual Brazilian magazine. The texts are then automatically aligned at the document and sentence levels. The resulting corpora contain about 2,700 parallel documents totaling over 150,000 aligned sentences each. The quality and usefulness of the corpora are evaluated in a machine translation experiment.
http://www.lbd.dcc.ufmg.br/colecoes/stil/2011/0033.pdf