Automatic induction of translation lexicons from aligned parallel corpus

Helena de M. Caseli, Maria das Graças V. Nunes

Translation lexicons are among the most important linguistic resources for machine translation. However, building these bilingual sets of word and multiword correspondences requires considerable manual effort. This paper describes a method for automatically building translation lexicons by extracting knowledge from PoS-tagged and lexically aligned parallel corpora. Preliminary experiments were carried out on Brazilian Portuguese, Spanish and English parallel texts. The results showed that 85% of pt-es and 89% of pt-en entries are plausible correspondences. These results were obtained taking into consideration only the classes of entries which achieved the best results.

