Generating a Pronunciation Dictionary for European Portuguese Using a Joint-Sequence Model with Embedded Stress Assignment

Arlindo Veiga, Sara Candeias, Fernando Perdigão

This paper addresses the problem of grapheme-to-phoneme conversion in order to create a pronunciation dictionary from a vocabulary of the most frequent words in European Portuguese. A system based on a mixed approach, founded on a stochastic model with embedded rules for stressed vowel assignment, is described. Although the model can generate pronunciations for unrestricted words, a dictionary of the 40k most frequent words was constructed and corrected interactively. The vocabulary was defined using the CETEMPúblico corpus. The model and dictionary are publicly available.

