The CORALBRASIL corpus: methodological basis for treatment of spontaneous speech

Heliana R. Mello, Maryualê M. Mittmann, Tommaso Raso

This paper describes the main methods used to compile CORALBRASIL: recording, transcribing and segmenting oral texts. CORALBRASIL is a corpus of spontaneous Brazilian Portuguese speech designed for the study of information structure. It represents diaphasic variation, covering as many different communicative situations as possible. The paper presents and exemplifies how speech is transcribed and segmented into prosodic units in our ongoing research. It concludes by illustrating some of the questions the corpus will allow us to answer.

