Multi-domain Cross-lingual Information Extraction from Clean and Noisy Texts

Horacio Saggion, Sandra Szasz

We have created a human-annotated, multi-event, cross-lingual corpus of equivalent summaries in Spanish and English to investigate cross-lingual information extraction. In addition to pairs of equivalent, non-translated summaries, the corpus contains an automatic translation of each summary, produced with an available translation tool. We have developed trainable information extraction systems for each language and applied them to both the original summaries and their automatic translations, obtaining encouraging results.

