Filtrage is a project that aims to produce a multilingual corpus tagged with Named Entities. The languages are Arabic, French and English. The corpus consists of 30,000 newswires from the French news agency AFP (Agence France Presse). The corpus is annotated semi-automatically: a state-of-the-art system produces a first annotation pass, and the annotations are then corrected manually.
ELDA is responsible for this manual correction.
The following paper, which describes the project, was presented at the 2nd International Conference on Arabic Language Resources and Tools (MEDAR 2009), held on April 22-23 in Cairo, Egypt:
A Multilingual Named Entities Corpus for Arabic, English and French (Mostefa Djamel, Laïb Mariama, Chaudiron Stéphane, Choukri Khalid and de Chalendar Gaël)