Title |
Diacritization and Transliteration of Proper Nouns from Arabic to English |
Authors |
Hamdy S. Mubarak, Mohamed Al Sharqawy and Esraa Al Masry |
Abstract |
This paper proposes a complete system for the automatic diacritization and transliteration of proper nouns from Arabic to English, using a database of name pairs in Arabic and English. The system consists of three phases: Correction, Diacritization, and Transliteration. The Correction phase uses Normalization to correct the Common Arabic Mistakes (initial Hamza, final Yaa, and final Taa errors) and also corrects normal concatenation errors. When the input exactly matches a saved normalized token generated from the proper names database, the most frequent transliteration is used. Missing diacritics are restored using Sakhr's Morphological Analyzer for analyzed tokens, or otherwise from the best match with patterns (for Arabic and non-Arabic names) and consecutive characters obtained from the diacritized proper names. Transliteration rules are then applied to the diacritized proper name to obtain its English equivalent (transliteration). Our results show an average accuracy of 89% on blind test sets with forced spelling mistakes, and 95% on correct input. |
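The normalization and exact-match lookup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the specific character mappings (initial Hamza forms folded to bare Alef, final Alef Maqsura to Yaa, final Taa Marbuta to Haa) are a common convention assumed here, and the names `normalize`, `build_index`, `lookup`, and the sample name pairs are hypothetical.

```python
from collections import Counter, defaultdict

# Assumed normalization targets (not taken from the paper):
HAMZA_FORMS = "\u0623\u0625\u0622"  # Alef with Hamza above, Hamza below, Madda
ALEF = "\u0627"                     # bare Alef
ALEF_MAQSURA = "\u0649"             # final form often confused with Yaa
YAA = "\u064a"
TAA_MARBUTA = "\u0629"              # final form often confused with Haa
HAA = "\u0647"


def normalize(token: str) -> str:
    """Fold initial Hamza, final Yaa, and final Taa variants to one form."""
    if not token:
        return token
    if token[0] in HAMZA_FORMS:
        token = ALEF + token[1:]
    if token[-1] == ALEF_MAQSURA:
        token = token[:-1] + YAA
    elif token[-1] == TAA_MARBUTA:
        token = token[:-1] + HAA
    return token


def build_index(pairs):
    """Count English transliterations per normalized Arabic token."""
    index = defaultdict(Counter)
    for arabic, english in pairs:
        index[normalize(arabic)][english] += 1
    return index


def lookup(index, token):
    """Return the most frequent transliteration on exact match, else None."""
    counts = index.get(normalize(token))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical name pairs, written with escapes to avoid encoding ambiguity.
AHMED_HAMZA = "\u0623\u062d\u0645\u062f"   # Ahmed, spelled with initial Hamza
AHMED_BARE = "\u0627\u062d\u0645\u062f"    # Ahmed, spelled with bare Alef
FATIMA_TAA = "\u0641\u0627\u0637\u0645\u0629"  # Fatima, final Taa Marbuta
FATIMA_HAA = "\u0641\u0627\u0637\u0645\u0647"  # Fatima, misspelled with Haa

PAIRS = [
    (AHMED_HAMZA, "Ahmed"),
    (AHMED_HAMZA, "Ahmed"),
    (AHMED_BARE, "Ahmad"),
    (FATIMA_TAA, "Fatima"),
]
INDEX = build_index(PAIRS)
```

In this sketch both spellings of a name collapse to the same key, so a misspelled input such as `FATIMA_HAA` still reaches the stored entry. Tokens with no exact match return `None` and would fall through to the diacritization and rule-based transliteration phases, which are not modeled here.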
Topics |
Machine translation to or from Arabic |
Full paper |
Diacritization and Transliteration of Proper Nouns from Arabic to English |
Bibtex |
@InProceedings{MUBARAK09.81,
author = {Hamdy S. Mubarak and Mohamed Al Sharqawy and Esraa Al Masry},
title = {Diacritization and Transliteration of Proper Nouns from Arabic to English},
booktitle = {Proceedings of the Second International Conference on Arabic Language Resources and Tools},
year = {2009},
month = {April},
date = {22-23},
address = {Cairo, Egypt},
editor = {Khalid Choukri and Bente Maegaard},
publisher = {The MEDAR Consortium},
isbn = {2-9517408-5-9},
language = {english}
} |