Title |
Transliteration using phrase based SMT approach on substrings |
Authors |
Sara Noeman |
Abstract |
Translation of named entities (NEs), such as person names, organization names and location names, is crucial for cross-lingual information retrieval, machine translation, and many other natural language processing applications. New named entities are introduced on a daily basis in newswire, which greatly complicates the translation task. Translating named entities between languages with different orthographic bases is more complex than translating between similar languages, because such languages may map consonants and vowels differently. For example, translating English names to Arabic raises several problems due to lexical differences. First, Arabic leaves short vowels unwritten, in contrast to English names, where short vowels are usually written. In such cases, the Arabic short vowels (Fathah, Kasrah and Dammah) are pronounced and should appear in the target language. Second, some Arabic consonants may be mapped to several English consonants, for example (س -> s, c) and (ب -> b, p), while others are mapped to more than one consonant, for example (ث, ذ -> th), which makes the problem a many-to-many mapping task. Finally, a general problem in named entity transliteration is that it is always preferable to produce the most commonly used form of the name. In this paper we introduce a substring-based Arabic-to-English transliteration system cascaded with a spelling correction module. |
Topics |
Multilingual document retrieval, Multilingual information retrieval, Spoken translation |
Full paper |
Transliteration using phrase based SMT approach on substrings |
Bibtex |
@InProceedings{NOEMAN09.23,
author = {Sara Noeman},
title = {Transliteration using phrase based SMT approach on substrings},
booktitle = {Proceedings of the Second International Conference on Arabic Language Resources and Tools},
year = {2009},
month = {April},
date = {22-23},
address = {Cairo, Egypt},
editor = {Khalid Choukri and Bente Maegaard},
publisher = {The MEDAR Consortium},
isbn = {2-9517408-5-9},
language = {english}
} |