Title |
Stem-based Arabic Language Models Experiments |
Authors |
Mohsen Moftah, Waleed Fakhr, Sherif Abdou and Mohsen Rashwan |
Abstract |
Arabic is often described as a morphologically complex language. This complexity leads to rapid vocabulary growth, which is accompanied by poorer language model (LM) probability estimation and a higher out-of-vocabulary (OOV) rate. Morphology-based language models have been proposed to overcome these problems. In a morphology-based language model, the input text is analyzed and every word is split into a stem and affixes. In this paper, stem-based language models for Modern Standard Arabic (MSA) are developed and compared with a conventional word-based language model, which serves as the baseline. For the stem-based language models, a number of manipulations were applied to the input data; a new language model was built in each case, and the results were compared with both the baseline and the original stem-based language model. |
Topics |
Extraction and acquisition of knowledge (e.g. terms, lexical information, language modelling) from LRs |
Full paper |
Stem-based Arabic Language Models Experiments |
Bibtex |
@InProceedings{MOFTAH09.59,
author = {Mohsen Moftah and Waleed Fakhr and Sherif Abdou and Mohsen Rashwan},
title = {Stem-based Arabic Language Models Experiments},
booktitle = {Proceedings of the Second International Conference on Arabic Language Resources and Tools},
year = {2009},
month = {April},
date = {22-23},
address = {Cairo, Egypt},
editor = {Khalid Choukri and Bente Maegaard},
publisher = {The MEDAR Consortium},
isbn = {2-9517408-5-9},
language = {english}
} |