Title
Arabic Part-Of-Speech Tagging using Transformation-Based Learning
Authors
Shabib AlGahtani, William Black and John McNaught
Abstract
Since the advent of annotated corpora, corpus-based methods have been widely applied to NLP tasks with notable success. The shift from classical rule-based to corpus-based methods has a major drawback, however: most corpus-based methods produce mathematical models that are hard to interpret and modify, and they demand more processing power and memory. Transformation-based learning (TBL) is a corpus-based method that combines the strengths of both approaches, avoiding opacity and complexity without sacrificing state-of-the-art accuracy. This paper examines the application of TBL to tagging Modern Standard Arabic text. To guess the tags of unknown words, an n-gram technique is adopted that exploits the preceding context to select the best tag from a list of candidates output by a morphological analyzer. The developed tagger achieved an accuracy of 98.6% on the training set and 96.9% on the test set. Furthermore, the same unknown-word module was slightly modified and successfully applied to word tokenization, with an accuracy of 99.6%.
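To make the abstract's two ingredients concrete, here is a minimal Python sketch of Brill-style TBL combined with a bigram guesser for unknown words. It is an illustration under stated assumptions, not the authors' implementation. The single rule template (change a tag given the previous tag), the tag names, the `analyzer` stand-in for a morphological analyzer, the bigram model as the chosen n-gram order, and all data are hypothetical; the paper's actual templates, tagset and n-gram model are not given here.

```python
from collections import Counter

# Hypothetical rule template (an assumption, not the paper's templates):
# (from_tag, to_tag, prev_tag) = "change from_tag to to_tag when the
# previous word is tagged prev_tag".
Rule = tuple[str, str, str]


def guess_unknown(prev_tag: str, candidates: list[str], bigrams: Counter) -> str:
    """Pick the candidate tag seen most often after prev_tag in training."""
    # max() keeps the first candidate on ties, so candidate order matters.
    return max(candidates, key=lambda t: bigrams[(prev_tag, t)])


def initial_tags(words, lexicon, analyzer, bigrams) -> list[str]:
    """Initial-state annotator: most frequent tag for known words, a
    bigram-based choice among analyzer candidates for unknown words."""
    tags: list[str] = []
    for w in words:
        if w in lexicon:
            tags.append(lexicon[w].most_common(1)[0][0])
        else:
            prev = tags[-1] if tags else "<s>"
            tags.append(guess_unknown(prev, analyzer(w), bigrams))
    return tags


def apply_rule(tags: list[str], rule: Rule) -> list[str]:
    """Apply a rule at every position, reading context from the input tags."""
    from_tag, to_tag, prev_tag = rule
    return [
        to_tag if i > 0 and t == from_tag and tags[i - 1] == prev_tag else t
        for i, t in enumerate(tags)
    ]


def errors(tags: list[str], gold: list[str]) -> int:
    return sum(t != g for t, g in zip(tags, gold))


def learn_tbl(tags: list[str], gold: list[str], max_rules: int = 10):
    """Greedily learn an ordered rule list, each rule chosen for the
    largest net reduction in tagging errors on the training data."""
    tags = list(tags)
    learned: list[Rule] = []
    for _ in range(max_rules):
        base = errors(tags, gold)
        # Each mistagged position proposes the rule that would fix it.
        candidates = {
            (tags[i], gold[i], tags[i - 1])
            for i in range(1, len(tags))
            if tags[i] != gold[i]
        }
        best_rule, best_gain = None, 0
        for rule in sorted(candidates):  # sorted for deterministic ties
            gain = base - errors(apply_rule(tags, rule), gold)
            if gain > best_gain:
                best_rule, best_gain = rule, gain
        if best_rule is None:  # no rule lowers the error count: stop
            break
        learned.append(best_rule)
        tags = apply_rule(tags, best_rule)
    return learned, tags


def analyzer(word: str) -> list[str]:
    """Hypothetical stand-in for a morphological analyzer's candidate tags."""
    return ["VERB", "NOUN"]


# Toy data: tokens, tags and counts are invented purely for illustration.
lexicon = {"the": Counter(DET=5), "run": Counter(VERB=3, NOUN=1), "ended": Counter(VERB=2)}
bigrams = Counter({("DET", "NOUN"): 4, ("DET", "VERB"): 1})
words = ["the", "run", "ended", "the", "blorf"]
gold = ["DET", "NOUN", "VERB", "DET", "NOUN"]

start = initial_tags(words, lexicon, analyzer, bigrams)
rules, final = learn_tbl(start, gold)
print(rules)  # [('VERB', 'NOUN', 'DET')]
```

Each learned rule is a readable triple that can be inspected or edited by hand, which is the interpretability the abstract credits to TBL. A realistic tagger would use many templates (wider context, word identity, affixes) and far more training data, and would respect sentence boundaries rather than treating the corpus as one sequence as this sketch does.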
Topics
Taggers and Parsers
Full paper
Arabic Part-Of-Speech Tagging using Transformation-Based Learning
Bibtex
@InProceedings{ALGAHTANI09.43,
author = {Shabib AlGahtani and William Black and John McNaught},
title = {{Arabic} Part-Of-Speech Tagging using Transformation-Based Learning},
booktitle = {Proceedings of the Second International Conference on Arabic Language Resources and Tools},
year = {2009},
month = {April},
date = {2009-04-22/2009-04-23},
address = {Cairo, Egypt},
editor = {Khalid Choukri and Bente Maegaard},
publisher = {The MEDAR Consortium},
isbn = {2-9517408-5-9},
language = {english}
}