Proceedings of the Second International Conference on Arabic Language Resources and Tools

Summary of the paper

Title	Arabic Part-Of-Speech Tagging using Transformation-Based Learning
Authors	Shabib AlGahtani, William Black and John McNaught
Abstract	Since the advent of annotated corpora, corpus-based methods have been widely and successfully used to tackle NLP tasks. However, the shift from classical rule-based methods to corpus-based ones has a major drawback: most corpus-based methods produce mathematical models that are hard to interpret and modify, and they demand more processing power and memory. Transformation-based learning (TBL) is a corpus-based method that combines the strengths of both approaches, avoiding obscurity and complexity without relinquishing state-of-the-art accuracy. This paper examines the application of TBL to the task of tagging Modern Standard Arabic text. To guess the tags of unknown words, an n-gram technique selects the best tag, given the preceding context, from a list of candidates output by a morphological analyzer. The developed tagger achieved an accuracy of 98.6% when evaluated on the training set and 96.9% on the test set. Furthermore, the same unknown-word module was slightly modified and successfully applied to the task of word tokenization, with an accuracy of 99.6%.
Topics	Taggers and Parsers
Full paper	Arabic Part-Of-Speech Tagging using Transformation-Based Learning
Bibtex	@InProceedings{ALGAHTANI09.43, author = {Shabib AlGahtani and William Black and John McNaught}, title = {Arabic Part-Of-Speech Tagging using Transformation-Based Learning}, booktitle = {Proceedings of the Second International Conference on Arabic Language Resources and Tools}, year = {2009}, month = {April}, date = {22-23}, address = {Cairo, Egypt}, editor = {Khalid Choukri and Bente Maegaard}, publisher = {The MEDAR Consortium}, isbn = {2-9517408-5-9}, language = {english} }