Summary of the paper

Title A Study of Arabic Text Preprocessing Methods for Text Categorization
Authors Dina Said, Nayer Wanas, Nevin Darwish and Nadia Hegazy
Abstract Text preprocessing is an essential stage in text categorization (TC) in particular and in text mining in general. Morphological tools can be used during preprocessing to reduce the multiple forms of a word to a single form. Researchers have debated the benefits of using morphological tools in TC. Studies on English showed that stemming during the preprocessing stage degrades performance slightly, although it greatly reduces the memory and storage required. The effect of preprocessing tools on Arabic text categorization is still an area of research. This work provides an evaluation study of several morphological tools for Arabic text categorization. The study covers raw text, stemmed text, and root text; the stemmed and root text are obtained using two different preprocessing tools. The results show that a light stemmer combined with a well-performing feature selection method enhances the performance of Arabic text categorization, especially for small threshold values.
Topics Exploitation of LRs in different types of applications (information extraction, information retrieval, speech dictation, translation, summarisation, web services, semantic web, etc.),
Extraction and acquisition of knowledge (e.g. terms, lexical information, language modelling) from LRs,
Guidelines, standards, specifications, models and best practices for Arabic LRs
Full paper A Study of Arabic Text Preprocessing Methods for Text Categorization
Bibtex @InProceedings{SAID09.17,
  author = {Dina Said and Nayer Wanas and Nevin Darwish and Nadia Hegazy},
  title = {A Study of Arabic Text Preprocessing Methods for Text Categorization},
  booktitle = {Proceedings of the Second International Conference on Arabic Language Resources and Tools},
  year = {2009},
  month = {April},
  date = {22-23},
  address = {Cairo, Egypt},
  editor = {Khalid Choukri and Bente Maegaard},
  publisher = {The MEDAR Consortium},
  isbn = {2-9517408-5-9},
  language = {english}
  }
