Summary of the paper

Title Assessing word-form based search for information in Arabic: towards a new type of lexical resource
Authors Mouna Anizi and Joseph Dichy
Abstract Although real progress has been made over the last two decades, finding or retrieving information in Arabic with the help of a search engine remains difficult, owing to the high level of ambiguity entailed by the structure of 'unvowelled' Arabic writing. These language-specific difficulties peak in the case of queries based on single words. The contribution analyses the results of search queries on Google, which are compared to the results of word-form analyses obtained both on the ArabiCorpus site (http://arabicorpus.byu.edu/) and with analysers based on the DIINAR.1 lexical resource (references at http://diinar.univ-lyon2.fr). An assessment protocol is proposed. Clearly, it aims at evaluating neither the analysers of ArabiCorpus and DIINAR.1, nor the Google search engine. Examining the latter is quite another question, related to Google ranking and speed. The aim of the paper, instead, is to explore and assess the possibilities and limitations of word-form based queries in Arabic, i.e. the results of queries obtained with word-form based analysers and language resources (what can be obtained and what, strictly speaking, cannot). The protocol includes (a) comparing results obtained through Google with the often numerous word-forms obtained through the two other sources, (b) considering a number of semantic aspects related to the contexts query words appear in, and (c) taking into account word-before/word-after collocations and set phrases. The paper finally introduces the essential features of a new type of lexical resource for future Arabic search engines, which needs to contain, among other components: (a) a compact and comprehensive database operating at word-form level, such as DIINAR.1, and (b) an extended lexical resource that includes semantic relations, collocations and set or semi-set expressions.
Topics Exploitation of LRs in different types of applications (information extraction, information retrieval, speech dictation, translation, summarisation, web services, semantic web, etc.),
Roadmapping for Arabic language technology,
Definition and requirements for a Basic LAnguage Resource Kit (BLARK) for Arabic
Full paper Assessing word-form based search for information in Arabic: towards a new type of lexical resource
Bibtex @InProceedings{ANIZI09.75,
  author = {Mouna Anizi and Joseph Dichy},
  title = {Assessing word-form based search for information in Arabic: towards a new type of lexical resource},
  booktitle = {Proceedings of the Second International Conference on Arabic Language Resources and Tools},
  year = {2009},
  month = {April},
  date = {22-23},
  address = {Cairo, Egypt},
  editor = {Khalid Choukri and Bente Maegaard},
  publisher = {The MEDAR Consortium},
  isbn = {2-9517408-5-9},
  language = {english}
  }
