Title |
Syntactic Annotation in the Columbia Arabic Treebank |
Authors |
Nizar Habash, Reem Faraj and Ryan Roth |
Abstract |
The Columbia Arabic Treebank (CATiB) is a database of syntactic analyses of Arabic sentences. CATiB contrasts with previous approaches to Arabic treebanking in its emphasis on faster production with some constraints on linguistic richness. Two basic ideas inspire the CATiB approach. First, CATiB avoids annotating redundant linguistic information that can be determined automatically from syntax, e.g., nominal case. Second, CATiB uses linguistic representation and terminology inspired by Arabic's long tradition of syntactic studies. This makes it easier to train annotators and removes the need to hire only annotators who have degrees in linguistics. CATiB uses an intuitive dependency representation and relational labels inspired by Arabic grammar, such as tamyiz and idafa, in addition to the more commonly used relations of subject, object and modifier. This paper describes CATiB's representation and annotation strategy and procedure, and compares CATiB to other Arabic treebanking efforts. |
Topics |
Methods, tools and procedures for acquisition, creation, management, access, distribution and use of Arabic LRs, Monolingual and multilingual LRs, Guidelines, standards, specifications, models and best practices for Arabic LRs |
Full paper |
Syntactic Annotation in the Columbia Arabic Treebank |
Bibtex |
@InProceedings{HABASH09.25,
author = {Nizar Habash and Reem Faraj and Ryan Roth},
title = {Syntactic Annotation in the Columbia Arabic Treebank},
booktitle = {Proceedings of the Second International Conference on Arabic Language Resources and Tools},
year = {2009},
month = {April},
date = {22-23},
address = {Cairo, Egypt},
editor = {Khalid Choukri and Bente Maegaard},
publisher = {The MEDAR Consortium},
isbn = {2-9517408-5-9},
language = {english}
} |