Title
Creating a Methodology for Large-Scale Correction of Treebank Annotation: the Case of the Arabic Treebank
Authors
Mohamed Maamouri, Ann Bies and Seth Kulick
Abstract
The LDC Arabic Treebank team has significantly revised and enhanced its annotation guidelines and annotation procedures over the last two years, with the goal of reducing annotation inconsistency in the Treebank. We have now completed automatic and significant manual revisions to a total of 738,845 tokens/words, bringing them into line with the new annotation guidelines as far as possible and greatly improving annotation consistency. During this revision process we created a methodology for large-scale correction of Treebank annotation. The methodology balances the need for consistency against tight time constraints for correcting and updating a large amount of data annotated under the previous guidelines. The combination and interleaving of automatic and manual corrections were crucial to the success of the overall revision. We also demonstrate the success of the revision by reporting an improvement in parsing results.
Topics
Evaluation, validation, quality assurance of Arabic LRs; Monolingual and multilingual LRs; Guidelines, standards, specifications, models and best practices for Arabic LRs
Bibtex
@InProceedings{MAAMOURI09.68,
author = {Mohamed Maamouri and Ann Bies and Seth Kulick},
title = {Creating a Methodology for Large-Scale Correction of Treebank Annotation: the Case of the Arabic Treebank},
booktitle = {Proceedings of the Second International Conference on Arabic Language Resources and Tools},
year = {2009},
month = {April},
date = {2009-04-22/2009-04-23},
address = {Cairo, Egypt},
editor = {Khalid Choukri and Bente Maegaard},
publisher = {The MEDAR Consortium},
isbn = {2-9517408-5-9},
language = {english}
}