Title |
An RSS Feed Analysis Application and Corpus Builder |
Authors |
Shereen Khoja |
Abstract |
The RSS Feed Analysis Application and Corpus Builder is a software application that downloads given RSS feeds and compiles them into a corpus. The user supplies RSS feed addresses, and the application automatically connects to the feeds, downloads them, and strips any formatting tags. The application incorporates the Expat XML parser to identify the tags in the RSS feeds, and users can define which content to keep and which to strip. The application was tested on a project to analyse Middle Eastern blogs. Thirty-seven blogs were downloaded using the RSS Feed Analyser and compiled into a corpus of 131,836 words. Both the RSS Feed Analyser and the corpus are freely available under the GNU General Public Licence. |
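The tag-stripping step described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: it assumes Python's standard `xml.parsers.expat` binding to Expat, and the function name `extract_text`, the `keep_tags` parameter, and the sample feed are hypothetical names introduced only for this example.

```python
import re
import xml.parsers.expat


def extract_text(rss_xml, keep_tags=("title", "description")):
    """Return the plain text found inside the elements named in keep_tags.

    Hypothetical helper: keep_tags stands in for the user-defined choice
    of what to keep; everything else in the feed is discarded.
    """
    chunks = []   # cleaned text of each kept element, in document order
    stack = []    # names of the elements that are currently open
    buffer = []   # character data of the kept element being read

    def start(name, attrs):
        stack.append(name)
        if name in keep_tags:
            buffer.clear()

    def end(name):
        if name in keep_tags:
            text = "".join(buffer)
            # Feeds often carry escaped HTML; drop that markup as well.
            text = re.sub(r"<[^>]+>", " ", text)
            text = " ".join(text.split())
            if text:
                chunks.append(text)
        stack.pop()

    def chars(data):
        # Expat may deliver one text node in several pieces, so accumulate.
        if stack and stack[-1] in keep_tags:
            buffer.append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(rss_xml, True)
    return chunks


# Invented sample feed, used only to exercise the sketch.
SAMPLE = """<rss version="2.0"><channel>
<title>Example blog</title>
<item>
<title>First post</title>
<link>http://example.com/1</link>
<description>&lt;p&gt;Hello &lt;b&gt;world&lt;/b&gt;&lt;/p&gt;</description>
</item>
</channel></rss>"""

corpus = extract_text(SAMPLE)
word_count = sum(len(text.split()) for text in corpus)
print(corpus, word_count)
```

In this sketch the feed is passed in as a string; fetching it from a feed address (for example with `urllib.request.urlopen`) would be a separate step. Counting whitespace-separated tokens is only one possible way to arrive at a corpus word count.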
Topics |
Multilingual document retrieval, LRs for linguistic research in human-machine communication, Methods, tools and procedures for acquisition, creation, management, access, distribution and use of Arabic LRs |
Full paper |
An RSS Feed Analysis Application and Corpus Builder |
Bibtex |
@InProceedings{KHOJA09.73,
author = {Shereen Khoja},
title = {An RSS Feed Analysis Application and Corpus Builder},
booktitle = {Proceedings of the Second International Conference on Arabic Language Resources and Tools},
year = {2009},
month = {April},
date = {22-23},
address = {Cairo, Egypt},
editor = {Khalid Choukri and Bente Maegaard},
publisher = {The MEDAR Consortium},
isbn = {2-9517408-5-9},
language = {english}
} |