Posted to the Ethnos Project on September 6th, 2011


Many human languages, an essential part of culture, are in danger of extinction. UNESCO estimates that at least half of the world’s 6500 spoken languages will disappear within the next 100 years. This problem can be addressed to some extent by computer systems that collect, archive and disseminate dictionaries for various languages, thus performing the key function of preservation.

The approach taken in this project was to develop a Web-based multilingual thesaurus, with mechanisms for the submission and retrieval of language data and metadata. This thesaurus was built on top of the FEDORA Web-based digital repository toolkit. Two distinct user interfaces were then developed as part of a proof-of-concept language preservation system, namely a Web interface and a cell phone interface. These were created using AJAX and J2ME+GPRS respectively.

Both user interfaces were designed using an iterative User-Centred Design approach, and the back-end system was designed to meet the needs of the user interfaces, with a Web-based API.

The resulting system proved useful: users indicated that they could preserve spoken languages by submitting and retrieving words in their own languages. The independent, successful evaluations of the two user interfaces together demonstrate the feasibility of creating a preservation-directed archive as a layered Web-based digital repository, where the preservation function is separable and accessible through a well-defined Web-based API.

Keywords: Language Preservation, Digital Repository, User-Centred Design, User Interface

1. Introduction

Many languages are in danger of being lost and, if nothing is done to prevent it, it is estimated that half of the world’s approximately 6500 languages will disappear in the next 100 years [1]. Total death of a language occurs when a language that was previously used by a certain community is no longer spoken. This brings about the loss of cultural heritage, for a language is a unique medium for its community’s traditions and culture. Language data are central to the interests of social science research communities, including linguists, anthropologists, archaeologists, historians, sociologists and political scientists interested in the culture of indigenous peoples. Besides popular and official world languages, many people are fluent in a diversity of regional dialects. Over time many of these languages fade away and, with them, significant elements of culture and history are lost.

It may be possible to address this problem with a computer system that collects, archives and disseminates dictionaries for various languages. Such a system should support searching and browsing through dictionaries, translate from one language to another, and include etymology and annotations.

WordBank was developed in response to the problem of dying languages. WordBank is a component- and Web-based multilingual thesaurus with a service-oriented architecture and a mechanism for the submission and retrieval of language data and metadata. WordBank comprises two distinct interfaces, an AJAX-based Web interface and a J2ME-based cell phone interface, through which users can submit and retrieve language data, together with a back-end archive that stores data and metadata and allows information on different languages to be retrieved via the interfaces. The back-end archive was built on top of the Flexible and Extensible Digital Object and Repository Architecture (FEDORA) [5] open source digital repository system.
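The submit-and-retrieve idea behind a multilingual thesaurus can be illustrated with a minimal in-memory sketch. This is not WordBank's code or its FEDORA-backed API: the class names, the `submit` and `retrieve` methods, the concept identifiers and the metadata fields are all hypothetical, chosen only to show how words in different languages might be linked through a shared concept.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One word in one language, with free-form metadata (hypothetical fields)."""
    word: str
    language: str
    metadata: tuple = ()  # sorted (key, value) pairs, e.g. (("contributor", "x"),)


class Thesaurus:
    """In-memory stand-in for a back-end archive (an assumption, not FEDORA)."""

    def __init__(self):
        # concept identifier -> entries that express that concept
        self._concepts = defaultdict(list)

    def submit(self, concept, word, language, **metadata):
        """Store a word under a concept, skipping exact duplicates."""
        entry = Entry(word, language, tuple(sorted(metadata.items())))
        if entry not in self._concepts[concept]:
            self._concepts[concept].append(entry)
        return entry

    def retrieve(self, word, source, target):
        """Return target-language words that share a concept with `word`."""
        results = []
        for entries in self._concepts.values():
            if any(e.word == word and e.language == source for e in entries):
                results.extend(e.word for e in entries if e.language == target)
        return results


# Illustrative data: the word for "water" in three languages.
bank = Thesaurus()
bank.submit("water", "água", "Portuguese")
bank.submit("water", "metsi", "Sesotho", contributor="example-user")
bank.submit("water", "ماء", "Arabic")

print(bank.retrieve("metsi", "Sesotho", "Portuguese"))  # ['água']
```

Linking entries through a concept rather than storing pairwise translations means that adding a word in a new language makes it retrievable from every language already archived under that concept.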

For the purpose of this study, the chosen languages were Arabic, Portuguese and Sesotho, primarily because they are the researchers’ native languages. However, WordBank can archive any other language.

The rest of this paper is organised as follows: section 2 discusses related systems, section 3 describes the implementation details, section 4 reports the user evaluation conducted, and section 5 concludes the paper.

