This post is adapted from Global Native Networks.
It’s no secret that a handful of languages dominate the online space. A paper published last October in the journal PLOS ONE (aptly titled “Digital Language Death”) found that less than five percent of the world’s current languages are in use online.
Offline, around 7,776 languages are in use today. To determine how many were in use online, András Kornai developed a program to crawl top-level Web domains and document the number of words in each language. The results are, according to Kornai, “evidence of a massive die-off caused by the digital divide.”
Coding for Language Communities 2014
Enter the CIDLeS Summer School 2014: Coding for Language Communities, a conference that aims to give everyone the ability to engage digital spaces in their mother tongues. According to the CIDLeS website, the conference will bring together three groups:
Speakers of languages that are currently not supported by language technologies and that want to use their language on electronic devices;
Students of linguistics and language-related disciplines interested in learning about software development;
Software developers and students of computational sciences that are interested in supporting under-resourced languages by technological means.
The Summer School, which will take place August 11th – 15th within the “Parque Natural das Serras de Aire e Candeeiros” in or near Minde, Portugal, brings together some heavy hitters in the growing field of expanding linguistic diversity online. These experts will serve as mentors for summer school participants.
For example, Kevin Scannell, best known for his Indigenous Tweets project, will “turn corpus data into spellcheckers.” Students will learn how to crawl, clean and tokenize data from the web, then generate frequency lists and add morphology, all the way through to packaging up Firefox/OpenOffice extensions.
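To make the middle of that pipeline concrete, here is a minimal sketch in Python of the tokenize-and-count step. It is not Scannell's actual tooling. The function names, the `min_count` threshold and the sample sentences are all assumptions made for illustration, and the crawling, morphology and extension-packaging steps are left out entirely.

```python
import re
import unicodedata
from collections import Counter

# Runs of Unicode letters, optionally joined by an apostrophe or hyphen.
# This is a simplifying assumption: real orthographies often need
# language-specific tokenization rules.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def tokenize(text):
    """Normalize to NFC, lowercase, and split text into word tokens."""
    text = unicodedata.normalize("NFC", text)
    return [m.group(0).lower() for m in TOKEN_RE.finditer(text)]


def frequency_list(texts):
    """Count how often each token occurs across a collection of texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def build_wordlist(counts, min_count=2):
    """Keep words seen at least min_count times (hypothetical threshold).

    Rare tokens in web text are often typos, so a frequency cutoff is one
    crude way to filter a candidate spellchecker word list.
    """
    return sorted(word for word, n in counts.items() if n >= min_count)


if __name__ == "__main__":
    # Hypothetical stand-in for cleaned web text (Irish, for illustration).
    sample = ["Tá an lá go maith. Tá sé go breá!", "Lá maith"]
    print(build_wordlist(frequency_list(sample)))
```

A word list like this is only a starting point. Turning it into a usable spellchecker still requires the morphology and packaging work the session describes.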
Bruce Birch, a linguist working on languages of the Iwaidjan family spoken in Arnhem Land, will work on a mobile app for crowdsourced data collection and publishing. Among his ideas are apps for lexical data, phrasebooks, stories and other data from endangered languages. Participants will learn how to develop a database for the content and how to enable users to edit and share their content.
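As a rough idea of what such a content database might involve, the sketch below uses Python's built-in `sqlite3` module. It is not Birch's design. The `entry` table, its columns and the helper functions are hypothetical, and the headwords are placeholders rather than real lexical data.

```python
import sqlite3


def open_lexicon(path=":memory:"):
    """Open (or create) a small lexicon database with one hypothetical table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS entry (
               id INTEGER PRIMARY KEY,
               headword TEXT NOT NULL,
               gloss TEXT NOT NULL,
               contributor TEXT NOT NULL,
               updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn


def add_entry(conn, headword, gloss, contributor):
    """Store a new crowdsourced entry and return its id."""
    cur = conn.execute(
        "INSERT INTO entry (headword, gloss, contributor) VALUES (?, ?, ?)",
        (headword, gloss, contributor),
    )
    conn.commit()
    return cur.lastrowid


def edit_gloss(conn, entry_id, new_gloss, contributor):
    """Let a user revise a gloss, recording who made the latest edit."""
    conn.execute(
        "UPDATE entry SET gloss = ?, contributor = ?, "
        "updated_at = CURRENT_TIMESTAMP WHERE id = ?",
        (new_gloss, contributor, entry_id),
    )
    conn.commit()


def export_entries(conn):
    """Return all entries as plain dicts, ready to share (e.g. as JSON)."""
    rows = conn.execute(
        "SELECT headword, gloss, contributor FROM entry ORDER BY headword"
    ).fetchall()
    keys = ("headword", "gloss", "contributor")
    return [dict(zip(keys, row)) for row in rows]


if __name__ == "__main__":
    conn = open_lexicon()
    first = add_entry(conn, "headword-b", "first gloss", "user1")
    add_entry(conn, "headword-a", "another gloss", "user2")
    edit_gloss(conn, first, "revised gloss", "user3")
    print(export_entries(conn))
```

A real app would also need audio, media attachments, edit history and syncing between devices, none of which this sketch attempts.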
All in all, this looks like a critical first step in creating cadres of indigenous and minority language speakers who also have the ability to build digital tools that accommodate those languages.
While I am sure the team of experts is well aware of this problem, many indigenous and minority languages are primarily spoken rather than written, and the most popular modes of Internet engagement (email, Facebook, Twitter, etc.) remain fundamentally textual practices. How can we envision a more inclusive digital space that makes room for both spoken and written languages?