Posted to the Ethnos Project on September 14th, 2014

Only a handful of the roughly 7,000 languages in the world have well-developed resources. This has the effect of excluding billions of people from modern opportunities for knowledge and prosperity. The Kamusi Project has developed an unparalleled system to gather and freely share all possible words in all possible languages. However, to do it right, they urgently need to address a variety of programming issues, from security to language modeling – and those fixes and extensions need serious coin.

Read 23 Reasons Why Kamusi is Broke

I recently became a member of Kamusi (which means I have committed to a small annual contribution to help sustain the project). My coins will mean little unless they are joined by others, and so I invite you to help “build the most comprehensive language resource for the future – a complete matrix of human expression across time and space.” They are currently engaged in a Global Giving campaign for which they need to raise $5,000 from at least 40 donors by the end of this month to earn a permanent spot on the Global Giving site.

Kamusi at 20

The following text is from Don Osborn’s Beyond Niamey blog – originally published as “Kamusi at 20: Keeping the vision alive and working.” It is reposted here in full with his permission.

The Kamusi Project, which seeks to provide an open dictionary of all languages, for use in reference and in language technology development, is facing a challenge not unfamiliar to other language-related initiatives: Funding. This effort – sometimes perceived as too ambitious or esoteric but always visionary in its goals and use of technology – is currently campaigning for support through the Global Giving Open Challenge.

Kamusi was originally developed in 1994 as a proposal by Dr. Martin Benjamin and Dr. Ann Biersteker, then both at Yale University’s Council on African Studies. Billed as the “Internet Living Swahili Dictionary,” its objective was to respond to the need for new reference material on Swahili, and to do so by using the potential of user contributions over the internet (this was more than 6 years before Wikipedia was launched). It is worth noting that a reason cited for exploring the internet medium for dictionary development was the unfavorable “economics of Swahili publishing.” Kamusi is still today an excellent Swahili resource (both monolingual and English <-> Swahili), even as its goals have evolved.

Kamusi was run at Yale, with the benefit of US Department of Education funding, until 2006, and during this time was recognized as a finalist in the Stockholm Challenge 2001. At the end of this period, Dr. Benjamin – Martin to those who know him – summarized Kamusi in the context of African languages on the web at Wikimania 2006. He continued to run Kamusi as it transitioned in 2007 from Yale to a server hosted by the World Language Documentation Centre.

Under Martin’s direction, Kamusi has since then been incorporated as a non-profit in the US and Switzerland (where he lives with his family), and has expanded its mission beyond Swahili to a pan-African and eventually global scope.

Funding from Canada’s International Development Research Centre (IDRC) for Kamusi, as part of the multi-member African Network for Localisation (ANLoc) project, enabled Kamusi to lead development of locales for 100 African languages (locale data facilitates computer software handling a language) and terminology for 12 African languages. Later funding from the US National Endowment for the Humanities (NEH) enabled work on a pilot for Kamusi’s multilingual model (basically, there’s a lot more to a multilingual dictionary than words in parallel, since concepts don’t line up neatly across languages).

Since the conclusion of major funding in 2012, Kamusi has continued work on the multilingual model, including how to annotate degrees of separation (when a concept is translated through another language), homophones, multi-word expressions (something I personally wish machine translation had been better at years ago), and data input from any language. In 2013 Kamusi’s work gained it recognition as a launch partner in the White House Big Data Initiative.

Although Kamusi has had an affiliation with l’École polytechnique fédérale de Lausanne (EPFL) since last September, this has not filled the funding gap to enable completion of the programming work necessary to bring all of this to fruition and take the Global Online Living Dictionary (GOLD) from a proven pilot project to a full-scale reality.

Looking at Kamusi’s history – which is long in internet terms – one is impressed by the thought and effort that has gone into it, by Martin and by a range of other contributors, from its beginning at Yale to recent collaborations and donors, with many individual contributions all along. It would be a shame if current funding difficulties were to cause this important work to end.

For something like Kamusi, it helps, I think, to look as far ahead as we can look back. Twenty years from now, the advantages of building language resources for the many languages that don’t have the economic or political/policy weight to get commercial and investor attention – even if they have demographic importance (keep in mind how quickly Africa’s population is growing, for instance), but especially if those numbers aren’t there either – will be a lot more apparent than they seem today. For countries where many of these languages are spoken, like most of those in Africa, there is a long-term need for projects like Kamusi that connect high-level language technology with less-resourced and often low-status languages – and in Kamusi’s case, also link those with the more widely spoken international languages.

At this point, Kamusi’s effort to gain enough support to qualify for ongoing listing on Global Giving is an attempt to keep the organization going at a critical period in its history. Please consider helping – click the “Give now” button below.

Give Now
