PAN Localization project releases its research and outputs on 11th Mother Language Day
A Giant Leap for Multilingual Cyberspace
“In the field of IT, all language communities are entitled to have at their disposal equipment adapted to their linguistic system and tools and products in their language, so as to derive full advantage from the potential offered by such technologies for self-expression, education, communication, publication, translation and information processing and the dissemination of culture in general” [1].
The PAN Localization project (www.panl10n.net) has been a regional initiative addressing these challenges and promoting the use of language technology across Asia. Initiated in 2003, the project has developed and disseminated computing solutions for Bahasa Indonesia, Bangla, Dzongkha, Khmer, Lao, Mongolian, Nepali, Pashto, Sinhala, Tamil, Tibetan and Urdu. These languages represent a population of nearly one billion people across developing Asia and globally.
On the occasion of the eleventh International Mother Language Day, 21st February 2010, the PAN Localization project is pleased to release its research, technology and resources through its website (www.panl10n.net).
This project has been carried out in collaboration with the Pan Asia Networking (PAN) program of IDRC, Canada (www.idrc.ca), the Centre for Research in Urdu Language Processing (www.crulp.org) at the National University of Computer and Emerging Sciences, Pakistan (www.nu.edu.pk), and the following partner organizations:
- Afghan Computer Science Association (ACSA: www.acsa.org.af), Afghanistan
- BRAC University (CRBLP: crblp.bracu.ac.bd), Bangladesh
- Development Research Network (D.NET: www.dnet-bangladesh.org), Bangladesh
- Department of IT (DIT: www.dit.gov.bt), Bhutan
- Ministry of Education, Youth and Sports (www.pancambodia.info), Cambodia
- Institute of Technology (ITC: www.itc.edu.kh), Cambodia
- National ICT Development Authority (NIDA: www.nida.gov.kh), Cambodia
- Tibet University (TU: www.utibet.edu.cn), China
- Institute of Science and Technology, TAR China
- Tibet Academy of Agricultural and Animal Husbandry Sciences, China
- University of Indonesia (UI: www.ui.ac.id), Indonesia
- Agency for the Assessment and Application of Technology (BPPT: www.bppt.go.id), Indonesia
- National Authority for Science and Technology (NAST: www.nast.gov.la), Laos
- InfoCon Co. Ltd. (www.infocon.mn), Mongolia
- Mongolian University of Science and Technology (MUST: www.must.edu.mn), Mongolia
- National University of Mongolia (NUM: www.num.edu.mn), Mongolia
- Madan Puraskar Pustakalaya (MPP: www.mpp.org.np), Nepal
- E-Network Research and Development (ENRD: www.enrd.org), Nepal
- University of Colombo School of Computing (LTRL, UCSC: www.ucsc.cmb.ac.lk/ltrl), Sri Lanka
[1] Universal Declaration on Linguistic Rights, UNESCO, 1996.
Salient Research Outputs
Bahasa Indonesia
Statistical Machine Translation (Awarded), English-Bahasa Parallel Corpus (1 Million words), POS Tagged Bahasa Corpus (500,000 words), Part of Speech Tagset and Tagger...[details]
Bangla
Text to Speech System (Awarded), Optical Character Recognition System (Shortlisted for Award), Bangla Pad, Spell Checker, Lexicon, Language Table for IDNs, Part of Speech Tagset and Tagger, Wordnet (1000 words), Tagged Corpus (5 Million words), English-Bangla Parallel Corpus, Training on Content Development using infomediaries, Online Legal Content for Farmers in Bangla…[details]
Dzongkha
DzongkhaLinux, Optical Character Recognition System, Language Table for IDNs, Part of Speech Tagset, Corpus (600,000 words), Lexicon (23,000 words), Text to Speech System (prototype), Dzongkha Terminology, Collation, Locale, Fonts and Keyboard, Training on DzongkhaLinux…[details]
Khmer
Optical Character Recognition System, Java Applications and OpenOffice.org Plug-ins for Collation, Encoding Conversion, Word Segmentation, Locale, Mobile SMS, Language Table for IDNs, Part of Speech Tagset and Tagger, Lexicon, Text to Speech System (prototype), Tagged Corpus (150,000 words), Online Khmer Content on Veticar.com, Training of Govt. officials on Khmer Open Source Software…[details]
Lao
Optical Character Recognition System, OpenOffice.org and MS Office Plug-in for Word Segmentation, Collation, Spell Checker, Lao Pad, Fonts, Keyboard, Language Table for IDNs, Part of Speech Tagset, POS Tagged Corpus, Parallel Corpus (37,000 words), Online Lao Content …[details]
Mongolian
Part of Speech Tagset and Tagger, Spell Checker, Corpus (1,000,000 words), Tagged Corpus (100,000 words), Lexicon (10,000 words), Automatic Speech Recognition, Localization of Pidgin and SeaMonkey… [details]
Nepali
NepaLinux (Awarded), Spell Checker, Grammar Checker, Parallel Corpus (100,000 words), Tagged Corpus (80,000 words), Lexicon (37,000 words), Optical Character Recognition System (prototype), Language Table for IDNs, Training Material on NepaLinux, Training of Rural Centers on Nepali Open Source Software …[details]
Pashto
Localized SeaMonkey (Awarded), Keyboard, Fonts, Language Table for IDNs…[details]
Sinhala & Tamil
Sinhala Optical Recognition System, Sinhala Text to Speech System (Awarded), Sinhala Screen Reader for the Blind, Language Learning Tool for Tamil in Sinhala and English, Sinhala Wordnet, Localized OpenTM, Language Table for IDNs, Collation Standard, Encoding Conversion Tool, Training Students for Development of Online Sinhala Content…[details]
Tibetan
Collation, Online Tibetan Content, Farmer Training on using Online Tibetan Content…[details]
Urdu
Parallel Corpus (100,000 words), Stemmer, Collation, Optical Character Recognition, Localization of OpenOffice.org, SeaMonkey, Web Composer and Psi, Terminology Glossary, Gendered Outcome Mapping Tool (Awarded), Part of Speech Tagset and Tagger, Tagged Corpus (200,000 words), Language Table for IDNs, Training Material on Localized Applications, Training on Localized Software to Rural School Children, Content Generated by Rural School Children and Teachers …[details]
And much more … on the project website (www.panl10n.net).