THE OPEN DATA INDIC CHALLENGE
Speaker: Ms. Alolita Sharma
Time: Friday, 30 May 2014, 2:00pm
Venue: Conference Room, C Block, 1st Floor, Department of Computer Science and Engineering, Kanwal Rekhi (KReSIT) Building, IIT Bombay, Powai, Mumbai.
Abstract: India has 23 official languages with millions of native speakers, as well as many smaller languages, supported by 8 major scripts and having distinct vocabularies and grammatical rules. Yet on Wikipedia and other large repositories of user-generated content on the Web, all of these languages are very poorly represented relative to their real-world usage.
Many factors contribute to this anomaly on the Web, ranging from limited access to the Internet and computing devices to a lack of language tools, such as fonts and input tools, and of linguistic resources, such as dictionaries, terminology glossaries, and structured and linked data.
As the Web transforms itself into the mobile Web, an enormous opportunity to access and distribute information in Indic languages to billions of people is at stake. To make this opportunity real, user-generated content needs to be jump-started with open data of all categories. Dictionaries, terminology glossaries, geo-data, and all other kinds of categorized data that exist in non-digital form need to be transformed into digitally consumable, standardized, and structured open data repositories that users can leverage to create and contribute digital content on platforms like Wikipedia.
This talk will examine the barriers to, and solutions for, creating open data repositories for Indic languages, leveraging platforms like Wikipedia and language technologies for reading and creating content on the Web and mobile Web.