THE OPEN DATA INDIC CHALLENGE
Speaker: Ms. Alolita Sharma
Time: Friday, 30 May 2014, 2:00pm
Venue: Conference Room, C Block, 1st Floor, Department of Computer Science
and Engineering, Kanwal Rekhi (KReSIT) Building, IIT Bombay, Powai, Mumbai.
Abstract:
India has 23 official languages which have millions of native speakers, as
well as many smaller languages, supported by 8 major scripts and having
distinct vocabularies and grammatical rules. Yet on Wikipedia and other
large repositories of user-generated content on the Web, all these
languages are very poorly represented compared to their real-world usage.
Many factors contribute to this anomaly on the Web, ranging from access to
the Internet and computing devices to the lack of language tools such as
fonts and input tools, and of linguistic resources such as dictionaries,
terminology glossaries, and structured and linked data.
As the Web transforms itself into the mobile Web, an enormous opportunity
to access and distribute information in Indic languages to billions of
people is at stake. To make this opportunity real, user generated content
needs to be jump-started with open data of all categories. Non-digital
dictionaries, terminology glossaries, geo-data, and all other kinds of
categorized data need to be transformed into digitally consumable,
standardized, and structured open data repositories which can be leveraged
by users to create and contribute digital content on platforms like
Wikipedia.
This talk will examine the barriers to creating open data repositories for
Indic languages, and the solutions, leveraging platforms like Wikipedia
and language technologies for reading and creating content on the Web and
mobile Web.
--
Siji Sunny