I was fishing around for some information on the possible Indianisation of GNU/Linux. As Ms Mita puts it, Edward Cherlin's info (via the Simputer mailing list on yahoogroups) appears to be mind-boggling. What do others on this list think of it? Any feedback? FN
On Fri, 16 Aug 2002, Mita wrote:
Dear Edward, What an excellent and valuable input! Thank you! By the way, you don't sound like a generalist at all, more like a specialist! Checked out both the sites you recommend; one was rather technical, but the sil.org site is really great. So did you go to the presentation of the Simputer there in California? Did anybody on this list go? Please do tell us first hand how it was! Did the simputer live up to all your expectations? Cheers Mita
-----Original Message-----
From: Edward Cherlin [mailto:cherlin@pacbell.net]
Sent: Wednesday, August 14, 2002 9:29 PM
To: simputer@yahoogroups.com
Subject: Re: [simputer] [OFFTOPIC] Request for some inputs for an article
On Tuesday 13 August 2002 02:06 am, Frederick Noronha wrote:
Dear friends,
I am thinking of doing a piece on the challenges posed by plans for Indianisation of GNU/Linux.
If you could give me some additional insights into the following issues (in a language that a simple reader would follow), I'd be very grateful. If you could send me the replies in a day or two, I'd be even more grateful.
From my side, I will circulate the article among all before the Bangalore meeting. Thanks again. Frederick
You're welcome. Where will this be published?
PS: Issues I need information on:
What is your reading of:
- Demand for Indian-language computers
Although I have met a few engineers from India who speak only English, the answer is that the demand for Indian-language computers equals the demand for computers in India, plus a bit for places like the UK, Pakistan and Bangladesh that do significant business with India or have significant Indian minorities. You can find current sales and installed base numbers on the Net. If you need help with this, e-mail me offline.
- Main languages which could be tackled at this stage
By next year, the Pango project should support all nine official Indic scripts, so the answer is "All of them."
- Which languages would pose greater difficulties
Languages that are traditionally not written, or are written in non-standard variants of the standard scripts. Talk to Peter Constable at SIL.org about this.
- Applications that need to run in Indian languages
Everything. Don't take "No" for an answer. On Linux, you can volunteer to Indicize any application. In the future, when font management and rendering are standardized, all applications will run in Indian languages for input and output without further ado, and anyone will be able to create a localization file to customize the user interface. Volunteers are also needed to translate documentation.
- Lack of support in other OSs
Indic and other South Asian scripts are the final challenge to computer vendors for full I18n support. Progress is slow at Microsoft and Apple. Linux should pass them by the end of the year, or early in 2003.
- Technical challenges in supporting Indian languages
The principal problem is rendering conjuncts without proper rendering engines and properly encoded fonts. Users want to type a sequence of characters, and not concern themselves with the details of rendering. This requires fonts with appropriate tables giving the possible character sequences and the glyphs for rendering each, and an engine that knows to read the tables.
Apple and Microsoft are not willing simply to support typing, display, and printing. They will not release language and writing system support until they have complete locales built, preferably including a dictionary and spelling checker. Linux is under no such constraints.
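To make the conjunct problem above concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `LIGATURES` table and the glyph names in it are hypothetical, and real engines such as Pango or Graphite read much richer tables from the font and also handle reordering. The idea it shows is just the one described: the user types a plain sequence of characters, and a table-driven engine picks the glyphs.

```python
import unicodedata

# Devanagari KA + VIRAMA + SSA: three typed characters that are
# normally drawn as a single conjunct glyph.
typed = "\u0915\u094D\u0937"

def code_points(text):
    """List each character as 'U+XXXX NAME' so the typed sequence is visible."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

# Hypothetical substitution table: character sequence -> glyph name.
# A real font carries this kind of data in its own tables.
LIGATURES = {
    ("\u0915", "\u094D", "\u0937"): "k_ssa.conjunct",
    ("\u0924", "\u094D", "\u0930"): "t_ra.conjunct",
}

def shape(text, table=LIGATURES):
    """Greedy longest-match substitution of character sequences by glyph names."""
    glyphs = []
    max_len = max(len(key) for key in table)
    i = 0
    while i < len(text):
        for n in range(max_len, 0, -1):
            key = tuple(text[i:i + n])
            if len(key) == n and key in table:
                glyphs.append(table[key])
                i += n
                break
        else:
            # No table entry: fall back to one default glyph per character.
            glyphs.append(f"uni{ord(text[i]):04X}")
            i += 1
    return glyphs

if __name__ == "__main__":
    for line in code_points(typed):
        print(line)
    print(shape(typed))
```

The point of the design is that the application never sees the table: it stores and edits the typed characters, and only the rendering step consults the font.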
X
The Free Standards Group together with Li18nux.org are proposing to rationalize and simplify I18n support under X, including a common rendering engine, shared font paths, and other standards that will greatly simplify the business of supporting all writing systems and all languages.
Toolkits
Dozens. Write to me off-list and tell me what tasks you want tools for.
Fonts? Check out pfaedit.
Keyboards? Unix keyboard files can be prepared in any text editor.
Rendering? Pango, Graphite.
Software localization? IBM ICU, GTK, various languages...
Multilingual editing? Yudit and emacs both support several Indic scripts, and could be extended with only moderate effort on the part of a few experts. I have not tested vim, the other Unix Unicode editor. Most other Unix applications accept some Indic input.
Mandrake Linux includes Bengali, Gujarati, Gurmukhi, Hindi Devanagari, and Tamil out of the box. That leaves Oriya, Malayalam, Telugu, and Kannada still to be done, along with the Indic-derived Lao, Sinhala, Myanmar, and Khmer. Tibetan and Thai are moderately well supported.
Fonts?
There are two projects to create a complete rendering engine: Pango (Pango.org, Li18nux.org) and Graphite (sil.org). They also have plans for complete sets of Unicode fonts (including not just the Unicode characters, but also all of the non-character glyphs for rendering Indic scripts).
Voice synthesis and recognition?
IBM and several other companies have projects to support 40 or more languages each. I can put you in contact with a voice engineer in Silicon Valley, and you of course have access to the voice engineers involved in the Simputer design.
Any right-to-left languages?
Arabic and Hebrew are officially supported on Windows, Mac, and Unix. Other languages in the same scripts (Urdu, Pashto, Farsi/Dari, etc.; Yiddish, Ladino, etc.) can be typed but do not have full support. Syriac (once used to write Malayalam) is included in Unicode but not well supported. Thaana, used for Dhivehi in the Republic of Maldives, is similarly encoded but not well supported. I can provide more details, or put you in contact with experts on any of these.
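As a rough illustration of what "right-to-left" means at the character level, the Unicode character database assigns each character a bidirectional class, and Python's standard `unicodedata` module exposes it. The sketch below is an assumption-laden sample, not a layout engine: the `SAMPLES` dictionary and the `is_rtl` helper are made up for this example, and real display code has to run the full Unicode bidirectional algorithm on whole paragraphs.

```python
import unicodedata

# One sample letter per script mentioned above (chosen for illustration).
SAMPLES = {
    "Hebrew": "\u05D0",      # HEBREW LETTER ALEF
    "Arabic": "\u0627",      # ARABIC LETTER ALEF
    "Syriac": "\u0710",      # SYRIAC LETTER ALAPH
    "Thaana": "\u0780",      # THAANA LETTER HAA
    "Devanagari": "\u0915",  # DEVANAGARI LETTER KA (left-to-right, for contrast)
}

def is_rtl(ch):
    """True if the character's bidirectional class is right-to-left.

    'R' is used for Hebrew-style letters; 'AL' for Arabic, Syriac and
    Thaana letters.
    """
    return unicodedata.bidirectional(ch) in ("R", "AL")

if __name__ == "__main__":
    for script, ch in SAMPLES.items():
        print(script, unicodedata.bidirectional(ch), is_rtl(ch))
```

Knowing the direction of each character is only the encoding side; shaping and cursor movement are where support tends to lag.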
What other challenges do Indian languages pose?
That's it. We have them all in hand. Well, dictionaries and spelling checkers, of course. Word-breaking doesn't operate the same way in Indic scripts as in the Latin alphabet. Fine typography, which you don't find in consumer or office applications in any language. And the sheer number. There are more than 800 languages spoken in India.
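One small piece of the breaking problem mentioned above can be sketched in Python, on the assumption that it refers at least partly to where text may be split: in Indic scripts a break must not fall inside a syllable cluster, because vowel signs and virama-joined consonants belong to the preceding consonant. The `clusters` function below is a deliberately simplified, hypothetical helper for Devanagari only; it is not the Unicode segmentation algorithm and ignores many cases.

```python
import unicodedata

VIRAMA = "\u094D"  # Devanagari virama; other Indic scripts have their own

def clusters(text):
    """Group characters into rough syllable clusters.

    Simplified rule (an assumption for this sketch): a combining mark
    attaches to the previous cluster, and a letter that directly follows
    a virama also attaches to it.
    """
    out = []
    for ch in text:
        category = unicodedata.category(ch)
        is_mark = category in ("Mn", "Mc")
        after_virama = bool(out) and out[-1].endswith(VIRAMA) and category.startswith("L")
        if out and (is_mark or after_virama):
            out[-1] += ch
        else:
            out.append(ch)
    return out

if __name__ == "__main__":
    word = "\u0939\u093F\u0928\u094D\u0926\u0940"  # the word "Hindi" in Devanagari
    print(len(word), "characters,", len(clusters(word)), "clusters")
```

A naive routine that counts or splits by character would see six units in this word, while a reader sees two.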
- Projects that are currently underway, and which are interesting, in your opinion.
As I said, Pango, Graphite, Li18nux, Free Standards. Mandrake Linux emphasizes multilingual support, and welcomes offers of help. And of course Simputer.