Recently, I started using UTF-8 enabled applications to read and write Tamil, the local official language here. It appears that Indic languages have been incorrectly represented in Unicode. In the 1990s, India sent fewer than 128 characters per language to the Unicode Consortium, far fewer than the full complement of characters in each. For example, among Tamil characters, only 31 (12 vowels, 18 consonants and 1 final, ஃ) have specific codes. The chart leaves out the almost 12 x 18 combined consonant-vowel characters, which now have to be encoded with three to nine bytes per character. To make things worse, the characters are not arranged in any natural order, so sorting is difficult. It appears to be difficult to amend the charts now, as a number of applications have started using the Unicode code charts. Almost all Indic languages have the same problem.
Some would now like to have a 16-bit encoded Tamil-New chart, with codes allocated for 250+ characters in the Private Use Area. I am not sure whether other Indic language groups are aware of the issues here, or what their plans are to deal with them.
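To make the byte counts concrete, here is a minimal Python sketch (not from the original post) that measures a few Tamil syllables as they are stored in UTF-8. The sample names and the helper `utf8_cost` are made up for illustration; the code points are those of the Tamil block.

```python
# Tamil syllables written with explicit code points so the composition is visible.
samples = {
    "ka": "\u0b95",                # consonant KA on its own
    "kaa": "\u0b95\u0bbe",         # KA + vowel sign AA
    "k": "\u0b95\u0bcd",           # KA + pulli (virama)
    "ksha": "\u0b95\u0bcd\u0bb7",  # KA + pulli + SSA
}

def utf8_cost(text):
    """Return (number of code points, number of UTF-8 bytes) for a string."""
    return len(text), len(text.encode("utf-8"))

for name, text in samples.items():
    points, size = utf8_cost(text)
    print(f"{name:5} {text}  code points={points}  utf-8 bytes={size}")
```

Every code point in the Tamil block takes three bytes in UTF-8, so a written character built from two or three code points takes six or nine bytes.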
Padmakumar pointed out the issues to the fsf-friends mailing list in 2004:
http://mm.gnu.org.in/pipermail/fsf-friends/2004-December/002653.html along with a link to the article at: http://www.angelfire.com/empire/thamizh/2/ (sadly, there was no response to it)
A recent TVU conference document on the issues is available at: http://tamilvu.org/coresite/html/cwwhatnw.htm
There are a number of things that need to be done:
[1] Add any missing characters and rearrange the Tamil Unicode characters within the existing range of 128 so that sorting can be done.
[2] Examine the TVU document and offer suggestions to those concerned regarding Tamil 16-bit encoding.
[3] Almost all Indic languages are in the same boat, so the language groups ought to come up with workable plans to remove the problems.
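As a rough illustration of the sorting point in [1], the Python sketch below (an illustration only, not part of any proposal) compares plain code point order with the traditional order of the 18 Tamil consonants. The names `TRADITIONAL`, `RANK` and `traditional_key` are hypothetical, and the key handles bare consonants only; vowel signs and the pulli would need more work.

```python
# The 18 consonants in traditional order: ka, nga, ca, nya, tta, nna, ta, na,
# pa, ma, ya, ra, la, va, llla, lla, rra, nnna.
TRADITIONAL = (
    "\u0b95\u0b99\u0b9a\u0b9e\u0b9f\u0ba3\u0ba4\u0ba8\u0baa"
    "\u0bae\u0baf\u0bb0\u0bb2\u0bb5\u0bb4\u0bb3\u0bb1\u0ba9"
)
RANK = {ch: i for i, ch in enumerate(TRADITIONAL)}

def traditional_key(word):
    """Sort key: known consonants by traditional rank, anything else by code point."""
    return [(0, RANK[ch]) if ch in RANK else (1, ord(ch)) for ch in word]

letters = ["\u0baa", "\u0ba9", "\u0bb2", "\u0bb1"]  # pa, nnna, la, rra

by_code_point = sorted(letters)                        # order of the code chart
by_tradition = sorted(letters, key=traditional_key)    # order of the alphabet

print("code point order: ", " ".join(by_code_point))
print("traditional order:", " ".join(by_tradition))
```

A lookup table like this leaves the existing code points untouched and changes only the comparison, which is the general approach the Unicode Collation Algorithm takes.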
-Ramanraj K
I apologise for the top-posting, but would you like to go through the discussions on the indic@unicode.org mailing list? There seems to be a lot of dialogue going on about this issue (which is normal, as it should be), but no clear solution has been provided.
:SM
--
You see things; and you say 'Why?'; But I dream things that never were; and I say 'Why not?' - George Bernard Shaw