Re: Looking for info on developing pango shapers for indian languages



Kaixo!

On Mon, Oct 16, 2000 at 01:51:08PM +0100, Robert Brady wrote:

> > Doesn't the ISCII standard have some provision for that?
> 
> I'm not sure exactly.
> 
> > I know that is the case for the Hindi vs Marathi half form of ra
> > (the Marathi one is encoded ra + nukta + virama) 
> 
> There are certainly some consonants which should make a conjunct in
> Sanskrit, but just use a regular half-form/form ligature in Hindi.

Reading the docs I have, I see that ISCII has a provision for the opposite
case (conjuncts are used in Hindi, but in Sanskrit and Vedic texts the preceding
consonant keeps its normal form with a visible virama sign under it).
Two consecutive virama signs are used for that.

There is also the "soft halant" (soft virama), which is encoded as a virama
sign followed by a nukta sign; the result is that the preceding letter keeps
its half form and doesn't combine with the following letter (even when a
combination would be possible).
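
To make the two sequences above concrete, here is a minimal Python sketch
that scans an ISCII byte string and labels each virama it finds. The byte
values (0xE8 for the virama/halant, 0xE9 for the nukta) are my reading of
the ISCII91 table and should be checked against the document; the function
name is invented for this illustration.

```python
HALANT = 0xE8  # assumed ISCII91 byte for the virama (halant)
NUKTA = 0xE9   # assumed ISCII91 byte for the nukta


def classify_halants(data):
    """Return a list of (offset, kind) for every halant sequence in data.

    kind is "explicit" for halant + halant (visible virama, no conjunct),
    "soft" for halant + nukta (half form kept, no conjunct),
    and "plain" for a single halant (normal conjunct formation).
    """
    found = []
    i = 0
    while i < len(data):
        if data[i] == HALANT:
            following = data[i + 1] if i + 1 < len(data) else None
            if following == HALANT:
                found.append((i, "explicit"))
                i += 2
                continue
            if following == NUKTA:
                found.append((i, "soft"))
                i += 2
                continue
            found.append((i, "plain"))
        i += 1
    return found
```

A shaper would presumably branch on the returned kind to pick the conjunct,
the half form, or the full form with a visible virama.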

The document I have is an English translation of ISCII91, which you can find at
http://cdac.org.in/html/gist/iscii91.pdf

Note that the ISCII91 charsets are not the same as those in Unicode;
however, you can easily make an unambiguous multibyte conversion
between them. The letters with a nukta have their own codepoints in Unicode, but
are decomposed into base letter + nukta in ISCII91; the reason is that
ISCII91 defines an 8-bit encoding, which is limited in space.
There are also a few non-letter chars in Unicode that are not in ISCII91:
     <U0951> DEVANAGARI STRESS SIGN UDATTA
     <U0952> DEVANAGARI STRESS SIGN ANUDATTA
     <U0953> DEVANAGARI GRAVE ACCENT
     <U0954> DEVANAGARI ACUTE ACCENT
     <U0970> DEVANAGARI ABBREVIATION SIGN
     <U0965> DEVANAGARI DOUBLE DANDA [*]
[*]: can be encoded as 2 DANDA  
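
As a rough sketch of what that means for a Unicode -> ISCII91 converter,
the Python below pre-processes Devanagari text: precomposed nukta letters
are split into base letter + nukta, DOUBLE DANDA becomes two DANDA, and the
remaining signs from the list are reported as unconvertible. The function
name and the choice to report rather than drop those signs are my own
assumptions, not something taken from ISCII91.

```python
import unicodedata

DANDA = "\u0964"
DOUBLE_DANDA = "\u0965"
# The characters listed above that have no ISCII91 counterpart.
NO_ISCII = {"\u0951", "\u0952", "\u0953", "\u0954", "\u0970"}


def prepare_for_iscii(text):
    """Return (rewritten_text, rejected_chars) for Unicode Devanagari text."""
    out = []
    rejected = []
    for ch in text:
        if ch == DOUBLE_DANDA:
            out.append(DANDA * 2)
        elif ch in NO_ISCII:
            rejected.append(ch)
        elif "\u0900" <= ch <= "\u097f":
            # Canonical decomposition splits a precomposed nukta letter
            # (for example U+0958) into base letter + U+093C nukta.
            out.append(unicodedata.normalize("NFD", ch))
        else:
            # Characters outside the Devanagari block are passed through.
            out.append(ch)
    return "".join(out), rejected
```

For instance, `prepare_for_iscii("\u0958")` yields U+0915 followed by
U+093C, which is the shape ISCII91 expects.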

Then ATR and EXT in ISCII91 have no Unicode equivalent (they are control
codes intended to switch between different Indic encodings, set font
attributes, etc.; I don't know to what extent they are really used).

BTW, do you know if it would be possible to have in gconv() of glibc
a multibyte conversion that would allow converting between ISCII-DEV
and UTF-8 (that is, the two chars 0xB3 0xE9 in ISCII-DEV would be
one char 0x0958 in Unicode, and vice versa)?
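
To show the kind of conversion I mean, here is a small Python sketch,
independent of glibc, that does a longest-match lookup so that a
letter + nukta pair in ISCII-DEV becomes one precomposed Unicode char.
The table is a tiny illustrative subset, and the byte values (0xB3 KA,
0xB4 KHA, 0xE9 nukta) are my reading of the ISCII91 table and should be
verified; the function names are invented here.

```python
# Tiny illustrative subset of an ISCII-DEV <-> Unicode table (assumed values).
ISCII_TO_UNICODE = {
    b"\xb3": "\u0915",      # KA
    b"\xb4": "\u0916",      # KHA
    b"\xe9": "\u093c",      # NUKTA on its own
    b"\xb3\xe9": "\u0958",  # KA + NUKTA  -> QA
    b"\xb4\xe9": "\u0959",  # KHA + NUKTA -> KHHA
}
UNICODE_TO_ISCII = {uni: raw for raw, uni in ISCII_TO_UNICODE.items()}


def iscii_to_utf8(data):
    """Convert ISCII-DEV bytes to UTF-8, preferring two-byte matches."""
    out = []
    i = 0
    while i < len(data):
        for width in (2, 1):  # longest match first
            chunk = bytes(data[i:i + width])
            if len(chunk) == width and chunk in ISCII_TO_UNICODE:
                out.append(ISCII_TO_UNICODE[chunk])
                i += width
                break
        else:
            raise ValueError("unmapped ISCII byte 0x%02X" % data[i])
    return "".join(out).encode("utf-8")


def utf8_to_iscii(data):
    """Convert UTF-8 back to ISCII-DEV bytes using the same table."""
    out = []
    for ch in data.decode("utf-8"):
        if ch not in UNICODE_TO_ISCII:
            raise ValueError("unmapped char U+%04X" % ord(ch))
        out.append(UNICODE_TO_ISCII[ch])
    return b"".join(out)
```

Both the precomposed U+0958 and the sequence U+0915 U+093C come out as the
same two ISCII bytes, so the ISCII side stays unambiguous.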
 
Thanks

-- 
Ki ça vos våye bén,
Pablo Saratxaga

http://www.srtxg.easynet.be/		PGP Key available, key ID: 0x8F0E4975



