Re: Looking for info on developing pango shapers for indian languages
- From: Pablo Saratxaga <pablo mandrakesoft com>
- To: gtk-i18n-list gnome org
- Subject: Re: Looking for info on developing pango shapers for indian languages
- Date: Mon, 16 Oct 2000 17:22:50 +0200
Hello!
On Mon, Oct 16, 2000 at 01:51:08PM +0100, Robert Brady wrote:
> > Doesn't the ISCII standard have some provision for that?
>
> I'm not sure exactly.
>
> > I know that is the case for the Hindi vs. Marathi half form of ra
> > (the Marathi one is encoded ra + nukta + virama)
>
> There are certainly some consonants which should make a conjunct in
> Sanskrit, but just use the regular half form followed by the full form in Hindi.
Reading the docs I have, I see that ISCII has a provision for the opposite
case: conjuncts are used in Hindi, but in Sanskrit and Vedic texts the preceding
consonant keeps its full form with a visible virama sign under it.
Two consecutive virama signs are used for that.
There is also the "soft halant" (soft virama), encoded as a virama
sign followed by a nukta sign; the preceding letter then keeps its
half form and does not combine with the following letter (even when a
combination would be possible).
The document I have is an English translation of ISCII-91; you can find it at
http://cdac.org.in/html/gist/iscii91.pdf
Note that the ISCII-91 character sets are not the same as those in Unicode;
however, you can easily make an unambiguous multibyte conversion
between them. (The letters with a nukta have their own code point in Unicode,
but are decomposed into base letter + nukta in ISCII-91, because ISCII-91
is an 8-bit encoding and so has limited code space.)
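A converter therefore needs an explicit table pairing "ISCII base letter + NUKTA" with the precomposed Unicode letter. This minimal sketch covers only U+0958..U+095E; the ISCII byte values are assumed from the ISCII-91 Devanagari table and should be verified against the CDAC document. One detail worth knowing: these Unicode letters are composition exclusions, so NFC normalization will not build them from base + U+093C, and the table cannot be replaced by a call to `unicodedata.normalize`.

```python
import unicodedata
from typing import Optional

NUKTA = 0xE9  # assumed ISCII-DEV NUKTA byte

# ISCII-DEV consonant byte -> precomposed Unicode letter when followed by NUKTA.
NUKTA_FORMS = {
    0xB3: "\u0958",  # KA   + NUKTA -> QA
    0xB4: "\u0959",  # KHA  + NUKTA -> KHHA
    0xB5: "\u095a",  # GA   + NUKTA -> GHHA
    0xBA: "\u095b",  # JA   + NUKTA -> ZA
    0xBF: "\u095c",  # DDA  + NUKTA -> DDDHA
    0xC0: "\u095d",  # DDHA + NUKTA -> RHA
    0xC9: "\u095e",  # PHA  + NUKTA -> FA
}

# Reverse table: precomposed Unicode letter -> ISCII base byte + NUKTA.
ISCII_FOR = {ch: bytes([base, NUKTA]) for base, ch in NUKTA_FORMS.items()}


def decode_pair(base: int, following: int) -> Optional[str]:
    """Return the precomposed letter for base + NUKTA, or None if not a nukta pair."""
    if following != NUKTA:
        return None
    return NUKTA_FORMS.get(base)


def encode_letter(ch: str) -> Optional[bytes]:
    """Return the two ISCII bytes for a precomposed nukta letter, if known."""
    return ISCII_FOR.get(ch)


# Unicode itself decomposes QA into KA + U+093C NUKTA...
assert unicodedata.normalize("NFD", "\u0958") == "\u0915\u093c"
# ...but NFC leaves the sequence decomposed (composition exclusion).
assert unicodedata.normalize("NFC", "\u0915\u093c") == "\u0915\u093c"
```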
There are also a few non-letter characters in Unicode that are not in ISCII-91:
<U0951> DEVANAGARI STRESS SIGN UDATTA
<U0952> DEVANAGARI STRESS SIGN ANUDATTA
<U0953> DEVANAGARI GRAVE ACCENT
<U0954> DEVANAGARI ACUTE ACCENT
<U0970> DEVANAGARI ABBREVIATION SIGN
<U0965> DEVANAGARI DOUBLE DANDA [*]
[*]: can be encoded as 2 DANDA
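In the Unicode-to-ISCII direction these characters need special handling. A minimal sketch, assuming DANDA is 0xEA in ISCII-DEV and that unmappable characters should raise an error rather than be silently dropped (a design choice of this sketch, not something ISCII-91 specifies):

```python
DANDA = 0xEA  # assumed ISCII-DEV byte for DANDA (full stop)

# Unicode characters listed above that have no ISCII-91 counterpart.
NO_ISCII = {
    "\u0951": "DEVANAGARI STRESS SIGN UDATTA",
    "\u0952": "DEVANAGARI STRESS SIGN ANUDATTA",
    "\u0953": "DEVANAGARI GRAVE ACCENT",
    "\u0954": "DEVANAGARI ACUTE ACCENT",
    "\u0970": "DEVANAGARI ABBREVIATION SIGN",
}


def encode_punct(ch: str) -> bytes:
    """Encode DANDA / DOUBLE DANDA; reject characters ISCII-91 lacks."""
    if ch == "\u0964":  # DEVANAGARI DANDA
        return bytes([DANDA])
    if ch == "\u0965":  # DEVANAGARI DOUBLE DANDA -> two DANDAs
        return bytes([DANDA, DANDA])
    if ch in NO_ISCII:
        raise ValueError(f"{NO_ISCII[ch]} (U+{ord(ch):04X}) has no ISCII-91 equivalent")
    raise KeyError(f"U+{ord(ch):04X} is outside this sketch's table")
```

The trade-off shows up on the way back: if a decoder folds 0xEA 0xEA into U+0965, Unicode text that really contained two consecutive U+0964 characters comes back as a single U+0965.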
Then ATR and EXT in ISCII-91 have no Unicode equivalent (they are control
codes intended to switch between different Indic encodings, set font
attributes, etc.; I don't know to what extent they are really used).
BTW, do you know whether it would be possible to have, in glibc's gconv(),
a multibyte conversion between ISCII-DEV and UTF-8? That is, the two
bytes 0xB3 0xE9 (KA + NUKTA) in ISCII-DEV would become the single
character U+0958 in Unicode, and vice versa.
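The difficulty in such a conversion is lookahead: a decoder cannot emit a consonant until it knows whether the next byte is a NUKTA (or whether a HALANT is followed by another HALANT or a NUKTA), and that next byte may arrive in a later buffer. The Python sketch below is not the glibc gconv API; it only illustrates the state such a converter would have to carry between calls (one pending byte). The tables are a hypothetical subset, with KA = 0xB3, SSA = 0xD6, HALANT = 0xE8 and NUKTA = 0xE9 assumed, and the ZWNJ/ZWJ mapping follows the Unicode convention for explicit and soft halant.

```python
from typing import Optional

HALANT, NUKTA = 0xE8, 0xE9  # assumed ISCII-DEV byte values

# Hypothetical subset of the single-byte ISCII-DEV -> Unicode table.
SINGLE = {0xB3: "\u0915", 0xD6: "\u0937", HALANT: "\u094d", NUKTA: "\u093c"}

# Two-byte sequences whose Unicode result is not just the concatenation.
PAIRS = {
    (0xB3, NUKTA): "\u0958",           # KA + NUKTA -> QA
    (HALANT, HALANT): "\u094d\u200c",  # double halant -> VIRAMA + ZWNJ
    (HALANT, NUKTA): "\u094d\u200d",   # soft halant -> VIRAMA + ZWJ
}

# Bytes that may start a pair and must be held back until the next byte.
LEADS = {first for first, _ in PAIRS}


class IsciiDevDecoder:
    """Stateful decoder that keeps one pending byte across feed() calls."""

    def __init__(self) -> None:
        self.pending: Optional[int] = None

    def feed(self, chunk: bytes) -> str:
        out = []
        for b in chunk:
            if self.pending is not None:
                pair = PAIRS.get((self.pending, b))
                self_pending = self.pending
                self.pending = None
                if pair is not None:
                    out.append(pair)  # the two bytes collapse into one result
                    continue
                out.append(SINGLE[self_pending])  # no pair: flush held byte alone
            if b in LEADS:
                self.pending = b  # wait for the next byte before deciding
            else:
                out.append(SINGLE[b])
        return "".join(out)

    def flush(self) -> str:
        """Emit any byte still held back at end of input."""
        if self.pending is None:
            return ""
        ch = SINGLE[self.pending]
        self.pending = None
        return ch
```

Feeding `b"\xb3"` alone returns an empty string; the following `b"\xe9"` then yields U+0958. The encoder direction is simpler, since U+0958 always expands to the same two bytes and needs no state.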
Thanks
--
May all go well for you,
Pablo Saratxaga
http://www.srtxg.easynet.be/ PGP Key available, key ID: 0x8F0E4975