Re: Excluding Hangul compatibility jamo (U+3130 - U+318f) from the basic shaper?



Owen Taylor <otaylor redhat com> writes:

> Changwoo Ryu <cwryu debian org> writes:
> 
> > Owen Taylor <otaylor redhat com> writes:
> > 
> > > Changwoo Ryu <cwryu debian org> writes:
> > > 
> > > > I completed the table.  This _huge_ patch enables ksc5601 fonts in the
> > > > Hangul compatibility jamos and the CJK unified ideographs area.
> > > > 
> > > > I'll commit myself if no one objects.
> > > 
> > > I'm sorry I didn't respond earlier.... how did you generate this patch?
> > > 
> > > tables-big.i is an autogenerated file, and we need to be able to regenerate
> > > it later as necessary.
> > > 
> > > Regards,
> > 
> > The patch was generated semi-automatically.  I marked every char as
> > ksc5601-printable if iconv (from WCHAR_T to EUC-KR) succeeds on the
> > char.  Then I replaced all the marked chars with the char_mask_map[]
> > indices, which I added by hand.
> > 
> > Where is the program you used to generate the table?  I guess it could
> > be easily modified.
> 
> The table is generated by:
> 
>  pango/tools/make-table.sh
>  pango/tools/compress-table.pl
> 
> The source tables are described in:
> 
>  http://mail.gnome.org/archives/gtk-i18n-list/2000-August/msg00016.html
> 
> And, with further additions in:
> 
>  http://bugzilla.gnome.org/show_bug.cgi?id=50633
> 
> (See comment from 2001-08-06)
> 
> Since the EASTASIAN tables on unicode.org are considered obsolete, it
> might be best to switch make-table.sh to be able to read Unihan.txt
> in addition. Or, maybe, we should add another script that takes Unihan.txt
> and creates tables in the format make-table.sh expects.

I just wrote the script below, which can generate make-table.sh's input
from Unihan.txt.  It seemed to work (the local encoding values it
produces are not correct, but they will be ignored by compress-table.pl
anyway).  But Unihan.txt does NOT seem to have all the information that
the obsolete mapping files have.

It doesn't have jis-0201 mappings.  Furthermore, the ksc-5601 mapping
only includes the Hanja area, with no Hangul compatibility jamos or
Hangul syllables.
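
A quick way to check this kind of coverage is to count how many entries
of a generated table fall inside a given Unicode range.  The following is
only a rough sketch: count_in_range is a made-up helper name (it is not
part of pango/tools), and it assumes the table lines have the
"0x<local>\t0x<unicode>" form that the script at the end of this mail
prints.

```perl
#!/usr/bin/perl -w
use strict;

# Count the entries of a table file whose Unicode value (second column)
# lies in [$lo, $hi].  Assumes lines of the form "0x<local>\t0x<unicode>";
# anything else is skipped.
# count_in_range is a hypothetical helper, not part of pango/tools.
sub count_in_range {
    my ($file, $lo, $hi) = @_;
    open(my $fh, "<", $file) || die "Can't open $file: $!";
    my $n = 0;
    while (<$fh>) {
	next unless /^0x[0-9a-fA-F]+\t0x([0-9a-fA-F]+)\s*$/;
	my $u = hex($1);
	$n++ if $u >= $lo && $u <= $hi;
    }
    close($fh);
    return $n;
}

# Report three ranges for the table named on the command line.
if (@ARGV) {
    my $file = $ARGV[0];
    printf("compat jamo: %d\n", count_in_range($file, 0x3130, 0x318f));
    printf("syllables:   %d\n", count_in_range($file, 0xac00, 0xd7a3));
    printf("ideographs:  %d\n", count_in_range($file, 0x4e00, 0x9fff));
}
```

It could be run as "perl count-range.pl maps/ksc-5601", where
count-range.pl is whatever name the sketch is saved under.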


-- 
Changwoo Ryu

Cc'ing is encouraged


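#!/usr/bin/perl -w

# Generate make-table.sh input tables from Unihan.txt.
# Writes one file per encoding under maps/ (the directory must already
# exist); each line is "0x<local code>\t0x<Unicode code point>".

use strict;
use IO::File;

# Unihan.txt tag => name of the output table.
# kJis1 and kJIS0213 both feed the jis-0212 table.
my %maps = ( 'kGB0' => 'gb-2312',
	  'kJis0' => 'jis-0208',
	  'kJis1' => 'jis-0212',
	  'kJIS0213' => 'jis-0212',
	  'kKSC0' => 'ksc-5601',
	  'kBigFive' => 'big5' );


open(UNIHAN, "<Unihan.txt") || die "Can't open Unihan.txt: $!";

# Open each output file only once, even if several tags map to it.
my %fhandles = ();
for my $e (values %maps) {
    next if $fhandles{$e};
    $fhandles{$e} = IO::File->new("maps/$e", "w") || die "Can't open maps/$e: $!";
}

while (<UNIHAN>) {
    my ($u, $tag, $l);
#     if (/^U\+([0-9a-fA-F]+)\s+kIRG_([A-Z])Source\s+([0-9A-Z])-([0-9A-Z]+)/) {
# 	($u, $tag, $l) = ($1, "$2:$3", $4);
#     }
#     els
    # Only lines of the form "U+XXXX <tag> <hex value>" are used.
    if (/^U\+([0-9a-fA-F]+)\s+([0-9a-zA-Z]+)\s+([0-9A-F]+)\s*$/) {
	($u, $tag, $l) = ($1, $2, $3);
    }
    else {
	next;
    }
    if ($maps{$tag}) {
	$fhandles{$maps{$tag}}->print("0x$l\t0x$u\n");
    }
}
for my $fh (values %fhandles) {
    $fh->close;
}

close UNIHAN;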


