[gimp] app: improve sample text logics for CJK fonts.

From: Jehan <jehanp src gnome org>
To: commits-list gnome org
Cc:
Subject: [gimp] app: improve sample text logics for CJK fonts.
Date: Sun, 15 Nov 2020 20:56:44 +0000 (UTC)
commit b7796b0bfb8735af9c72c3e88360ca6f4e42fd12
Author: Jehan <jehan girinstud io>
Date:   Sun Nov 15 21:07:41 2020 +0100

    app: improve sample text logics for CJK fonts.
    
    First of all, the "CJK Unified Ideographs" block should not have the
    highest priority when deciding whether to show an ideograph. Indeed,
    most fonts for a Korean or Japanese audience also contain at least the
    main ideographs. So instead, first look for the Korean alphabet
    (Hangul) and the Japanese syllabaries to determine whether it's a
    Korean- or Japanese-targeted font, and only then for Chinese.
    Also, check Korean before Japanese, because most of the Korean fonts I
    have seen also include the Japanese syllabaries (but not the other way
    around).
    This way, it will be much easier for CJK graphic designers to skim
    through the font list and spot fonts made for the language they need
    at a glance.
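In the patch, each `scripts[]` entry pairs a script tag, a bit number and a sample string, and the table order sets the priority: the first matching entry wins. The following is a minimal sketch of that lookup, assuming the `bit` field indexes the 128-bit OpenType OS/2 `ulUnicodeRange` field (the numbers 56, 49 and 59 match the Hangul Syllables, Hiragana and CJK Unified Ideographs bits in that specification). `ScriptSample`, `range_has_bit` and `pick_script` are hypothetical names, not GIMP's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of a scripts[] entry: script tag, bit index
 * into the OS/2 ulUnicodeRange field, and a UTF-8 sample string. */
typedef struct
{
  const char *script;
  int         bit;
  const char *sample;
} ScriptSample;

/* Priority order from the patch: Korean, then Japanese, then Han. */
static const ScriptSample scripts[] =
{
  { "hang", 56, "\355\225\234" },  /* U+D55C HANGUL SYLLABLE HAN */
  { "japa", 49, "\343\201\202" },  /* U+3042 HIRAGANA LETTER A   */
  { "hani", 59, "\346\260\270" },  /* U+6C38 "forever"           */
};

/* ulUnicodeRange is 128 bits stored as four 32-bit words
 * (ulUnicodeRange1..4); bit n lives in word n / 32. */
int
range_has_bit (const uint32_t range[4], int bit)
{
  assert (bit >= 0 && bit < 128);
  return (range[bit / 32] >> (bit % 32)) & 1u;
}

/* Return the first entry whose bit is set, or NULL if none is. */
const ScriptSample *
pick_script (const uint32_t range[4])
{
  for (size_t i = 0; i < sizeof scripts / sizeof scripts[0]; i++)
    if (range_has_bit (range, scripts[i].bit))
      return &scripts[i];
  return NULL;
}
```

With this ordering, a font setting the Hangul Syllables, Hiragana and CJK bits gets 한, one setting only Hiragana and CJK gets あ, and one setting only CJK gets 永.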
    
    I also changed the Korean display text. KIYEOK and SSANGKIYEOK were
    obviously chosen because they are the first characters in the block,
    but they are a very bad choice. We first considered 가, as it is
    regarded as the first syllable (가나다라 is roughly the equivalent of
    our ABCD). But it doesn't show a form with a final consonant (written
    below), which is a good stylistic indication. So we hesitated between
    한 (han) and 글 (geul, which also means "text", so it's a nice sample),
    and finally went with 한 because of the circle shape in ㅎ (hieut) but
    also its small "hat", which has many stylistic variants. So it gives
    quite a good hint of the stylistic choices made by a font designer from
    just the sample box.
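The difference between 가 and 한 follows from how precomposed Hangul syllables are numbered. This sketch uses the standard composition formula from the Unicode Standard (section 3.12, "Conjoining Jamo Behavior") to show that 가 has no final consonant while 한 (U+D55C, the new sample) has one; `hangul_compose` and `hangul_has_final` are hypothetical helper names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Constants of the Unicode Hangul syllable composition algorithm. */
enum
{
  S_BASE  = 0xAC00, /* first precomposed syllable, U+AC00 가 */
  S_LAST  = 0xD7A3, /* last precomposed syllable             */
  L_COUNT = 19,     /* initial consonants (choseong)         */
  V_COUNT = 21,     /* vowels (jungseong)                    */
  T_COUNT = 28      /* final consonants + "none" (jongseong) */
};

/* Compose a syllable from jamo indices: l in [0, 19), v in [0, 21),
 * t in [0, 28) where t == 0 means "no final consonant". */
uint32_t
hangul_compose (int l, int v, int t)
{
  assert (l >= 0 && l < L_COUNT);
  assert (v >= 0 && v < V_COUNT);
  assert (t >= 0 && t < T_COUNT);
  return (uint32_t) S_BASE + (uint32_t) ((l * V_COUNT + v) * T_COUNT + t);
}

/* Return 1 if a precomposed syllable has a final consonant (the part
 * written below), 0 otherwise. */
int
hangul_has_final (uint32_t syllable)
{
  assert (syllable >= S_BASE && syllable <= S_LAST);
  return (syllable - S_BASE) % T_COUNT != 0;
}
```

For instance, ㅎ (L = 18) + ㅏ (V = 0) + ㄴ (T = 4) gives U+D55C 한, and ㄱ (L = 0) + ㅡ (V = 18) + ㄹ (T = 8) gives U+AE00 글.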
    
    Moreover, I switched the block check from the "Hangul Jamo" block to the
    "Hangul Syllables" block. "Hangul Jamo" are positional forms of the
    letters used to dynamically compose syllables (in particular legacy
    syllables no longer in use). Though a full-featured Korean font would
    design these, they are less important than "Hangul Syllables"
    (pre-composed syllable designs) or "Hangul Compatibility Jamo"
    (basically the same letters as "Hangul Jamo" but for standalone usage).
    Also, I actually saw some fonts made for Korean without "Hangul
    Syllables" support.
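The block switch can be illustrated by classifying code points into the three Hangul-related Unicode blocks named above and mapping each to its OS/2 `ulUnicodeRange` bit (28, 52 and 56 in the OpenType specification; 28 is the bit the removed entry used and 56 the one the patch uses). A small sketch; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hangul-related Unicode blocks discussed in the commit message. */
typedef enum
{
  HANGUL_NONE,
  HANGUL_JAMO,               /* U+1100..U+11FF: positional forms      */
  HANGUL_COMPATIBILITY_JAMO, /* U+3130..U+318F: standalone letters    */
  HANGUL_SYLLABLES           /* U+AC00..U+D7AF: precomposed syllables */
} HangulBlock;

/* Return the Hangul block containing a code point, if any. */
HangulBlock
hangul_block (uint32_t cp)
{
  if (cp >= 0x1100 && cp <= 0x11FF)
    return HANGUL_JAMO;
  if (cp >= 0x3130 && cp <= 0x318F)
    return HANGUL_COMPATIBILITY_JAMO;
  if (cp >= 0xAC00 && cp <= 0xD7AF)
    return HANGUL_SYLLABLES;
  return HANGUL_NONE;
}

/* OpenType OS/2 ulUnicodeRange bit for each block, or -1. */
int
hangul_block_os2_bit (HangulBlock block)
{
  switch (block)
    {
    case HANGUL_JAMO:               return 28;
    case HANGUL_COMPATIBILITY_JAMO: return 52;
    case HANGUL_SYLLABLES:          return 56;
    default:
      assert (block == HANGUL_NONE);
      return -1;
    }
}
```

The old sample U+1100 (KIYEOK) falls in Hangul Jamo, while the new sample U+D55C (한) falls in Hangul Syllables.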
    
    Finally, I also added a test for Japanese. I check the Hiragana block,
    which is most likely the most basic block any Japanese-targeted font
    must have, and use 'あ' (a) as sample text. It is the first Hiragana
    syllable and, in my opinion, definitely a good sample here.
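The samples in the patch are written as octal escapes of each character's UTF-8 bytes, e.g. `"\343\201\202"` for U+3042 あ and `"\355\225\234"` for U+D55C 한. A minimal UTF-8 encoder reproduces those bytes; this is the standard encoding scheme, not code from the patch, and `utf8_encode` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode a Unicode code point as UTF-8 into out (4 bytes available).
 * Returns the number of bytes written. Surrogates are not rejected,
 * which is acceptable for this illustration. */
size_t
utf8_encode (uint32_t cp, unsigned char out[4])
{
  assert (cp <= 0x10FFFF);
  if (cp < 0x80)
    {
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      out[0] = (unsigned char) (0xC0 | (cp >> 6));
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  if (cp < 0x10000)
    {
      out[0] = (unsigned char) (0xE0 | (cp >> 12));
      out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
    }
  out[0] = (unsigned char) (0xF0 | (cp >> 18));
  out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
  out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
  out[3] = (unsigned char) (0x80 | (cp & 0x3F));
  return 4;
}
```

Encoding U+3042 yields the bytes 0343 0201 0202 and U+D55C yields 0355 0225 0234, matching the string literals in the diff.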
    
    I believe that this can still be improved though. Checking only a single
    block to determine the probable target language is not necessarily
    enough. For instance, very complete Chinese fonts may also design Korean
    and Japanese characters, but they will also cover most CJK blocks and
    more ideographs (whereas Japanese/Korean fonts will likely have fewer).
    Still, let's say this is good for now, and at least better than before!

 app/text/gimpfont.c | 43 ++++++++++++++++++++++++++++++++-----------
 1 file changed, 32 insertions(+), 11 deletions(-)
---
diff --git a/app/text/gimpfont.c b/app/text/gimpfont.c
index 485ab9a36e..42dc236b7a 100644
--- a/app/text/gimpfont.c
+++ b/app/text/gimpfont.c
@@ -425,12 +425,40 @@ gimp_font_get_sample_string (PangoContext         *context,
     gint         bit;
     const gchar *sample;
   } scripts[] = {
-    /* Han is first because fonts that support it presumably are primarily
-     * designed for it.
+    /* CJK are first because fonts that support these characters
+     * presumably are primarily designed for them.
+     *
+     * Though not a universal rule, most fonts targeting Korean will
+     * also have the ideographs as they were used historically (and
+     * still are in some cases). As for Japanese, ideographs are still
+     * definitely used daily. After all, the block is called "*CJK*
+     * (Chinese-Japanese-Korean) Unified Ideographs". So let's give the
+     * Chinese representation less priority than Korean/Japanese when
+     * determining the target language.
+     * Then we prioritize Korean because many fonts for Korean also
+     * design Hiragana/Katakana blocks. On the other hand, I have seen
+     * very few (none so far) Japanese fonts that also design Hangul
+     * characters. So if we see both Hiragana and Hangul, we can assume
+     * this is a font for the Korean market, whereas Hiragana only means
+     * Japanese.
+     * XXX: of course we could probably come up with a better algorithm
+     * to determine the target language. Looking at more than the
+     * existence of a single block would likely help (since some
+     * languages are spread over several blocks).
      */
     {
-      "hani",                   /* Han Ideographic */
-      59,
+      "hang",                   /* Korean alphabet (Hangul)   */
+      56,                       /* Hangul Syllables block     */
+      "\355\225\234"            /* U+D55C HANGUL SYLLABLE HAN */
+    },
+    {
+      "japa",                   /* Japanese                   */
+      49,                       /* Hiragana block             */
+      "\343\201\202"            /* U+3042 HIRAGANA LETTER A   */
+    },
+    {
+      "hani",                   /* Han Ideographic            */
+      59,                       /* CJK Unified Ideographs     */
       "\346\260\270"            /* U+6C38 "forever". Ed Trager says
                                  * this is a "pan-stroke" often used
                                  * in teaching Chinese calligraphy. It
@@ -577,13 +605,6 @@ gimp_font_get_sample_string (PangoContext         *context,
                                   * U+10D0 GEORGIAN LETTER AN
                                   */
     },
-    {
-      "hang",                   /* Hangul */
-      28,
-      "\341\204\200\341\204\201"/* U+1100 HANGUL CHOSEONG KIYEOK
-                                 * U+1101 HANGUL CHOSEONG SSANGKIYEOK
-                                 */
-    },
     {
       "ethi",                   /* Ethiopic */
       75,