UTF-8 Functions
- From: Owen Taylor <otaylor redhat com>
- To: gtk-devel-list gnome org
- Cc: gtk-i18n-list gnome org, trow ximian com, darin bentspoon com
- Subject: UTF-8 Functions
- Date: 01 Jul 2001 21:17:02 -0400
I've now added the following functions to GLib. I'm pretty happy with
them as encapsulating the basic operations of this nature in a simple
manner.
The main change I'm considering at this point is to add a max_len (can
be -1) parameter to normalize(), casefold(), strup(), strdown(), and
collate_key(). It makes things just a little bit more complicated, but
I hate having to g_strndup() a portion of a string, do something, and
then free the dup'ed string immediately.
(If you look inside the implementations, you'll see that this is
a convenience concern, not an efficiency concern at this point!)
But if people have other easy-to-implement improvements, I'd be
happy to consider them as well.
Regards,
Owen
/**
* g_utf8_normalize:
* @str: a UTF-8 encoded string.
* @mode: the type of normalization to perform.
*
* Convert a string into canonical form, standardizing
* such issues as whether a character with an accent
* is represented as a base character and combining
* accent or as a single precomposed character. You
* should generally call g_utf8_normalize() before
* comparing two Unicode strings.
*
* The normalization mode %G_NORMALIZE_DEFAULT only
* standardizes differences that do not affect the
* text content, such as the above-mentioned accent
* representation. %G_NORMALIZE_ALL also standardizes
* the "compatibility" characters in Unicode, such
* as SUPERSCRIPT THREE to the standard forms
* (in this case DIGIT THREE). Formatting information
* may be lost but for most text operations such
* characters should be considered the same.
* For example, g_utf8_collate() normalizes
* with %G_NORMALIZE_ALL as its first step.
*
* %G_NORMALIZE_DEFAULT_COMPOSE and %G_NORMALIZE_ALL_COMPOSE
* are like %G_NORMALIZE_DEFAULT and %G_NORMALIZE_ALL,
* but return a result with composed forms rather
* than a maximally decomposed form. This is often
* useful if you intend to convert the string to
* a legacy encoding or pass it to a system with
* less capable Unicode handling.
*
* Return value: a newly allocated string, that is the
* normalized form of @str.
**/
gchar *g_utf8_normalize (const gchar *str,
GNormalizeMode mode);
/**
* g_utf8_strdown:
* @str: a UTF-8 encoded string
*
* Converts all Unicode characters in the string that have a case
* to lowercase. The exact manner that this is done depends
* on the current locale, and may result in the number of
* characters in the string changing.
*
* Return value: a newly allocated string, with all characters
* converted to lowercase.
**/
gchar *g_utf8_strdown (const gchar *str);
/**
* g_utf8_strup:
* @str: a UTF-8 encoded string
*
* Converts all Unicode characters in the string that have a case
* to uppercase. The exact manner that this is done depends
* on the current locale, and may result in the number of
* characters in the string increasing. (For instance, the
* German ess-zet will be changed to SS.)
*
* Return value: a newly allocated string, with all characters
* converted to uppercase.
**/
gchar *g_utf8_strup (const gchar *str);
/**
* g_utf8_casefold:
* @str: a UTF-8 encoded string
*
* Converts a string into a form that is independent of case. The
* result will not correspond to any particular case, but can be
* compared for equality or ordered with the results of calling
* g_utf8_casefold() on other strings.
*
* Note that calling g_utf8_casefold() followed by g_utf8_collate() is
* only an approximation to the correct linguistic case insensitive
* ordering, though it is a fairly good one. Getting this exactly
* right would require a more sophisticated collation function that
* takes case sensitivity into account. GLib does not currently
* provide such a function.
*
* Return value: a newly allocated string, that is a
* case independent form of @str.
**/
gchar *g_utf8_casefold (const gchar *str);
/**
* g_utf8_collate:
* @str1: a UTF-8 encoded string
* @str2: a UTF-8 encoded string
*
* Compares two strings for ordering using the linguistically
* correct rules for the current locale. When sorting a large
* number of strings, it will be significantly faster to
* obtain collation keys with g_utf8_collate_key() and
* compare the keys with strcmp() when sorting instead of
* sorting the original strings.
*
* Return value: -1 if str1 compares before str2, 0 if they
* compare equal, 1 if str1 compares after str2.
**/
gint g_utf8_collate (const gchar *str1, const gchar *str2);
/**
* g_utf8_collate_key:
* @str: a UTF-8 encoded string.
*
* Converts a string into a collation key that can be compared
* with other collation keys using strcmp(). The results of
* comparing the collation keys of two strings with strcmp()
* will always be the same as comparing the two original
* strings with g_utf8_collate().
*
* Return value: a newly allocated string. This string should
* be freed with g_free when you are done with it.
**/
gchar *g_utf8_collate_key (const gchar *str);