[glib] gconvert: Fix error handling for g_iconv() with unrepresentable chars
- From: Philip Withnall <pwithnall src gnome org>
- To: commits-list gnome org
- Cc:
- Subject: [glib] gconvert: Fix error handling for g_iconv() with unrepresentable chars
- Date: Fri, 2 Feb 2018 09:04:56 +0000 (UTC)
commit 8abf3a04e699abd486c4dcaa57977203584acf0e
Author: Philip Withnall <withnall endlessm com>
Date: Mon Jan 22 12:50:15 2018 +0000
gconvert: Fix error handling for g_iconv() with unrepresentable chars
The behaviour of upstream iconv() when faced with a character which is
valid in the input encoding, but not representable in the output
encoding, is implementation defined:
http://pubs.opengroup.org/onlinepubs/9699919799/
Specifically:
    If iconv() encounters a character in the input buffer that is valid,
    but for which an identical character does not exist in the target
    codeset, iconv() shall perform an implementation-defined conversion
    on this character.
This behaviour was being exposed in our g_iconv() wrapper and also in
g_convert_with_iconv(). However, users of g_convert_with_iconv() (both
the GLib unit tests and the implementation of g_convert_with_fallback())
were assuming that iconv() would fail with EILSEQ if faced with an
unrepresentable character.
On platforms like NetBSD, this is not the case: NetBSD’s iconv()
finishes the conversion successfully and outputs a string containing
replacement characters. It signals those replacements through its
return value, which in that case is positive: the number of
non-reversible conversions performed.
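
As a rough illustration (not the GLib implementation), the two outcomes
described above can be folded into a single check around a raw iconv()
call. The names convert_strict() and ConvStatus are hypothetical and
exist only for this sketch. It assumes a stateless target encoding and
an output buffer large enough for the whole input.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for this sketch; they are not part of GLib. */
typedef enum {
  CONV_OK,                /* every character converted reversibly */
  CONV_ILLEGAL_SEQUENCE,  /* invalid input bytes, or an unrepresentable character */
  CONV_FAILED             /* anything else: no converter, truncated input, no space */
} ConvStatus;

/* Convert the NUL-terminated string @in from @from to @to, writing into @out.
 * The output is not NUL-terminated and shift state is not flushed, which is
 * acceptable only for stateless encodings. */
static ConvStatus
convert_strict (const char *to, const char *from,
                const char *in, char *out, size_t out_size)
{
  assert (to != NULL && from != NULL && in != NULL && out != NULL);

  iconv_t cd = iconv_open (to, from);
  if (cd == (iconv_t) -1)
    return CONV_FAILED;

  char *in_ptr = (char *) in;  /* iconv() takes char **, but only reads the input */
  size_t in_left = strlen (in);
  char *out_ptr = out;
  size_t out_left = out_size;

  size_t res = iconv (cd, &in_ptr, &in_left, &out_ptr, &out_left);
  int saved_errno = errno;     /* iconv_close() may overwrite errno */
  iconv_close (cd);

  /* Some implementations fail outright on an unrepresentable character. */
  if (res == (size_t) -1)
    return (saved_errno == EILSEQ) ? CONV_ILLEGAL_SEQUENCE : CONV_FAILED;

  /* Others succeed and report how many non-reversible conversions
   * (replacement characters) they performed. Treat that as the same error. */
  if (res > 0)
    return CONV_ILLEGAL_SEQUENCE;

  return CONV_OK;
}
```

Converting the UTF-8 bytes "\xe2\x82\xac" (the euro sign) to ASCII with
this helper should yield CONV_ILLEGAL_SEQUENCE whichever of the two
behaviours the platform's iconv() implements, while plain "abc" should
yield CONV_OK.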
Let’s codify the existing assumed behaviour of g_convert_with_iconv(),
documenting that it will return G_CONVERT_ERROR_ILLEGAL_SEQUENCE if
faced with an unrepresentable character. As g_iconv() is a thin wrapper
around iconv(), leave the behaviour there implementation-defined (but
document it as such).
Signed-off-by: Philip Withnall <withnall endlessm com>
https://bugzilla.gnome.org/show_bug.cgi?id=790698
glib/gconvert.c | 22 ++++++++++++++++++++++
glib/gconvert.h | 4 +++-
2 files changed, 25 insertions(+), 1 deletions(-)
---
diff --git a/glib/gconvert.c b/glib/gconvert.c
index 72909e5..f6ac3ff 100644
--- a/glib/gconvert.c
+++ b/glib/gconvert.c
@@ -264,6 +264,13 @@ g_iconv_open (const gchar *to_codeset,
* GLib provides g_convert() and g_locale_to_utf8() which are likely
* more convenient than the raw iconv wrappers.
*
+ * Note that the behaviour of iconv() for characters which are valid in the
+ * input character set, but which have no representation in the output character
+ * set, is implementation defined. This function may return success (with a
+ * positive number of non-reversible conversions as replacement characters were
+ * used), or it may return -1 and set an error such as %EILSEQ, in such a
+ * situation.
+ *
* Returns: count of non-reversible conversions, or -1 on error
**/
gsize
@@ -371,6 +378,14 @@ close_converter (GIConv cd)
* character until it knows that the next character is not a mark that
* could combine with the base character.)
*
+ * Characters which are valid in the input character set, but which have no
+ * representation in the output character set will result in a
+ * %G_CONVERT_ERROR_ILLEGAL_SEQUENCE error. This is in contrast to the iconv()
+ * specification, which leaves this behaviour implementation defined. Note that
+ * this is the same error code as is returned for an invalid byte sequence in
+ * the input character set. To get defined behaviour for conversion of
+ * unrepresentable characters, use g_convert_with_fallback().
+ *
* Returns: If the conversion was successful, a newly allocated
* nul-terminated string, which must be freed with
* g_free(). Otherwise %NULL and @error will be set.
@@ -449,6 +464,13 @@ g_convert_with_iconv (const gchar *str,
               break;
             }
         }
+      else if (err > 0)
+        {
+          /* @err gives the number of replacement characters used. */
+          g_set_error_literal (error, G_CONVERT_ERROR, G_CONVERT_ERROR_ILLEGAL_SEQUENCE,
+                               _("Unrepresentable character in conversion input"));
+          have_error = TRUE;
+        }
       else
         {
           if (!reset)
diff --git a/glib/gconvert.h b/glib/gconvert.h
index ea93006..d0d3721 100644
--- a/glib/gconvert.h
+++ b/glib/gconvert.h
@@ -37,7 +37,9 @@ G_BEGIN_DECLS
* GConvertError:
* @G_CONVERT_ERROR_NO_CONVERSION: Conversion between the requested character
* sets is not supported.
- * @G_CONVERT_ERROR_ILLEGAL_SEQUENCE: Invalid byte sequence in conversion input.
+ * @G_CONVERT_ERROR_ILLEGAL_SEQUENCE: Invalid byte sequence in conversion input;
+ * or the character sequence could not be represented in the target
+ * character set.
* @G_CONVERT_ERROR_FAILED: Conversion failed for some reason.
* @G_CONVERT_ERROR_PARTIAL_INPUT: Partial character sequence at end of input.
* @G_CONVERT_ERROR_BAD_URI: URI is invalid.