[glib] gconvert: Fix error handling for g_iconv() with unrepresentable chars



commit 8abf3a04e699abd486c4dcaa57977203584acf0e
Author: Philip Withnall <withnall endlessm com>
Date:   Mon Jan 22 12:50:15 2018 +0000

    gconvert: Fix error handling for g_iconv() with unrepresentable chars
    
    The behaviour of upstream iconv() when faced with a character which is
    valid in the input encoding, but not representable in the output
    encoding, is implementation defined:
    
    http://pubs.opengroup.org/onlinepubs/9699919799/
    
    Specifically:
    
       If iconv() encounters a character in the input buffer that is valid,
       but for which an identical character does not exist in the target
       codeset, iconv() shall perform an implementation-defined conversion
       on this character.
    
    This behaviour was being exposed in our g_iconv() wrapper and also in
    g_convert_with_iconv() — but users of g_convert_with_iconv() (both the
    GLib unit tests, and the implementation of g_convert_with_fallback())
    were assuming that iconv() would return EILSEQ if faced with an
    unrepresentable character.
    
    On platforms like NetBSD, this is not the case: NetBSD’s iconv()
    finishes the conversion successfully, and outputs a string containing
    replacement characters. It signals those replacements through its
    return value, which is then positive (the count of non-reversible
    conversions) rather than zero.
    
    Let’s codify the existing assumed behaviour of g_convert_with_iconv(),
    documenting that it will return G_CONVERT_ERROR_ILLEGAL_SEQUENCE if
    faced with an unrepresentable character. As g_iconv() is a thin wrapper
    around iconv(), leave the behaviour there implementation-defined (but
    document it as such).
    
    Signed-off-by: Philip Withnall <withnall endlessm com>
    
    https://bugzilla.gnome.org/show_bug.cgi?id=790698

 glib/gconvert.c |   22 ++++++++++++++++++++++
 glib/gconvert.h |    4 +++-
 2 files changed, 25 insertions(+), 1 deletions(-)
---
diff --git a/glib/gconvert.c b/glib/gconvert.c
index 72909e5..f6ac3ff 100644
--- a/glib/gconvert.c
+++ b/glib/gconvert.c
@@ -264,6 +264,13 @@ g_iconv_open (const gchar  *to_codeset,
  * GLib provides g_convert() and g_locale_to_utf8() which are likely
  * more convenient than the raw iconv wrappers.
  * 
+ * Note that the behaviour of iconv() for characters which are valid in the
+ * input character set, but which have no representation in the output character
+ * set, is implementation defined. This function may return success (with a
+ * positive number of non-reversible conversions as replacement characters were
+ * used), or it may return -1 and set an error such as %EILSEQ, in such a
+ * situation.
+ *
  * Returns: count of non-reversible conversions, or -1 on error
  **/
 gsize 
@@ -371,6 +378,14 @@ close_converter (GIConv cd)
  * character until it knows that the next character is not a mark that
  * could combine with the base character.)
  *
+ * Characters which are valid in the input character set, but which have no
+ * representation in the output character set will result in a
+ * %G_CONVERT_ERROR_ILLEGAL_SEQUENCE error. This is in contrast to the iconv()
+ * specification, which leaves this behaviour implementation defined. Note that
+ * this is the same error code as is returned for an invalid byte sequence in
+ * the input character set. To get defined behaviour for conversion of
+ * unrepresentable characters, use g_convert_with_fallback().
+ *
  * Returns: If the conversion was successful, a newly allocated
  *               nul-terminated string, which must be freed with
  *               g_free(). Otherwise %NULL and @error will be set.
@@ -449,6 +464,13 @@ g_convert_with_iconv (const gchar *str,
              break;
            }
        }
+      else if (err > 0)
+        {
+          /* @err gives the number of replacement characters used. */
+          g_set_error_literal (error, G_CONVERT_ERROR, G_CONVERT_ERROR_ILLEGAL_SEQUENCE,
+                               _("Unrepresentable character in conversion input"));
+          have_error = TRUE;
+        }
       else 
        {
          if (!reset)
diff --git a/glib/gconvert.h b/glib/gconvert.h
index ea93006..d0d3721 100644
--- a/glib/gconvert.h
+++ b/glib/gconvert.h
@@ -37,7 +37,9 @@ G_BEGIN_DECLS
  * GConvertError:
  * @G_CONVERT_ERROR_NO_CONVERSION: Conversion between the requested character
  *     sets is not supported.
- * @G_CONVERT_ERROR_ILLEGAL_SEQUENCE: Invalid byte sequence in conversion input.
+ * @G_CONVERT_ERROR_ILLEGAL_SEQUENCE: Invalid byte sequence in conversion input;
+ *    or the character sequence could not be represented in the target
+ *    character set.
  * @G_CONVERT_ERROR_FAILED: Conversion failed for some reason.
  * @G_CONVERT_ERROR_PARTIAL_INPUT: Partial character sequence at end of input.
  * @G_CONVERT_ERROR_BAD_URI: URI is invalid.
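
For callers that use raw iconv() rather than g_convert_with_iconv(), the
same policy can be sketched outside GLib. The helper below is a minimal
illustration, not GLib code: the name strict_convert() and its signature are
hypothetical, and it assumes a NUL-terminated input and a caller-supplied
output buffer. It treats a positive return value from iconv() (the count of
non-reversible conversions) as an EILSEQ failure, mirroring the new
"else if (err > 0)" branch in the patch.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of GLib): convert the NUL-terminated string
 * `in` from `from_codeset` to `to_codeset`, writing a NUL-terminated result
 * into `out`. Returns 0 on success and -1 on failure with errno set.
 * Unrepresentable characters are always reported as EILSEQ, even when the
 * platform's iconv() would silently substitute replacement characters. */
static int
strict_convert (const char *to_codeset, const char *from_codeset,
                const char *in, char *out, size_t out_size)
{
  iconv_t cd;
  char *inp = (char *) in;   /* iconv() takes char **, but only reads the input */
  char *outp = out;
  size_t in_left = strlen (in);
  size_t out_left;
  size_t res;
  int saved_errno;

  if (out_size == 0)
    {
      errno = E2BIG;
      return -1;
    }
  out_left = out_size - 1;   /* reserve room for the trailing NUL */

  cd = iconv_open (to_codeset, from_codeset);
  if (cd == (iconv_t) -1)
    return -1;

  res = iconv (cd, &inp, &in_left, &outp, &out_left);
  if (res == 0)
    {
      /* Flush any pending shift sequence for stateful encodings. */
      res = iconv (cd, NULL, NULL, &outp, &out_left);
    }
  saved_errno = errno;
  iconv_close (cd);

  if (res == (size_t) -1)
    {
      /* Ordinary iconv() failure: EILSEQ, EINVAL or E2BIG. */
      errno = saved_errno;
      return -1;
    }
  if (res > 0)
    {
      /* Conversion "succeeded", but only by using replacement characters. */
      errno = EILSEQ;
      return -1;
    }

  *outp = '\0';
  return 0;
}
```

With this helper, converting "caf\xc3\xa9" from UTF-8 to ASCII should fail
whether the platform's iconv() returns -1 with EILSEQ or substitutes a
replacement character and returns a positive count.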

