g_locale_to_utf8() chokes on 'ö'



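To get round this problem, here is a little function which will
convert a string of bytes with the values 1 to 255, read as ISO-8859-1
(Latin-1) characters, into a UTF-8 string which will pass muster
(g_utf8_validate() accepts it). Each non-ASCII byte becomes two bytes,
so dest needs room for up to 2 * source_lgth + 1 bytes.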

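#include <string.h>
#include <glib.h>

/* Convert ISO-8859-1 bytes in source to UTF-8 in dest.
 * dest must hold at least 2 * source_lgth + 1 bytes.
 * Pass source_lgth == -1 if source is NUL-terminated. */
static void
utf8_cnvt(const gchar * source, int source_lgth, gchar * dest)
{
  const guchar *p = (const guchar *) source;
  gchar *q = dest;
  int i;

  if (source_lgth == -1)
    source_lgth = strlen(source);

  for (i = 0; i < source_lgth; p++, i++)
    if (*p < 0x80) {
      *q++ = *p;                          /* ASCII: copy unchanged */
    } else {
      *q++ = 0xc0 | ((*p >> 6) & 0x03);   /* lead byte: 1100 00yy */
      *q++ = 0x80 | (*p & 0x3f);          /* continuation: 10xx xxxx */
    }
  *q = '\0';
}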
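
In case it helps, here is a rough sketch of the same conversion that sizes
the output buffer itself. It is written in plain C (no GLib types) so it
compiles on its own; latin1_to_utf8_dup is a made-up name for illustration,
not a GLib function, and it assumes a NUL-terminated ISO-8859-1 input.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of GLib): return a newly malloc'd UTF-8
 * copy of a NUL-terminated ISO-8859-1 string, or NULL if malloc fails.
 * The caller frees the result. */
char *latin1_to_utf8_dup(const char *source)
{
  size_t len = strlen(source);
  /* Worst case: every input byte is non-ASCII and expands to two bytes. */
  char *dest = malloc(2 * len + 1);
  const unsigned char *p = (const unsigned char *) source;
  char *q = dest;

  if (dest == NULL)
    return NULL;

  for (; *p != '\0'; p++) {
    if (*p < 0x80) {
      *q++ = (char) *p;                    /* ASCII: copy unchanged */
    } else {
      *q++ = (char) (0xc0 | (*p >> 6));    /* lead byte */
      *q++ = (char) (0x80 | (*p & 0x3f));  /* continuation byte */
    }
  }
  *q = '\0';
  return dest;
}
```

Calling latin1_to_utf8_dup("\xf6") should give back the two bytes 0xc3 0xb6
followed by a NUL, which is the UTF-8 form of 'ö'.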

The conversion comes from the comments in Naoto Takahashi's utf-8.el
(in emacs-21.1 and 21.2). The relevant bit of the comment is:

;; UTF-8 is defined in RFC 2279.  A sketch of the encoding is:

;;        scalar       |               utf-8
;;        value        | 1st byte  | 2nd byte  | 3rd byte
;; --------------------+-----------+-----------+----------
;; 0000 0000 0xxx xxxx | 0xxx xxxx |           |
;; 0000 0yyy yyxx xxxx | 110y yyyy | 10xx xxxx |
;; zzzz yyyy yyxx xxxx | 1110 zzzz | 10yy yyyy | 10xx xxxx
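
To make the table concrete, here is a small sketch that applies all three
rows to a 16-bit scalar value. It is plain C with a hypothetical function
name (utf8_encode16), not anything from utf-8.el or GLib, and it does not
reject the surrogate range 0xD800 to 0xDFFF, which a careful encoder would.

```c
#include <stddef.h>

/* Hypothetical helper: encode a scalar value in 0..0xFFFF following the
 * three rows of the table above. Writes 1 to 3 bytes into out and returns
 * how many were written. Surrogates are NOT rejected in this sketch. */
size_t utf8_encode16(unsigned int value, unsigned char out[3])
{
  if (value < 0x80) {
    /* 0000 0000 0xxx xxxx -> 0xxx xxxx */
    out[0] = (unsigned char) value;
    return 1;
  }
  if (value < 0x800) {
    /* 0000 0yyy yyxx xxxx -> 110y yyyy, 10xx xxxx */
    out[0] = (unsigned char) (0xc0 | (value >> 6));
    out[1] = (unsigned char) (0x80 | (value & 0x3f));
    return 2;
  }
  /* zzzz yyyy yyxx xxxx -> 1110 zzzz, 10yy yyyy, 10xx xxxx */
  out[0] = (unsigned char) (0xe0 | ((value >> 12) & 0x0f));
  out[1] = (unsigned char) (0x80 | ((value >> 6) & 0x3f));
  out[2] = (unsigned char) (0x80 | (value & 0x3f));
  return 3;
}
```

For 'ö' (0xF6) the second row applies: the top two bits give the lead byte
0xc3 and the low six bits give the continuation byte 0xb6. Any Latin-1 byte
fits in the first two rows, which is why utf8_cnvt never needs a third byte.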

This mailing list has been very helpful; perhaps this will be a little
repayment.

There is no copyright on this code. Do with it what you will; offer it
for sale on eBay if you like ;)


