Re: int (utf-8) to Glib::ustring



First of all, I really want to thank you, because the gtkmm documentation is really poor and the library itself lacks a lot of necessary functions. This morning I was about to read all the documentation on Unicode, UTF-8 and UTF-16 so that I could work with the raw data and write the conversion functions myself; you saved me from that hard work.

I'm reading the Thunderbird address book and writing a Mork parser, because the one I found didn't recognize accented vowels (and now I know why...). In the Mork file, characters outside the basic Latin set are written as UTF-8 bytes in hex (e.g. ò = $C3$B2), so I was going mad trying to work out how to convert them without resorting to a really "un-chic" and clumsy lookup table.
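
For anyone hitting the same problem, here is a minimal sketch of decoding such escapes in plain C++. It assumes the escapes always look like the example above (a `$` followed by exactly two hex digits) and it ignores everything else a real Mork parser has to deal with. `decode_dollar_escapes` is a made-up name, not part of GLib or glibmm.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical helper: replaces every "$XX" escape (two hex digits) with
// the byte it encodes and copies all other characters unchanged.
std::string decode_dollar_escapes(const std::string& in)
{
  std::string out;
  for (std::string::size_type i = 0; i < in.size(); ++i)
  {
    const bool is_escape =
      in[i] == '$' && i + 2 < in.size() &&
      std::isxdigit(static_cast<unsigned char>(in[i + 1])) &&
      std::isxdigit(static_cast<unsigned char>(in[i + 2]));

    if (is_escape)
    {
      // Parse the two hex digits as one byte value.
      const char pair[3] = { in[i + 1], in[i + 2], '\0' };
      out.push_back(static_cast<char>(std::strtoul(pair, nullptr, 16)));
      i += 2; // skip the two digits just consumed
    }
    else
    {
      out.push_back(in[i]);
    }
  }
  return out;
}
```

The result is a std::string holding UTF-8 bytes, so it can be passed directly to the Glib::ustring constructor.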

Anyway, it works the way I wanted. Thanks again.

Regards,
Stefano

On 02/12/2011 17:08, Kjell Ahlstedt wrote:
2011-12-02 10:02, Spazzatura.Live wrote:
Hi everyone.

I have a hex representation (in a string) of a 2-byte UTF-8 character.

When I try to use g_unichar_to_utf8(), it converts a UTF-16 value to a gchar *, so:

1) How can I convert a UTF-8 value to a UTF-16 value?

OR

2) How can I convert a UTF-8 value into a std::string, Glib::ustring, gchar * or char *?

Do you mean that you have a 4-byte string, e.g. "C384", that represents a 2-byte UTF-8 character (in this case 0xc3,0x84 = U+00C4 = Umlaut-A = Ä)?

I don't know of any function that converts easily from the 4-byte string to the 2-byte string you need. Perhaps you have to use sscanf().

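#include <cstdio>

const char hexstring[] = "C384"; // Perhaps read from a file
char utf8char[3];
unsigned int utf8int = 0; // %x expects a pointer to unsigned int

// Parse up to four hex digits into one integer, e.g. 0xC384.
std::sscanf(hexstring, "%4x", &utf8int);
utf8char[0] = static_cast<char>(utf8int >> 8);   // high byte: 0xC3
utf8char[1] = static_cast<char>(utf8int & 0xFF); // low byte: 0x84
utf8char[2] = '\0';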

This seems fairly complicated for such a simple task. Perhaps someone knows of a function that does this conversion in one step?
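
I am not aware of a single library call for this, but a small helper can handle any even number of hex digits, not just four. The following is only a sketch of one possible approach in standard C++. `hex_to_bytes` is a hypothetical name, not a GLib or glibmm function.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: turns "C384" into the two bytes 0xC3 0x84.
// Throws std::invalid_argument on odd length or non-hex characters.
std::string hex_to_bytes(const std::string& hex)
{
  if (hex.size() % 2 != 0)
    throw std::invalid_argument("hex string must have an even length");

  // Value of one hex digit, or -1 if the character is not a hex digit.
  auto digit = [](char c) -> int {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
  };

  std::string bytes;
  bytes.reserve(hex.size() / 2);
  for (std::string::size_type i = 0; i < hex.size(); i += 2)
  {
    const int hi = digit(hex[i]);
    const int lo = digit(hex[i + 1]);
    if (hi < 0 || lo < 0)
      throw std::invalid_argument("non-hex character in input");
    bytes.push_back(static_cast<char>((hi << 4) | lo));
  }
  return bytes;
}
```

With this helper, Glib::ustring str(hex_to_bytes("C384")); would give the same one-character string as the sscanf() version. It does not check that the bytes are valid UTF-8.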

Now you have your UTF-8 character in a char*. gchar is a typedef for char, so you could just as well have declared
  gchar utf8char[3];
to get a gchar*. The conversions to Glib::ustring and std::string are easy, e.g.
  Glib::ustring str(utf8char);
  std::string str(utf8char);

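To convert from UTF-8 to UTF-16, use g_utf8_to_utf16(), e.g.
  gunichar2* utf16 = g_utf8_to_utf16(utf8char, -1, 0, 0, 0);
The returned buffer is newly allocated and should be released with g_free() when you are done with it.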
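
To show what such a conversion does in the two-byte case discussed here, the sketch below decodes the sequence by hand. It is an illustration only, not how GLib implements g_utf8_to_utf16(), and `decode_two_byte_utf8` is a hypothetical name. For a two-byte sequence the code point is always below U+0800, so the single UTF-16 code unit has the same value as the code point.

```cpp
#include <stdexcept>

// Hypothetical helper: decodes one 2-byte UTF-8 sequence (110xxxxx 10xxxxxx)
// into its Unicode code point. Anything else is rejected.
char32_t decode_two_byte_utf8(unsigned char lead, unsigned char cont)
{
  if ((lead & 0xE0) != 0xC0 || (cont & 0xC0) != 0x80)
    throw std::invalid_argument("not a 2-byte UTF-8 sequence");

  // 5 payload bits from the lead byte, 6 from the continuation byte.
  const char32_t cp = (static_cast<char32_t>(lead & 0x1F) << 6) | (cont & 0x3F);
  if (cp < 0x80)
    throw std::invalid_argument("overlong encoding");
  return cp;
}
```

For the bytes 0xC3, 0x84 this yields 0xC4, i.e. U+00C4.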

Or have I misunderstood you? In the subject line you mention "int (utf-8)", but in the first line of your message you talk about a string.
If you start with an int, e.g.
  const int utf8int = 0xc384;
then you skip the call to sscanf().

If utf8int is your starting point, you have a UTF-8 value stored in an unusual way. You should perhaps check if you can avoid getting there in the first place.




