Re: int (utf-8) to Glib::ustring



2011-12-02 10:02, Spazzatura.Live wrote:
Hi everyone.

I have a hex representation (in a string) of a 2-byte UTF-8 char.

When I try to use g_unichar_to_utf8(), it converts a UTF-16 value to a gchar *, so:

1) How can I convert a UTF-8 value to a UTF-16 value?

OR

2) How can I convert a UTF-8 value into std::string, Glib::ustring, gchar * or char *?

Do you mean that you have a 4-byte string, e.g. "C384", that represents a 2-byte UTF-8 character (in this case 0xc3,0x84 = U+00C4 = Umlaut-A = Ä)?
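
The equivalence 0xc3,0x84 = U+00C4 follows from the two-byte UTF-8 layout (110xxxxx 10xxxxxx): the low 5 bits of the first byte and the low 6 bits of the second byte are joined. A minimal sketch of that arithmetic follows. decode_utf8_2byte is a made-up name for illustration, not a GLib function.

```cpp
#include <cstdint>

// Hypothetical helper (not part of GLib): decode a two-byte UTF-8
// sequence into its Unicode code point. Returns 0xFFFFFFFF if the bytes
// do not form a valid two-byte sequence.
std::uint32_t decode_utf8_2byte(unsigned char lead, unsigned char cont)
{
  const std::uint32_t invalid = 0xFFFFFFFFu;

  // The lead byte must look like 110xxxxx, the continuation like 10xxxxxx.
  if ((lead & 0xE0) != 0xC0 || (cont & 0xC0) != 0x80)
    return invalid;

  // Join the 5 payload bits of the lead byte with the 6 of the continuation.
  const std::uint32_t cp = ((lead & 0x1Fu) << 6) | (cont & 0x3Fu);

  // Values below 0x80 fit in one byte, so a two-byte form is overlong.
  return cp >= 0x80 ? cp : invalid;
}
```

For 0xc3,0x84 this gives (0x03 << 6) | 0x04 = 0xC4, i.e. U+00C4.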

I don't know of any function that converts easily from the 4-byte string to the 2-byte string you need. Perhaps you have to use sscanf().

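#include <cstdio> // std::sscanf

const char hexstring[] = "C384"; // Perhaps read from a file
char utf8char[3];
unsigned int utf8int = 0; // %x expects a pointer to unsigned int

// sscanf returns 1 if the hex digits were parsed; check that in real code.
std::sscanf(hexstring, "%4x", &utf8int);
utf8char[0] = static_cast<char>(utf8int >> 8);   // high byte, 0xc3
utf8char[1] = static_cast<char>(utf8int & 0xff); // low byte, 0x84
utf8char[2] = '\0';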

This seems fairly complicated for such a simple task. Perhaps someone knows of a function that does this conversion in one step?
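
If the hex strings can be longer than four digits (3- or 4-byte UTF-8 characters, or several characters in a row), a small helper that handles two hex digits at a time is one option. This is only a sketch: hex_to_bytes is a hypothetical name, and it assumes C++11 for std::stoi. It does not check that the resulting bytes are valid UTF-8.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: turn a hex string such as "C384" into the raw
// bytes 0xc3 0x84. Returns an empty string if the input has odd length
// or contains a character that is not a hex digit.
std::string hex_to_bytes(const std::string& hex)
{
  if (hex.size() % 2 != 0)
    return std::string();

  std::string bytes;
  bytes.reserve(hex.size() / 2);
  for (std::string::size_type i = 0; i < hex.size(); i += 2)
  {
    const unsigned char hi = static_cast<unsigned char>(hex[i]);
    const unsigned char lo = static_cast<unsigned char>(hex[i + 1]);
    if (!std::isxdigit(hi) || !std::isxdigit(lo))
      return std::string();

    // Both characters are hex digits, so std::stoi parses the whole pair.
    const int value = std::stoi(hex.substr(i, 2), nullptr, 16);
    bytes.push_back(static_cast<char>(value));
  }
  return bytes;
}
```

The result of hex_to_bytes("C384") can be passed on as a std::string, or as a char* via c_str().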

Now you have your UTF-8 character in a char*. gchar is a typedef for char, so you could just as well have declared
  gchar utf8char[3];
to get a gchar*. The conversions to Glib::ustring and std::string are easy, since both can be constructed from a NUL-terminated char*, e.g.
  Glib::ustring ustr(utf8char);
  std::string str(utf8char);

To convert from UTF-8 to UTF-16, use g_utf8_to_utf16(), e.g.
  gunichar2* utf16 = g_utf8_to_utf16(utf8char, -1, 0, 0, 0);
The returned buffer is newly allocated; release it with g_free() when you are done with it.

Or have I misunderstood you? In the subject line you mention "int (utf-8)", but in the first line of your message you talk about a string.
If you start with an int, e.g.
  const int utf8int = 0xc384;
then you skip the call to sscanf().
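
If the int can also hold 1-, 3- or 4-byte characters packed the same way (most significant byte first), the byte splitting can be written as a loop. A sketch under that assumption; packed_utf8_to_string is a hypothetical name, and the function does not verify that the bytes are valid UTF-8.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: unpack a value such as 0xc384 into its bytes,
// most significant byte first, skipping leading zero bytes.
// A value of 0 yields an empty string.
std::string packed_utf8_to_string(std::uint32_t packed)
{
  std::string bytes;
  bool started = false;
  for (int shift = 24; shift >= 0; shift -= 8)
  {
    const unsigned char byte =
        static_cast<unsigned char>((packed >> shift) & 0xffu);
    if (byte != 0)
      started = true; // first non-zero byte: start emitting from here
    if (started)
      bytes.push_back(static_cast<char>(byte));
  }
  return bytes;
}
```

packed_utf8_to_string(0xc384) yields the two bytes 0xc3 0x84, which can be handed to Glib::ustring or std::string as described above.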

If utf8int is your starting point, you have a UTF-8 value stored in an unusual way. You should perhaps check if you can avoid getting there in the first place.


