Re: int (utf-8) to Glib::ustring
- From: Spazzatura.Live <kharhonte hotmail com>
- To: Kjell Ahlstedt <kjell ahlstedt bredband net>
- Cc: gtkmm-list gnome org
- Subject: Re: int (utf-8) to Glib::ustring
- Date: Sat, 3 Dec 2011 15:29:44 +0100
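#include <cstdio>
#include <cstring>

// Converts up to 8 hex digits (one UTF-8 character of 1 to 4 bytes) into a
// NUL-terminated byte string. The caller must release it with delete[].
char *utf8hex_to_str(const char *utf8hex)
{
  const size_t utf8_char_size_t = strlen(utf8hex)/2 + 1; // bytes + terminator
  char *utf8_char = new char[utf8_char_size_t];
  unsigned int utf8int = 0; // %x expects a pointer to unsigned int
  sscanf(utf8hex, "%8x", &utf8int);
  utf8_char[utf8_char_size_t-1] = '\0';
  for(size_t i=0;i<utf8_char_size_t-1;i++)
  {
    // Emit the most significant byte first.
    utf8_char[i] = (char)(utf8int >> (8*(utf8_char_size_t-i-2)));
  }
  return utf8_char;
}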
This should work for a single UTF-8 character of 1, 2, 3 or 4 bytes
(that is, 2, 4, 6 or 8 hex digits).
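As an alternative to the function above, here is a minimal sketch that decodes the hex digits one pair at a time and returns a std::string, so there is nothing to delete[] and no 4-byte limit. The name hex_to_bytes is hypothetical (it is not part of glib or gtkmm), and the sketch assumes the input contains only hex digit pairs with no separators.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not a glib/gtkmm function): turns a string of hex
// digit pairs such as "C3B2" into the raw bytes those pairs describe.
std::string hex_to_bytes(const std::string& hex)
{
  if (hex.size() % 2 != 0)
    throw std::invalid_argument("hex string must have an even length");

  std::string bytes;
  bytes.reserve(hex.size() / 2);
  for (std::size_t i = 0; i < hex.size(); i += 2)
  {
    const unsigned char hi = static_cast<unsigned char>(hex[i]);
    const unsigned char lo = static_cast<unsigned char>(hex[i + 1]);
    if (!std::isxdigit(hi) || !std::isxdigit(lo))
      throw std::invalid_argument("non-hex character in input");

    // Each pair of hex digits is exactly one byte.
    bytes.push_back(static_cast<char>(std::stoi(hex.substr(i, 2), nullptr, 16)));
  }
  return bytes;
}
```

The result holds raw bytes only. Whether they form valid UTF-8 is not checked here, so it may be worth validating them before building a Glib::ustring from the result.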
On 02/12/2011 23:59, Spazzatura.Live wrote:
First of all, I want to thank you very much. The gtkmm documentation is
really poor and the library itself lacks a lot of necessary functions.
This morning I was about to read all the Unicode, UTF-8 and UTF-16
documentation so that I could work with the raw data and write the
conversion functions myself; you saved me from that hard work.
I'm reading the Thunderbird address book and writing a Mork parser
(because the one I found didn't recognize accented vowels, and now I
know why). In the Mork file, characters outside the basic Latin set are
written as UTF-8 bytes in hex (for example ò = $C3$B2), so I was going
mad trying to work out how to convert them without resorting to an
inelegant and clumsy lookup table.
Anyway, it now works the way I wanted. Thanks again.
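For the $C3$B2 form mentioned above, a minimal sketch of the decoding step could look like this. The name decode_dollar_hex is hypothetical, and the sketch assumes that every escape is a '$' followed by exactly two hex digits. Any other escaping rules the Mork format may have are not handled.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: replaces each "$XX" escape (two hex digits) with the
// byte it names and copies every other character through unchanged.
std::string decode_dollar_hex(const std::string& in)
{
  std::string out;
  out.reserve(in.size());
  for (std::size_t i = 0; i < in.size(); ++i)
  {
    const bool is_escape =
      in[i] == '$' && i + 2 < in.size() &&
      std::isxdigit(static_cast<unsigned char>(in[i + 1])) &&
      std::isxdigit(static_cast<unsigned char>(in[i + 2]));

    if (is_escape)
    {
      out.push_back(static_cast<char>(std::stoi(in.substr(i + 1, 2), nullptr, 16)));
      i += 2; // skip the two hex digits that were just consumed
    }
    else
    {
      out.push_back(in[i]); // ordinary character, or a '$' that is not an escape
    }
  }
  return out;
}
```

Decoding the whole field in one pass keeps multi-byte characters together, so the resulting bytes can be handed to Glib::ustring as a single UTF-8 string.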
Regards,
Stefano
On 02/12/2011 17:08, Kjell Ahlstedt wrote:
On 2011-12-02 10:02, Spazzatura.Live wrote:
Hi everyone.
I have a hex representation (in a string) of a 2-byte UTF-8 character.
When I try to use g_unichar_to_utf8(), it converts a UTF-16 value to
a gchar *, so:
1) How can I convert a UTF-8 value to a UTF-16 value?
OR
2) How can I convert a UTF-8 value into a std::string, Glib::ustring,
gchar * or char *?
Do you mean that you have a 4-byte string, e.g. "C384", that
represents a 2-byte UTF-8 character (in this case 0xc3,0x84 = U+00C4
= Umlaut-A = Ä)?
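The arithmetic behind 0xc3,0x84 = U+00C4 can be sketched as follows. The function name decode_two_byte_utf8 is hypothetical, and the sketch assumes the two bytes are already known to form a valid 2-byte sequence (110xxxxx 10xxxxxx).

```cpp
#include <cstdint>

// Hypothetical helper: combines the two bytes of a 2-byte UTF-8 sequence
// into the Unicode code point they encode. No validation is performed.
constexpr std::uint32_t decode_two_byte_utf8(unsigned char lead, unsigned char trail)
{
  // The lead byte carries the top 5 payload bits, the trail byte the low 6.
  return (static_cast<std::uint32_t>(lead & 0x1F) << 6) |
         static_cast<std::uint32_t>(trail & 0x3F);
}
```

For 0xC3,0x84 this gives (0x03 << 6) | 0x04 = 0xC4. In real code glib's g_utf8_get_char() does the general version of this for sequences of any length.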
I don't know of any function that converts easily from the 4-byte
string to the 2-byte string you need. Perhaps you have to use sscanf().
const char hexstring[] = "C384"; // Perhaps read from a file
char utf8char[3];
unsigned int utf8int = 0; // %x expects a pointer to unsigned int
sscanf(hexstring, "%4x", &utf8int);
utf8char[0] = static_cast<char>(utf8int >> 8);
utf8char[1] = static_cast<char>(utf8int);
utf8char[2] = '\0';
This seems fairly complicated for such a simple task. Perhaps someone
knows of a function that does this conversion in one step?
Now you have your UTF-8 character in a char*. gchar is a typedef for
char, so you could just as well have declared
gchar utf8char[3];
to get a gchar*. The conversions to Glib::ustring and std::string are
easy, e.g.
Glib::ustring str(utf8char);
std::string str(utf8char);
To convert from UTF-8 to UTF-16, use g_utf8_to_utf16(), e.g.
gunichar2* utf16 = g_utf8_to_utf16(utf8char, -1, 0, 0, 0); // release with g_free()
Or have I misunderstood you? In the subject line you mention "int
(utf-8)", but in the first line of your message you talk about a string.
If you start with an int, e.g.
const int utf8int = 0xc384;
then you skip the call to sscanf().
If utf8int is your starting point, you have a UTF-8 value stored in
an unusual way. You should perhaps check if you can avoid getting
there in the first place.