[Vala] [Fwd: Re: Fwd: uchar and uint8. Same? Different?]
- From: pancake <pancake youterm com>
- To: vala-list gnome org
- Subject: [Vala] [Fwd: Re: Fwd: uchar and uint8. Same? Different?]
- Date: Thu, 18 Feb 2010 10:32:37 +0100
--- Begin Message ---
- From: bb <bblochl arcor de>
- To: pancake <pancake youterm com>
- Subject: Re: [Vala] Fwd: uchar and uint8. Same? Different?
- Date: Thu, 18 Feb 2010 09:23:14 +0100
pancake wrote:
Begin forwarded message:
From: pancake <pancake youterm com>
Date: February 18, 2010 1:38:12 AM GMT+01:00
To: Sandino Flores Moreno <tigrux gmail com>
Subject: Re: [Vala] uchar and uint8. Same? Different?
This vapi looks wrong. Doesn't [] imply a _length variable when there
are no CCode attributes?
Those two types are the same in practice but not in theory. A uint8
is 8 bits wide for sure, but uchar does not specify a size.
Its size depends on the C compiler and platform. I don't know of any
platform where uchar is not 8 bits, but that doesn't mean none exists.
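For anyone who wants to check this on their own platform, here is a minimal C sketch. It assumes that Vala's uchar ends up as C's unsigned char (GLib's guchar) and that uint8 ends up as an 8-bit type equivalent to uint8_t; the helper names uchar_is_8_bits and uchar_matches_uint8 are made up for illustration.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 when unsigned char is exactly 8 bits wide
 * on this platform, 0 otherwise. CHAR_BIT is the number of bits in a char;
 * the C standard only guarantees that it is at least 8. */
static int uchar_is_8_bits(void)
{
    return CHAR_BIT == 8;
}

/* Hypothetical helper: returns 1 when unsigned char and uint8_t have the
 * same size and the same maximum value, 0 otherwise. Note that uint8_t is
 * optional in C99 and only exists where an exact 8-bit type is available. */
static int uchar_matches_uint8(void)
{
    return sizeof(unsigned char) == sizeof(uint8_t) && UCHAR_MAX == UINT8_MAX;
}
```

On the usual desktop and server platforms both helpers should return 1, which matches the "same in practice" observation above.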
On Feb 17, 2010, at 10:03 PM, Sandino Flores Moreno
<tigrux gmail com> wrote:
Hello.
I noticed that gstreamer has this declaration:
GstBuffer {
uint8 *data;
}
However, the corresponding .vapi has
class Gst.buffer {
uchar[] data;
}
I know uchar and uint8 are the same in C.
And, apparently, in Vala too.
So, the question is:
Can we assume uint8 is the same as uchar in Vala?
_______________________________________________
Vala-list mailing list
Vala-list gnome org
http://mail.gnome.org/mailman/listinfo/vala-list
I do not know whether the following helps, but here is some explanation:
UTF-8 (8-bit UCS/Unicode Transformation Format) is a /variable-length/
character encoding!
First, one has to remember that ASCII was originally only a 7-bit
character set, and UTF-8 is backwards compatible with 7-bit ASCII only.
UCS characters U+0000 to U+007F (ASCII) are encoded simply as bytes 0x00
to 0x7F (ASCII compatibility). This means that files and strings which
contain only 7-bit ASCII characters have the same encoding under both
ASCII and UTF-8.
U-00000000 -- U-0000007F : 0/xxxxxxx/
With the rise of the IBM PC, however, ASCII was extended to 8 bits. These
extended character sets (ISO 8859-1, for example) use the values 0x00 to
0xFF (0 to 255), and that is where the problem begins.
A value between 128 and 255 has to be split into two bytes in UTF-8.
UTF-8 encoded characters may theoretically be up to six bytes long
(RFC 3629 has since limited UTF-8 to four bytes); however, 16-bit BMP
characters are at most three bytes long, and the bytes 0xFE and 0xFF are
never used in the UTF-8 encoding. The encoding beyond ASCII 0x7F is
different:
/The Copyright Character in /
/ISO 8859-1 (extended ASCII) = 0xA9 = 1010 1001/
/compares to /
/UTF-8 = 0xC2 0xA9 = 11000010 10101001/
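To make the bit patterns concrete, here is a rough sketch of a general UTF-8 encoder in C. It follows the four-byte limit of RFC 3629 rather than the older six-byte scheme, and the function name utf8_encode is only an illustration, not something from GLib or Vala.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the UTF-8 encoding of code point cp into out
 * (which must have room for 4 bytes) and returns the number of bytes
 * written, or 0 if cp is a surrogate or lies beyond U+10FFFF. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                        /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                       /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                     /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                       /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Calling utf8_encode(0xA9, buf) should return 2 and leave 0xC2 0xA9 in buf, which is the copyright character shown above.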
I do not plan to write a complete tutorial here, but I was involved with
the problem in the past. Maybe some C code from my past (to be translated
to Vala) might help:
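#include <stdlib.h>

/* Convert one extended-ASCII (ISO 8859-1) byte to a NUL-terminated UTF-8
 * string. The caller must free() the result. Returns NULL if the
 * allocation fails. */
char *ascii_to_utf8(unsigned char c)
{
    unsigned char *out;

    if (c < 128)
    {
        /* 7-bit ASCII: a single byte, unchanged */
        out = calloc(2, sizeof(char));
        if (out == NULL)
            return NULL;
        out[0] = c;
        out[1] = '\0';
    }
    else
    {
        /* 128..255: lead byte 110xxxxx, then continuation byte 10xxxxxx */
        out = calloc(3, sizeof(char));
        if (out == NULL)
            return NULL;
        out[0] = (c >> 6) | 0xC0;
        out[1] = (c & 0x3F) | 0x80;
        out[2] = '\0';
    }
    return (char *)out;
}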
Might be a good idea to add it to the vala examples?
One more example, from http://www.cl.cam.ac.uk/~mgk25/unicode.html :
#include <stdio.h>
#include <locale.h>

int main()
{
    if (!setlocale(LC_CTYPE, "")) {
        fprintf(stderr, "Can't set the specified locale! "
                "Check LANG, LC_CTYPE, LC_ALL.\n");
        return 1;
    }
    printf("%ls\n", L"Schöne Grüße");
    return 0;
}
The string contains some special German characters, so one can see the
encoding problem. The comment from the given source:
"Call this program with the locale setting LANG=de_DE and the output
will be in ISO 8859-1. Call it with LANG=de_DE.UTF-8 and the output will
be in UTF-8. The %ls format specifier in printf calls wcsrtombs in order
to convert the wide character argument string into the locale-dependent
multi-byte encoding."
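As a rough illustration of that conversion step, the sketch below calls wcstombs directly. This is the simpler, non-restartable relative of wcsrtombs, used here only to keep the example short; the helper name wide_to_locale_bytes is made up, and the result depends on whatever LC_CTYPE locale is active when it is called.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: converts the wide string ws into the multibyte
 * encoding of the current LC_CTYPE locale, writing at most n bytes
 * (including the terminating NUL) into buf. Returns the number of bytes
 * written, excluding the NUL, or (size_t)-1 on failure. Output that does
 * not fit into buf is truncated. */
static size_t wide_to_locale_bytes(const wchar_t *ws, char *buf, size_t n)
{
    size_t len;

    if (n == 0)
        return (size_t)-1;

    /* Leave one byte free so the result can always be NUL-terminated. */
    len = wcstombs(buf, ws, n - 1);
    if (len == (size_t)-1)
        return len;     /* a character cannot be represented in this locale */

    buf[len] = '\0';
    return len;
}
```

After a successful setlocale(LC_CTYPE, "") with a UTF-8 locale, passing L"Schöne Grüße" should produce UTF-8 bytes; in the default "C" locale only plain ASCII input is guaranteed to convert.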
Some more settings for experiments:
en_US English - United States
ru_RU Russian for Russia
zh_TW Traditional Chinese for Taiwan
etc.
--- End Message ---