Re: [Vala] [Fwd: Re: Fwd: uchar and uint8. Same? Different?]
- From: pancake <pancake youterm com>
- To: vala-list gnome org
- Subject: Re: [Vala] [Fwd: Re: Fwd: uchar and uint8. Same? Different?]
- Date: Thu, 18 Feb 2010 10:42:08 +0100
For Unicode characters you have 'unichar'. uchar doesn't apply to
multibyte characters.
I have been informed that the architectures with non-8-bit chars are:
* Digital Equipment Corporation PDP-6/10
* IBM 701/704/709/7090/7094
* UNIVAC 1103/1103A/1105/1100/2200
A uchar on those boxes is 6 or 9 bits wide.
Weird but true :)
So, child! Take care when compiling Vala programs on PDP and VAX!
pancake wrote:
------------------------------------------------------------------------
Subject:
Re: [Vala] Fwd: uchar and uint8. Same? Different?
From:
bb <bblochl arcor de>
Date:
Thu, 18 Feb 2010 09:23:14 +0100
To:
pancake <pancake youterm com>
pancake wrote:
Begin forwarded message:
From: pancake <pancake youterm com>
Date: February 18, 2010 1:38:12 AM GMT+01:00
To: Sandino Flores Moreno <tigrux gmail com>
Subject: Re: [Vala] uchar and uint8. Same? Different?
This vapi looks wrong. Doesn't [] imply a _length variable when there
are no CCode attributes?
Those two types are the same in practice but not in theory. A uint8
is guaranteed to be 8 bits wide, but uchar does not specify its size.
That depends on the C compiler and platform. I don't know of any
where uchar is != 8 bits, but that doesn't mean they don't exist.
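The difference can be checked on a given platform with a small C sketch. It assumes that Vala's uchar ends up as C's unsigned char and that uint8 ends up as an exact-width 8-bit type such as uint8_t; the function names are made up for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Width of unsigned char in bits on this platform. */
unsigned int uchar_bits(void)
{
    assert(CHAR_BIT >= 8); /* the C standard guarantees at least 8 */
    return CHAR_BIT;
}

/* Returns 1 if an exact 8-bit type exists and has the same size as
 * unsigned char, 0 otherwise. uint8_t is optional in C99; UINT8_MAX
 * is defined only when the type exists. */
int uchar_is_uint8(void)
{
#ifdef UINT8_MAX
    return CHAR_BIT == 8 && sizeof(uint8_t) == sizeof(unsigned char);
#else
    return 0;
#endif
}

/* Print a short report, for example from main(). */
void print_char_report(void)
{
    printf("unsigned char is %u bits wide\n", uchar_bits());
    printf("uchar and uint8 interchangeable here: %s\n",
           uchar_is_uint8() ? "yes" : "no");
}
```

On a machine with 9-bit chars, uint8_t would not exist at all, so uchar_is_uint8() would return 0 there.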
On Feb 17, 2010, at 10:03 PM, Sandino Flores Moreno
<tigrux gmail com> wrote:
Hello.
I noticed that gstreamer has this declaration:
GstBuffer {
uint8 *data;
}
However, the corresponding .vapi has
class Gst.buffer {
uchar[] data;
}
I know uchar and uint8 are the same in C.
And, apparently for vala too.
So, the question is:
Can we assume uint8 is the same as uchar in Vala?
_______________________________________________
Vala-list mailing list
Vala-list gnome org
http://mail.gnome.org/mailman/listinfo/vala-list
------------------------------------------------------------------------
I do not know if the following might help. Anyway here is some
explanation:
UTF-8 (8-bit UCS/Unicode Transformation Format) is a /variable-length/
character encoding!
First one has to remember that ASCII originally was only a 7-bit
character set. UTF-8 is backwards compatible with 7-bit ASCII only!
But ASCII was extended to 8 bits with the rise of the IBM PC.
Therefore UCS characters U+0000 to U+007F (ASCII) are encoded simply
as bytes 0x00 to 0x7F (ASCII compatibility). This means that files and
strings which contain only 7-bit ASCII characters have the same
encoding under both ASCII and UTF-8.
U-00000000 -- U-0000007F : 0xxxxxxx
But extended 8-bit ASCII is represented with the numbers (in
hexadecimal) 0x00 to 0xFF (0 to 255), and that is where the problem
begins. If the value is greater than 127, UTF-8 needs two bytes for it.
UTF-8 encoded characters may theoretically be up to six bytes long
(RFC 3629 now limits them to four). 16-bit BMP characters, however,
are at most three bytes long. The bytes 0xFE and 0xFF are never used
in the UTF-8 encoding. The encoding beyond ASCII 7F is different:
The copyright character in
extended ASCII (ISO 8859-1) = 0xA9 = 1010 1001
compares to
UTF-8 = 0xC2 0xA9 = 11000010 10101001
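The bit layout above generalizes to longer sequences. Here is a minimal sketch of an encoder for a single code point, limited to the four-byte range of RFC 3629; the name utf8_encode and its interface are assumptions for illustration, not an existing library function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 into out (room for 4 bytes).
 * Returns the number of bytes written, or 0 for an invalid code point
 * (a surrogate or a value above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 11110xxx and three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For the copyright sign, utf8_encode(0xA9, buf) writes the two bytes 0xC2 0xA9, matching the bit pattern shown above.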
I do not plan to write a complete tutorial here, but I was involved
with the problem in the past. Maybe some C code from my past
(port it to Vala) might help:
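#include <stdlib.h>

/* Convert one 8-bit extended ASCII (ISO 8859-1) character to a
 * NUL-terminated UTF-8 string. The caller must free() the result.
 * Returns NULL if the allocation fails. */
char *ascii_to_utf8(unsigned char c)
{
    unsigned char *out;

    if (c < 128)
    {
        /* 7-bit ASCII: one identical byte. */
        out = calloc(2, sizeof(char));
        if (out == NULL)
            return NULL;
        out[0] = c;
        out[1] = '\0';
    }
    else
    {
        /* 0x80..0xFF: two bytes, 110000xx 10xxxxxx. */
        out = calloc(3, sizeof(char));
        if (out == NULL)
            return NULL;
        out[0] = (c >> 6) | 0xC0;   /* lead byte: top two bits */
        out[1] = (c & 0x3F) | 0x80; /* continuation byte: low six bits */
        out[2] = '\0';
    }
    return (char *)out;
}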
Might be a good idea to add it to the vala examples?
One more example from http://www.cl.cam.ac.uk/~mgk25/unicode.html :
#include <stdio.h>
#include <locale.h>

int main()
{
    if (!setlocale(LC_CTYPE, "")) {
        fprintf(stderr, "Can't set the specified locale! "
                "Check LANG, LC_CTYPE, LC_ALL.\n");
        return 1;
    }
    printf("%ls\n", L"Schöne Grüße");
    return 0;
}
The string contains some special German characters, so one can see the
encoding problem. The comment from the given source:
"Call this program with the locale setting LANG=de_DE and the output
will be in ISO 8859-1. Call it with LANG=de_DE.UTF-8 and the output
will be in UTF-8. The %ls format specifier in printf calls wcsrtombs
in order to convert the wide character argument string into the
locale-dependent multi-byte encoding."
Some more settings for experiments:
en_US English - United States
ru_RU Russian for Russia
zh_TW Traditional Chinese for Taiwan
etc.
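To watch the locale dependence without printing anything, one can ask wcsrtombs how many bytes a wide string would need in the current locale. This is only a sketch; the helper name multibyte_length is made up for illustration.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Number of bytes needed to store ws in the multibyte encoding of the
 * current LC_CTYPE locale, not counting the terminating NUL.
 * Returns (size_t)-1 if some character cannot be represented. */
size_t multibyte_length(const wchar_t *ws)
{
    mbstate_t state;

    memset(&state, 0, sizeof state);        /* initial conversion state */
    /* With a NULL destination nothing is written; only the length
     * of the would-be result is computed. */
    return wcsrtombs(NULL, &ws, 0, &state);
}
```

Called after setlocale(LC_CTYPE, "") with a string such as L"Gr\u00fc\u00dfe", the result should differ between an ISO 8859-1 locale and a UTF-8 locale, because the two non-ASCII letters take one byte each in the former and two bytes each in the latter.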