[Vala] [Fwd: Re: Fwd: uchar and uint8. Same? Different?]




--- Begin Message ---
pancake wrote:



Begin forwarded message:

From: pancake <pancake youterm com>
Date: February 18, 2010 1:38:12 AM GMT+01:00
To: Sandino Flores Moreno <tigrux gmail com>
Subject: Re: [Vala] uchar and uint8. Same? Different?


This vapi looks wrong. Doesn't [] imply a _length variable when no CCode attributes say otherwise?

Those two types are the same in practice but not in theory. A uint8 is exactly 8 bits wide, but uchar does not specify its size.

This depends on the C compiler and platform. I don't know of any platform where uchar is not 8 bits, but that doesn't mean none exist.
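A minimal C sketch of the "same in practice, not in theory" point, using only standard C (the helper names uchar_bits and has_exact_uint8 are hypothetical). sizeof(unsigned char) is 1 by definition, but a byte is CHAR_BIT bits, which C only guarantees to be at least 8; uint8_t is optional and exists only where an exact 8-bit type does. As far as I know, GLib typedefs both guchar and guint8 (what Vala's uchar and uint8 map to) as unsigned char, but treat that as an assumption and check your glibconfig.h.

```c
#include <limits.h> /* CHAR_BIT */
#include <stdint.h> /* uint8_t and UINT8_MAX, if the platform has them */

/* Width of unsigned char in bits. sizeof(unsigned char) is always 1,
 * but a "byte" is CHAR_BIT bits, which C only requires to be >= 8. */
int uchar_bits(void)
{
    return (int)(sizeof(unsigned char) * CHAR_BIT);
}

/* uint8_t is optional: <stdint.h> defines it (and UINT8_MAX) only when
 * an exact 8-bit type exists. Since no object type is narrower than
 * char, CHAR_BIT must then be 8, i.e. the same width as unsigned char. */
int has_exact_uint8(void)
{
#ifdef UINT8_MAX
    return 1;
#else
    return 0;
#endif
}
```

On a platform where has_exact_uint8() returns 0, code that assumes uchar and uint8 are interchangeable would need to be revisited.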

On Feb 17, 2010, at 10:03 PM, Sandino Flores Moreno <tigrux gmail com> wrote:

Hello.

I noticed that gstreamer has this declaration:

GstBuffer {
uint8 *data;
}

However, the corresponding .vapi has
class Gst.Buffer {
uchar[] data;
}

I know uchar and uint8 are the same in C,
and apparently in Vala too.

So, the question is:

Can we assume uint8 is the same as uchar in Vala?
_______________________________________________
Vala-list mailing list
Vala-list gnome org
http://mail.gnome.org/mailman/listinfo/vala-list


------------------------------------------------------------------------


I don't know whether the following helps, but here is some explanation:

UTF-8 (8-bit UCS/Unicode Transformation Format) is a /variable-length/ character encoding!

First, remember that ASCII was originally only a 7-bit character set. UTF-8 is backwards compatible with 7-bit ASCII only!

ASCII was later extended to 8 bits with the rise of the IBM PC, but UTF-8 is not compatible with those 8-bit extensions. Only UCS characters U+0000 to U+007F (ASCII) are encoded simply as bytes 0x00 to 0x7F (ASCII compatibility). This means that files and strings which contain only 7-bit ASCII characters have the same encoding under both ASCII and UTF-8.

U-00000000 -- U-0000007F:  0xxxxxxx
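A small sketch of what that compatibility means in practice, using a hypothetical helper is_7bit_ascii: a buffer whose bytes all have the high bit clear matches the 0xxxxxxx pattern above, so it reads the same whether it is treated as ASCII or as UTF-8.

```c
#include <stddef.h> /* size_t */

/* Returns 1 if every byte in s[0..n) is 7-bit ASCII (0x00..0x7F).
 * Such a buffer is byte-for-byte identical in ASCII and UTF-8,
 * because UTF-8 encodes U+0000..U+007F as the single byte 0xxxxxxx. */
int is_7bit_ascii(const unsigned char *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (s[i] & 0x80) /* high bit set: not of the form 0xxxxxxx */
            return 0;
    }
    return 1;
}
```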

But extended ASCII uses the values 0x00 to 0xFF (0 to 255), and that is where the problem begins. If a character's value is greater than 127 (0x7F), UTF-8 has to split it into two bytes. UTF-8 encoded characters may theoretically be up to six bytes long (in the original design; RFC 3629 later limited UTF-8 to four bytes), but 16-bit BMP characters are at most three bytes long, and the bytes 0xFE and 0xFF are never used in UTF-8. So the encoding beyond ASCII 0x7F is different:
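To illustrate the "at most three bytes for the BMP" rule, here is a sketch of an encoder for a single BMP code point. The function name utf8_encode_bmp is hypothetical, it covers only U+0000..U+FFFF, and rejecting UTF-16 surrogates is my own choice rather than something stated above.

```c
#include <stddef.h> /* size_t */
#include <stdint.h> /* uint16_t */

/* Encode one BMP code point (U+0000..U+FFFF) as UTF-8 into out[0..3).
 * Returns the number of bytes written (1, 2 or 3), or 0 for a UTF-16
 * surrogate (U+D800..U+DFFF), which is not a character on its own.
 *   U+0000..U+007F  0xxxxxxx
 *   U+0080..U+07FF  110xxxxx 10xxxxxx
 *   U+0800..U+FFFF  1110xxxx 10xxxxxx 10xxxxxx
 * Lead bytes never exceed 0xEF here, so 0xFE and 0xFF cannot appear. */
size_t utf8_encode_bmp(uint16_t cp, unsigned char out[3])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0; /* surrogate halves are not encodable characters */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```

For example, U+00A9 (©) becomes 0xC2 0xA9 and U+20AC (€) becomes 0xE2 0x82 0xAC.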

The copyright character (©) in

  Latin-1 ("extended ASCII") = 0xA9 = 1010 1001

compares to

  UTF-8 = 0xC2 0xA9 = 11000010 10101001
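The bit arithmetic behind this example, worked out with the two-byte pattern 110xxxxx 10xxxxxx (my own breakdown): the 11 payload bits of 0xA9 are split into a high 5-bit and a low 6-bit part.

```latex
\begin{aligned}
\mathtt{0xA9} &= \underbrace{00010}_{\text{high 5 bits}}\;\underbrace{101001}_{\text{low 6 bits}} \\
\text{lead byte} &= \mathtt{110}\,\mathtt{00010} = 11000010_2 = \mathtt{0xC2} \\
\text{continuation byte} &= \mathtt{10}\,\mathtt{101001} = 10101001_2 = \mathtt{0xA9}
\end{aligned}
```

In C terms this is (0xA9 >> 6) | 0xC0 = 0xC2 and (0xA9 & 0x3F) | 0x80 = 0xA9.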

I do not plan to write a complete tutorial here, but I ran into this problem in the past. Maybe some C code from back then (to be ported to Vala) will help:

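#include <stdlib.h>

/* Convert one 8-bit character, taken as an ISO 8859-1 (Latin-1) code
 * point, to a newly allocated, NUL-terminated UTF-8 string.
 * The caller must free() the result; returns NULL if calloc fails. */
char *ascii_to_utf8(unsigned char c)
{
    unsigned char *out;

    if (c < 128) {
        /* 7-bit ASCII: UTF-8 uses the same single byte. */
        out = calloc(2, sizeof *out);
        if (out == NULL)
            return NULL;
        out[0] = c;
        out[1] = '\0';
    } else {
        /* 0x80..0xFF: lead byte 110000xx first, then 10xxxxxx. */
        out = calloc(3, sizeof *out);
        if (out == NULL)
            return NULL;
        out[0] = (unsigned char)((c >> 6) | 0xC0);
        out[1] = (unsigned char)((c & 0x3F) | 0x80);
        out[2] = '\0';
    }
    return (char *)out;
}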

Might it be a good idea to add this to the Vala examples?
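If something like this goes into the examples, converting a whole string at once may be more useful. A sketch assuming each input byte is an ISO 8859-1 (Latin-1) code point, which is what the c < 128 / two-byte split in ascii_to_utf8 implies; latin1_to_utf8 is a hypothetical name, and the output buffer is sized for the worst case of two bytes per input byte.

```c
#include <stdlib.h> /* malloc, free */
#include <string.h> /* strlen */

/* Convert a NUL-terminated ISO 8859-1 (Latin-1) string to a newly
 * allocated UTF-8 string; the caller must free() it. Bytes >= 0x80
 * become two bytes (110000xx 10xxxxxx), all others are copied as-is.
 * Returns NULL if allocation fails. */
char *latin1_to_utf8(const char *in)
{
    size_t n = strlen(in);
    /* Worst case: every byte needs two output bytes, plus the NUL. */
    unsigned char *out = malloc(2 * n + 1);
    if (out == NULL)
        return NULL;

    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char)in[i];
        if (c < 0x80) {
            out[j++] = c;
        } else {
            out[j++] = (unsigned char)(0xC0 | (c >> 6));
            out[j++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    out[j] = '\0';
    return (char *)out;
}
```

For example, latin1_to_utf8("\xA9 2010") should return "\xC2\xA9 2010", which the caller then frees.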

One more example, from http://www.cl.cam.ac.uk/~mgk25/unicode.html:

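#include <stdio.h>
#include <locale.h>

int main(void)
{
    /* Adopt the user's locale from LANG / LC_CTYPE / LC_ALL. */
    if (!setlocale(LC_CTYPE, "")) {
        fprintf(stderr, "Can't set the specified locale! "
                "Check LANG, LC_CTYPE, LC_ALL.\n");
        return 1;
    }
    /* %ls converts the wide string to the locale's multibyte encoding. */
    printf("%ls\n", L"Schöne Grüße");
    return 0;
}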

The string contains some special German characters (ö, ü, ß), so the encoding problem becomes visible. The comment from the cited source:

"Call this program with the locale setting LANG=de_DE and the output will be in ISO 8859-1. Call it with LANG=de_DE.UTF-8 and the output will be in UTF-8. The %ls format specifier in printf calls wcsrtombs in order to convert the wide character argument string into the locale-dependent multi-byte encoding."
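The number of bytes that conversion produces can be checked directly. A sketch with a hypothetical helper multibyte_length, which passes a NULL destination so that wcsrtombs only counts bytes instead of writing them:

```c
#include <string.h> /* memset */
#include <wchar.h>  /* wcsrtombs, mbstate_t */

/* Number of bytes the wide string ws needs in the current LC_CTYPE
 * locale's multibyte encoding (excluding the terminating NUL), or
 * (size_t)-1 if some character cannot be represented in that locale.
 * A NULL destination makes wcsrtombs count without writing. */
size_t multibyte_length(const wchar_t *ws)
{
    mbstate_t state;
    memset(&state, 0, sizeof state); /* start in the initial shift state */
    const wchar_t *src = ws;
    return wcsrtombs(NULL, &src, 0, &state);
}
```

Called after setlocale(LC_CTYPE, ""), multibyte_length(L"Schöne Grüße") should report 12 under an ISO 8859-1 de_DE locale and 15 under de_DE.UTF-8 (ö, ü and ß take two bytes each), assuming both locales are installed.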

Some more locale settings for experiments:

  en_US  English (United States)
  ru_RU  Russian (Russia)
  zh_TW  Traditional Chinese (Taiwan)
  etc.
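Not every name is installed on every system, and the names are often combined with an encoding suffix such as .UTF-8, as in the quoted comment. A sketch for checking a name before experimenting (locale_available is a hypothetical helper; on most Unix systems, locale -a lists what is installed):

```c
#include <locale.h> /* setlocale, LC_CTYPE */
#include <stdlib.h> /* malloc, free */
#include <string.h> /* strlen, memcpy */

/* Returns 1 if the C library accepts name as an LC_CTYPE locale,
 * 0 otherwise, and restores the previous LC_CTYPE setting afterwards.
 * setlocale() returns a pointer to storage that later calls may
 * overwrite, so the old name is copied before switching. */
int locale_available(const char *name)
{
    const char *cur = setlocale(LC_CTYPE, NULL);
    if (cur == NULL)
        return 0;
    size_t len = strlen(cur) + 1;
    char *saved = malloc(len);
    if (saved == NULL)
        return 0;
    memcpy(saved, cur, len);

    int ok = setlocale(LC_CTYPE, name) != NULL;

    setlocale(LC_CTYPE, saved); /* restore the previous setting */
    free(saved);
    return ok;
}
```

For example, locale_available("ru_RU.UTF-8") tells you whether that experiment can run on the current machine.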


--- End Message ---

