[Vala] [Fwd: Re: Fwd: uchar and uint8. Same? Different?]

From: pancake <pancake youterm com>
To: vala-list gnome org
Subject: [Vala] [Fwd: Re: Fwd: uchar and uint8. Same? Different?]
Date: Thu, 18 Feb 2010 10:32:37 +0100

--- Begin Message ---

From: bb <bblochl arcor de>

To: pancake <pancake youterm com>

Subject: Re: [Vala] Fwd: uchar and uint8. Same? Different?

Date: Thu, 18 Feb 2010 09:23:14 +0100
pancake wrote:
Begin forwarded message:
From: pancake <pancake youterm com>
Date: February 18, 2010 1:38:12 AM GMT+01:00
To: Sandino Flores Moreno <tigrux gmail com>
Subject: Re: [Vala] uchar and uint8. Same? Different?
This vapi looks wrong. Doesn't [] imply a _length variable when there are no CCode attributes?
Those two types are the same in practice but not in theory. A uint8 is 8 bits in size for sure, but uchar does not specify the size.
This should depend on the C compiler and platform. I don't know of any platform where uchar is != 8 bits, but this doesn't mean that none exists.
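The distinction above can be checked on the C side. The following is a minimal sketch, assuming the standard C headers `<limits.h>` and `<stdint.h>`; it uses C's `unsigned char` and `uint8_t` as stand-ins for the Vala/GLib types, and the function names are hypothetical, made up for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Bits in an unsigned char; the C standard only guarantees at least 8. */
int uchar_bits(void)
{
    return CHAR_BIT;
}

/* 1 if this platform provides the exact-width type uint8_t, else 0.
   UINT8_MAX is only defined when uint8_t exists. */
int has_uint8(void)
{
#ifdef UINT8_MAX
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical check: 1 if unsigned char and uint8_t have the same width. */
int uchar_matches_uint8(void)
{
    /* unsigned char is the smallest addressable type, so an exactly
       8-bit uint8_t can only exist when CHAR_BIT is 8. */
    assert(!has_uint8() || uchar_bits() == 8);
    return has_uint8() && uchar_bits() == 8;
}
```

On common desktop platforms `uchar_matches_uint8()` should return 1, which matches the "same in practice" observation. A platform with wider chars would simply not provide `uint8_t` at all.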
On Feb 17, 2010, at 10:03 PM, Sandino Flores Moreno <tigrux gmail com> wrote:
Hello.

I noticed that gstreamer has this declaration:

GstBuffer {
uint8 *data;
}

However, the corresponding .vapi has
class Gst.buffer {
uchar[] data;
}

I know uchar and uint8 are the same in C.
And, apparently for vala too.

So, the question is:

Can we assume uint8 is the same as uchar in Vala?
_______________________________________________
Vala-list mailing list
Vala-list gnome org
http://mail.gnome.org/mailman/listinfo/vala-list
------------------------------------------------------------------------

I do not know if the following might help. Anyway, here is some explanation:
UTF-8 (8-bit UCS/Unicode Transformation Format) is a /variable-length/ character encoding!
First, one has to remember that ASCII was originally only a 7-bit character set. UTF-8 is backwards compatible with 7-bit ASCII only!
UCS characters U+0000 to U+007F (ASCII) are encoded simply as bytes 0x00 to 0x7F (ASCII compatibility). This means that files and strings which contain only 7-bit ASCII characters have the same encoding under both ASCII and UTF-8.
U-00000000 -- U-0000007F : 0xxxxxxx
But ASCII was extended to 8 bits with the rise of the IBM PC. The extended ASCII code is represented with the numbers 0x00 to 0xFF (0 to 255), and there the problem begins. If a value is greater than 127, it has to be split into two bytes. UTF-8 encoded characters may theoretically be up to six bytes long (RFC 3629 now restricts this to four); however, 16-bit BMP characters are only up to three bytes long, and the bytes 0xFE and 0xFF are never used in the UTF-8 encoding. The encoding beyond ASCII 0x7F is different:
The copyright character in 8-bit extended ASCII (ISO 8859-1)

= 0xA9 = 1010 1001

compares to

UTF-8 = 0xC2 0xA9 = 11000010 10101001
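The bit layout behind this comparison extends to the whole 16-bit BMP. The following is a minimal sketch, not taken from the original message: the function name `utf8_encode_bmp` is hypothetical, and it only covers code points up to U+FFFF (one to three bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one BMP code point as UTF-8.
   Writes up to 3 bytes to out (no terminator) and returns the number
   of bytes written, or 0 for a surrogate (U+D800..U+DFFF), which
   must not be encoded on its own. */
size_t utf8_encode_bmp(uint16_t cp, unsigned char out[3])
{
    assert(out != NULL);
    if (cp < 0x80) {
        /* 0xxxxxxx: plain 7-bit ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;
    /* 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```

Calling it with 0x00A9 yields the two bytes 0xC2 0xA9, as in the comparison above; a character such as U+20AC (the euro sign) needs three bytes, 0xE2 0x82 0xAC.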
I do not plan to write a complete tutorial here, but I was involved with the problem in the past. Maybe some C code from my past (transfer it to Vala) might help:
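#include <stdlib.h>

/* Convert one 8-bit character (treated as ISO 8859-1) to a
   NUL-terminated UTF-8 string. The caller must free() the result. */
char *ascii_to_utf8(unsigned char c)
{
       unsigned char *out;

       if (c < 128)
       {
               /* 7-bit ASCII is encoded as a single, identical byte */
               out = calloc(2, sizeof(char));
               if (out == NULL)
                       return NULL;
               out[0] = c;
       }
       else
       {
               /* 0x80..0xFF becomes two bytes: 110000xx 10xxxxxx */
               out = calloc(3, sizeof(char));
               if (out == NULL)
                       return NULL;
               out[0] = (c >> 6) | 0xC0;   /* lead byte: top two bits */
               out[1] = (c & 0x3F) | 0x80; /* continuation byte: low six bits */
       }
       return (char *)out;   /* calloc() already zeroed the terminator */
}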

Might be a good idea to add it to the vala examples?
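For whole strings rather than single characters, the same idea can be applied byte by byte. This is a minimal sketch under the assumption that the input is ISO 8859-1; the name `latin1_to_utf8` is hypothetical and not part of any library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert a NUL-terminated ISO 8859-1 string to a
   newly allocated UTF-8 string. Returns NULL if allocation fails.
   The caller must free() the result. */
char *latin1_to_utf8(const char *in)
{
    size_t n, i;
    unsigned char *out, *p;

    assert(in != NULL);
    n = strlen(in);
    /* Worst case: every input byte becomes two bytes, plus terminator. */
    out = malloc(2 * n + 1);
    if (out == NULL)
        return NULL;
    p = out;
    for (i = 0; i < n; i++) {
        unsigned char c = (unsigned char)in[i];
        if (c < 0x80) {
            *p++ = c;                                   /* ASCII: copy as is */
        } else {
            *p++ = (unsigned char)(0xC0 | (c >> 6));    /* lead byte */
            *p++ = (unsigned char)(0x80 | (c & 0x3F));  /* continuation byte */
        }
    }
    *p = '\0';
    return (char *)out;
}
```

Allocating once for the worst case avoids one small allocation per character, at the cost of some unused space for mostly-ASCII input.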
One more example from http://www.cl.cam.ac.uk/~mgk25/unicode.html :
#include <stdio.h>
#include <locale.h>

int main()
{
   if (!setlocale(LC_CTYPE, "")) {
     fprintf(stderr, "Can't set the specified locale! "
             "Check LANG, LC_CTYPE, LC_ALL.\n");
     return 1;
   }
   printf("%ls\n", L"Schöne Grüße");
   return 0;
}
There are some special German characters, so one can see the encoding problem. The comment from the given source:
"Call this program with the locale setting LANG=de_DE and the output will be in ISO 8859-1. Call it with LANG=de_DE.UTF-8 and the output will be in UTF-8. The %ls format specifier in printf calls wcsrtombs in order to convert the wide character argument string into the locale-dependent multi-byte encoding."
Some more settings for experiments:

en_US English - United States

ru_RU Russian for Russia

zh_TW Traditional Chinese for Taiwan

etc.
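The locale-dependent conversion described in the quoted comment can also be done explicitly. The following is a minimal sketch that uses the standard `wcstombs` function (a simpler relative of `wcsrtombs`); the wrapper name `wide_to_locale` is hypothetical. It assumes the caller has already selected a locale, for example with `setlocale(LC_CTYPE, "")` as in the program above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: convert a wide string to the multi-byte encoding
   of the current LC_CTYPE locale. Returns the number of bytes written
   (excluding the terminator), or (size_t)-1 if a character cannot be
   represented in this locale or the buffer is too small. */
size_t wide_to_locale(const wchar_t *ws, char *buf, size_t size)
{
    size_t n;

    assert(ws != NULL && buf != NULL && size > 0);
    n = wcstombs(buf, ws, size);
    if (n == (size_t)-1)
        return (size_t)-1;   /* character not representable in this locale */
    if (n == size)
        return (size_t)-1;   /* buffer filled up, no room for the terminator */
    return n;
}
```

Under a UTF-8 locale a string such as L"Schöne Grüße" should come out longer in bytes than in characters, while under an ISO 8859-1 locale each character should map to one byte; the exact result depends on the locales installed on the system.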
--- End Message ---

