Re: [xml] Build DOM tree manually from ASCII file and getting data into xmlChar
- From: Steve Williams <swilliams rinax com>
- To: xml gnome org
- Subject: Re: [xml] Build DOM tree manually from ASCII file and getting data into xmlChar
- Date: Thu, 01 Aug 2002 16:34:41 -0600
Hi,
Thanks for the assistance to date, it is much appreciated. One of the
joys of having the source is that I can look and see exactly how the
routine works.
As it stands now, isolat1ToUTF8 will never return an error (there's
only one return statement, and it returns 0).
So I figure that comparing the number of bytes processed in the input
stream against the length of the input stream should work fine as
verification that the entire input string was processed.
Here's a sample of code I threw together. Does anything glaringly wrong
jump out at anyone?
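#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libxml/tree.h>
#include <libxml/xmlmemory.h>
#include <libxml/encoding.h>

/* 80 character input buffer */
#define ISOLAT1BUFF 80
/* isolat1ToUTF8 shouldn't more than double the input (worst case) */
#define UTF8DEFAULT (2 * ISOLAT1BUFF)

int main(void)
{
    int tmp;
    const char *isolat1Tag = "test conversion";
    int isolat1TagLength;
    int utf8TagLength = UTF8DEFAULT;
    /* one element per output byte, plus one for the terminating null */
    xmlChar *utf8Tag = xmlMalloc((UTF8DEFAULT + 1) * sizeof(xmlChar));

    if (utf8Tag == NULL) {
        fprintf(stderr, "out of memory\n");
        exit(1);
    }

    /* Let's take the input tag and convert from an assumed format of
     * ISO-8859-1 (iso-latin-1) to the internal representation UTF-8
     */
    tmp = isolat1TagLength = (int) strlen(isolat1Tag);
    /* The version discussed here always returns 0. Treating only a
     * negative value as an error is a precaution (an assumption) for
     * versions that return a byte count instead.
     */
    if (isolat1ToUTF8(utf8Tag, &utf8TagLength,
                      (const unsigned char *) isolat1Tag,
                      &isolat1TagLength) >= 0) {
        /* It worked, let's make sure that it processed the whole input
           string */
        if (isolat1TagLength != tmp) {
            fprintf(stderr, "isolat1ToUTF8 could not process entire TAG\n");
            exit(1);
        }
    } else {
        fprintf(stderr, "isolat1ToUTF8 returned an error!!\n");
        exit(1);
    }
    /* null terminate the UTF8 string */
    utf8Tag[utf8TagLength] = '\0';
    xmlFree(utf8Tag);
    exit(0);
}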
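For what it's worth, the "shouldn't more than double" assumption can be checked with a small stand-alone sketch. This is not the libxml2 routine: latin1_to_utf8 is a hypothetical helper written only for illustration, with its own signature and return convention. It shows that every ISO-8859-1 byte below 0x80 is copied as one byte and every byte from 0x80 up becomes exactly two UTF-8 bytes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of libxml2): convert in_len bytes of
 * ISO-8859-1 to UTF-8. Returns the number of bytes written, or -1 if
 * out_cap is too small. The output is NOT null terminated.
 */
static long latin1_to_utf8(unsigned char *out, size_t out_cap,
                           const unsigned char *in, size_t in_len)
{
    size_t i, o = 0;

    assert(out != NULL && in != NULL);
    for (i = 0; i < in_len; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            /* ASCII range: copied through as a single byte */
            if (o + 1 > out_cap)
                return -1;
            out[o++] = c;
        } else {
            /* 0x80..0xFF: two bytes, 110000xx 10xxxxxx */
            if (o + 2 > out_cap)
                return -1;
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return (long)o;
}
```

Since each input byte yields one or two output bytes, a buffer of 2 * in_len bytes, plus one more for the terminating null, is always large enough.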
Daniel Veillard wrote:
> On Thu, Aug 01, 2002 at 08:53:47PM +0200, Hannu Krosing wrote:
>> On Thu, 2002-08-01 at 10:49, Daniel Veillard wrote:
>>> French characters (well accented ones for example) are not in ASCII
>>> but probably in ISO Latin 1, use the following function to convert them
>>> before passing the strings to the API:
>>> int
>>> isolat1ToUTF8(unsigned char* out, int *outlen,
>>> const unsigned char* in, int *inlen) {
>> Why is it "unsigned char* out" and not "xmlChar* out" ?
> for conformance with iconv() interfaces
>> Or are they actually made the same by some macro ?
> No need for macros, in both cases it's a 0 terminated sequence of bytes,
> you just need to be sure it's in the UTF8 encoding to use it for xmlChar *
> Daniel