Re: [xml] Build DOM tree manually from ASCII file and getting data into xmlChar

From: Steve Williams <swilliams rinax com>
To: xml gnome org
Subject: Re: [xml] Build DOM tree manually from ASCII file and getting data into xmlChar
Date: Thu, 01 Aug 2002 16:34:41 -0600

Hi,

Thanks for the assistance to date, it is much appreciated. One of the joys of having the source is that I can look and see exactly how the routine works. As it stands now, isolat1ToUTF8 will never return an error (there's only one return statement, and it returns 0).

So I figure that checking the number of bytes consumed from the input stream against the length of the input stream should work fine as verification that the entire input string was processed.

Here's a sample of code I threw together... does anything glaringly wrong jump out at anyone?

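#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libxml/xmlmemory.h>  /* xmlMalloc(), xmlFree() */
#include <libxml/tree.h>
#include <libxml/encoding.h>   /* isolat1ToUTF8() */

/* 80 character input buffer */
#define ISOLAT1BUFF 80

/* isolat1ToUTF8 output is at most double the input (worst case),
 * plus one byte for the terminating NUL */
#define UTF8DEFAULT (2 * ISOLAT1BUFF + 1)

int main(void)
{
  int tmp;

  const char *isolat1Tag = "test conversion";
  int isolat1TagLength;

  /* sizeof(xmlChar), not sizeof(utf8Tag): the latter is a pointer size */
  xmlChar *utf8Tag = xmlMalloc(UTF8DEFAULT * sizeof(xmlChar));
  /* reserve the last byte for the NUL terminator added below */
  int utf8TagLength = UTF8DEFAULT - 1;

  if (utf8Tag == NULL) {
    fprintf(stderr, "out of memory\n");
    return 1;
  }

  /* Convert the input tag from an assumed encoding of
   * ISO-8859-1 (iso-latin-1) to the internal representation, UTF-8
   */
  tmp = isolat1TagLength = strlen(isolat1Tag);

  /* Test for a negative result: later libxml2 releases return the
   * number of bytes written on success, so "!= 0" would misfire */
  if (isolat1ToUTF8(utf8Tag, &utf8TagLength,
                    (const unsigned char *) isolat1Tag,
                    &isolat1TagLength) < 0) {
    fprintf(stderr, "isolat1ToUTF8 returned an error!!\n");
    xmlFree(utf8Tag);
    return 1;
  }

  /* It worked; make sure that it processed the whole input string */
  if (isolat1TagLength != tmp) {
    fprintf(stderr, "isolat1ToUTF8 could not process entire TAG\n");
    xmlFree(utf8Tag);
    return 1;
  }

  /* NUL-terminate the UTF-8 string (room was reserved above) */
  utf8Tag[utf8TagLength] = '\0';

  xmlFree(utf8Tag);
  return 0;
}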



Daniel Veillard wrote:

> On Thu, Aug 01, 2002 at 08:53:47PM +0200, Hannu Krosing wrote:
>
> > On Thu, 2002-08-01 at 10:49, Daniel Veillard wrote:
> >
> > >  French characters (well, accented ones for example) are not in ASCII
> > > but are probably in ISO Latin 1; use the following function to convert
> > > them before passing the strings to the API:
> > >  int
> > >  isolat1ToUTF8(unsigned char* out, int *outlen,
> > >                const unsigned char* in, int *inlen);
> >
> > Why is it "unsigned char* out" and not "xmlChar* out" ?
>
>  For conformance with iconv() interfaces.
>
> > Or are they actually made the same by some macro ?
>
>  No need for macros: in both cases it's a 0-terminated sequence of bytes;
> you just need to be sure it's in the UTF-8 encoding to use it as xmlChar *.
>
> Daniel
