Re: [xml] Build DOM tree manually from ASCII file and getting data into xmlChar



Hi,

Thanks for the assistance to date, it is much appreciated. One of the joys of having the source is that I can look and see exactly how the routine works. As it stands now, isolat1ToUTF8 will never return an error: there's only one return statement, and it returns 0.

So, I figure that checking the number of bytes processed from the input against the length of the input should work fine as verification that the entire input string was processed.

Here's a sample of code I threw together. Does anything glaringly wrong jump out at anyone?

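#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libxml/tree.h>
#include <libxml/encoding.h>
#include <libxml/xmlmemory.h>

/* 80 character input buffer */
#define ISOLAT1BUFF 80

/* isolat1ToUTF8 shouldn't more than double the input size (worst case) */
#define UTF8DEFAULT (2 * ISOLAT1BUFF)

int main(void)
{
  int tmp;

  const char *isolat1Tag = "test conversion";
  int isolat1TagLength;

  /* Size by sizeof(xmlChar), not sizeof(utf8Tag) (that is a pointer),
   * and leave one extra byte for the terminating '\0'.
   */
  xmlChar *utf8Tag = xmlMalloc((UTF8DEFAULT + 1) * sizeof(xmlChar));
  int utf8TagLength = UTF8DEFAULT;

  if (utf8Tag == NULL) {
    fprintf(stderr, "could not allocate the UTF-8 buffer\n");
    exit(1);
  }

  /* Let's take the input tag and convert from an assumed format of
   * ISO-8859-1 (iso-latin-1) to the internal representation UTF-8
   */
  tmp = isolat1TagLength = (int) strlen(isolat1Tag);

  /* Treat only a negative return as failure: this version always returns 0,
   * and a version that reported the number of bytes written would still pass.
   */
  if (isolat1ToUTF8(utf8Tag, &utf8TagLength,
                    (const unsigned char *) isolat1Tag,
                    &isolat1TagLength) >= 0) {
    /* It worked, let's make sure that it processed the whole input string */
    if (isolat1TagLength != tmp) {
      fprintf(stderr, "isolat1ToUTF8 could not process entire TAG\n");
      exit(1);
    }
  } else {
    fprintf(stderr, "isolat1ToUTF8 returned an error!!\n");
    exit(1);
  }

  /* null terminate the UTF8 string */
  utf8Tag[utf8TagLength] = '\0';

  xmlFree(utf8Tag);
  exit(0);
}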
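
For anyone wondering why doubling the buffer is the worst case, and what the in/out length arguments are expected to hold afterwards, here is a minimal standalone sketch of a Latin-1 to UTF-8 conversion with the same iconv-style interface. The name latin1_to_utf8 is hypothetical and this is not libxml2's implementation; the -1 return for a full output buffer is my own assumption for illustration.

```c
/*
 * Hypothetical helper, NOT libxml2's code: convert ISO-8859-1 bytes to
 * UTF-8 using an iconv-style interface.
 *
 * On entry  *outlen is the capacity of out, *inlen the number of input bytes.
 * On return *outlen is the number of bytes written, *inlen the number of
 * input bytes consumed. Returns 0 if all input was converted, -1 if the
 * output buffer filled up first (an assumption made for this sketch).
 */
static int latin1_to_utf8(unsigned char *out, int *outlen,
                          const unsigned char *in, int *inlen)
{
    int i = 0;  /* input bytes consumed */
    int o = 0;  /* output bytes written */
    int complete;

    while (i < *inlen) {
        unsigned char c = in[i];
        if (c < 0x80) {
            /* ASCII maps to itself: one byte */
            if (o + 1 > *outlen)
                break;
            out[o++] = c;
        } else {
            /* 0x80..0xFF become two bytes: 110000xx 10xxxxxx */
            if (o + 2 > *outlen)
                break;
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
        i++;
    }

    complete = (i == *inlen);
    *inlen = i;
    *outlen = o;
    return complete ? 0 : -1;
}
```

Every input byte produces either one or two output bytes, so 2 * inlen (plus one byte if you want a terminating '\0') is always enough. Comparing the consumed count in *inlen with the original length is the same check as in my program above.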



Daniel Veillard wrote:

> On Thu, Aug 01, 2002 at 08:53:47PM +0200, Hannu Krosing wrote:
>> On Thu, 2002-08-01 at 10:49, Daniel Veillard wrote:
>>>  French characters (well, accented ones for example) are not in ASCII
>>> but probably in ISO Latin 1, use the following function to convert them
>>> before passing the strings to the API:
>>>  int
>>>  isolat1ToUTF8(unsigned char* out, int *outlen,
>>>                const unsigned char* in, int *inlen) {
>> Why is it "unsigned char* out" and not "xmlChar* out" ?
>
>  for conformance with iconv() interfaces
>
>> Or are they actually made the same by some macro ?
>
>  No need for macros, in both cases it's a 0 terminated sequence of bytes,
> you just need to be sure it's in the UTF8 encoding to use it for xmlChar *
>
> Daniel




