Hi,

Thanks for the assistance to date; it is much appreciated. One of the joys of having the source is that I can look and see exactly how the routine works. As it stands now, isolat1ToUTF8 will never return an error (there's only one return statement, and it returns 0).
So I figure that comparing the number of bytes processed in the input stream against the length of the input stream should work fine as verification that the entire input string was processed.
Here's a sample of code I threw together. Does anything glaringly wrong jump out at anyone?
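#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libxml/tree.h>
#include <libxml/encoding.h>   /* isolat1ToUTF8 */
#include <libxml/xmlmemory.h>  /* xmlMalloc, xmlFree */
/* 80 character input buffer */
#define ISOLAT1BUFF 80
/* isolat1ToUTF8 shouldn't more than double the input size (worst case) */
#define UTF8DEFAULT (2 * ISOLAT1BUFF)
int main(void)
{
    int tmp;
    const char *isolat1Tag = "test conversion";
    int isolat1TagLength;
    /* size by element, not by pointer, plus one byte for the terminator */
    xmlChar *utf8Tag = xmlMalloc((UTF8DEFAULT + 1) * sizeof(*utf8Tag));
    int utf8TagLength = UTF8DEFAULT;
    if (utf8Tag == NULL) {
        fprintf(stderr, "could not allocate the UTF-8 buffer\n");
        exit(1);
    }
    /* Let's take the input tag and convert from an assumed format of
     * ISO-8859-1 (iso-latin-1) to the internal representation UTF-8
     */
    tmp = isolat1TagLength = strlen(isolat1Tag);
    /* a negative return would signal an error; 0 or more is success */
    if (isolat1ToUTF8(utf8Tag, &utf8TagLength,
                      (const unsigned char *) isolat1Tag,
                      &isolat1TagLength) >= 0) {
        /* It worked, let's make sure that it processed the whole input
           string */
        if (isolat1TagLength != tmp) {
            fprintf(stderr, "isolat1ToUTF8 could not process entire TAG\n");
            exit(1);
        }
    } else {
        fprintf(stderr, "isolat1ToUTF8 returned an error!!\n");
        exit(1);
    }
    /* null terminate the UTF8 string */
    utf8Tag[utf8TagLength] = '\0';
    xmlFree(utf8Tag);
    exit(0);
}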
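To make the two assumptions in that program concrete (the output never more than doubles, and a short conversion shows up only in the consumed-byte count), here is a rough stand-in for the conversion loop. It is only a sketch: latin1_to_utf8_sketch is a hypothetical name of mine, not the libxml2 routine, and it simply mimics the calling convention described above, where the two length arguments are capacities on entry and counts on return.

```c
/*
 * Hypothetical stand-in, NOT the libxml2 implementation.
 * On entry *outlen is the capacity of out and *inlen is the number of
 * input bytes. On return *outlen is the number of bytes written and
 * *inlen is the number of input bytes consumed.
 * Always returns 0, so the caller has to compare *inlen with the
 * original input length to notice a short conversion.
 */
int latin1_to_utf8_sketch(unsigned char *out, int *outlen,
                          const unsigned char *in, int *inlen)
{
    int written = 0;
    int consumed = 0;

    while (consumed < *inlen) {
        unsigned char c = in[consumed];
        /* bytes below 0x80 are plain ASCII; the rest need two bytes */
        int need = (c < 0x80) ? 1 : 2;

        if (written + need > *outlen)
            break;  /* output buffer is full: stop before this byte */

        if (need == 1) {
            out[written++] = c;
        } else {
            out[written++] = (unsigned char)(0xC0 | (c >> 6));
            out[written++] = (unsigned char)(0x80 | (c & 0x3F));
        }
        consumed++;
    }

    *outlen = written;
    *inlen = consumed;
    return 0;
}
```

Since every input byte produces at most two output bytes, a buffer of twice the input length (plus one byte for the terminator) is always enough, and when the buffer is too small the only sign of trouble is that the consumed count comes back smaller than the input length.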
Daniel Veillard wrote:
> On Thu, Aug 01, 2002 at 08:53:47PM +0200, Hannu Krosing wrote:
> > On Thu, 2002-08-01 at 10:49, Daniel Veillard wrote:
> > > French characters (well accented ones for example) are not in ASCII but probably in ISO Latin 1, use the following function to convert them before passing the strings to the API:
> > > int isolat1ToUTF8(unsigned char* out, int *outlen, const unsigned char* in, int *inlen) {
> > Why is it "unsigned char* out" and not "xmlChar* out" ?
> for conformance with iconv() interfaces
> > Or are they actually made the same by some macro ?
> No need for macros, in both cases it's a 0 terminated sequence of bytes, you just need to be sure it's in the UTF8 encoding to use it for xmlChar *
> Daniel