Re: [xml] Newbie question: how strict should usage of xmlChar be?



Kevin P. Fleming said:
I've got an app that uses libxml2; I didn't write it, but I'm
maintaining it. I'm wondering, since the app contains a variety of
direct casts to and from xmlChar, how strict should this be?

I see that the library contains xmlCharStrdup for converting a char *
to xmlChar *, but I also see from the header files that xmlChar is
just an "unsigned char" anyway, so converting from char * to xmlChar *
is a safe cast. Are there environments where this is not the case, or
is it planned that libxml2 may use something other than "unsigned
char" for xmlChar in the future?

I guess my concern is whether doing these direct casts will impact
libxml2's ability to handle XML streams that have been encoded using
something other than ISO-8859-1. I have a lot of places where I need
to supply element/attribute names to libxml2 calls, and if I have to
xmlCharStrdup/xmlFree each one of them I can do that, but I want to
make sure I'm not wasting my time and energy :-)

First, take a look at http://xmlsoft.org/encoding.html where this
should be explained (that's a polite way of saying RTFM :-).  The
basic idea is that, internally within the library, all the "char"
data is kept as UTF-8, and xmlChar* is a pointer to a UTF-8 string.
When an input document is parsed, the library makes use of the
encoding declaration (or some guesswork) to do any conversion
required from the document's encoding into UTF-8, and the reverse is
true on output.  However, if the user uses the library APIs to put
data into the tree (e.g. creates a node with a "name" attribute) it
is up to the user to ensure any necessary encoding is performed
before the API is called.  Similarly, a user who fetches data through
the APIs must be prepared to process UTF-8 (if the input document
might contain characters which needed to be encoded).
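
To make the "encode before calling the API" point concrete, here is a
rough sketch of converting an ISO-8859-1 string to UTF-8 by hand.  The
helper name latin1_to_utf8 is made up for illustration, and the
xmlChar typedef is only a local stand-in so the snippet compiles
without the library.  libxml2 itself ships a conversion routine
(isolat1ToUTF8, declared in encoding.h) that you would normally
prefer.

```c
#include <stdlib.h>
#include <string.h>

/* Local stand-in for libxml2's xmlChar so this sketch compiles on its
   own; real code would include <libxml/xmlstring.h> instead. */
typedef unsigned char xmlChar;

/* Hypothetical helper (not part of libxml2): convert a NUL-terminated
   ISO-8859-1 string into a newly allocated UTF-8 string.
   Returns NULL if allocation fails; the caller frees the result. */
static xmlChar *latin1_to_utf8(const char *in)
{
    size_t len = strlen(in);
    /* Each Latin-1 byte expands to at most two UTF-8 bytes. */
    xmlChar *out = malloc(2 * len + 1);
    xmlChar *p = out;

    if (out == NULL)
        return NULL;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)in[i];
        if (c < 0x80) {
            /* ASCII is identical in Latin-1 and UTF-8. */
            *p++ = c;
        } else {
            /* U+0080..U+00FF become a two-byte UTF-8 sequence. */
            *p++ = (xmlChar)(0xC0 | (c >> 6));
            *p++ = (xmlChar)(0x80 | (c & 0x3F));
        }
    }
    *p = '\0';
    return out;
}
```

Note that pure-ASCII strings come out unchanged, which is why plain
ASCII element and attribute names can simply be cast (libxml2 provides
the BAD_CAST macro for this) with no copy.  Only data that may contain
non-ASCII characters in some other encoding needs a real conversion
before it goes into the tree.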

HTH

Bill


