Re: [xml] Newbie question: how strict should usage of xmlChar be?



William M. Brack wrote:

First, take a look at http://xmlsoft.org/encoding.html where this
should be explained (that's a polite way of saying RTFM :-).  The

Now I feel bad... I had read most of the "Developer Menu", but neglected that one because I wasn't concerned about it. Now I see that was exactly what I should have read :-)

basic idea is that, internally within the library, all the "char"
data is kept as UTF-8, and xmlChar* is a pointer to a UTF-8 string.
When an input document is parsed, the library makes use of the
encoding declaration (or some guesswork) to do any conversion
required from the document's encoding into UTF-8, and the reverse is
true on output.  However, if the user uses the library APIs to put
data into the tree (e.g. creates a node with a "name" attribute) it
is up to the user to ensure any necessary encoding is performed
before the API is called.  Similarly, if he uses the APIs to fetch
data, he must be prepared to process UTF-8 (if the input document
might contain characters which needed to be encoded).
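
To make the "necessary encoding" step concrete, here is a minimal sketch of what converting application data to UTF-8 can look like before it is handed to the tree API. It assumes the application's strings are ISO-8859-1 (Latin-1), which is only one possible case. The names `latin1_to_utf8` and `xml_char_sketch` are hypothetical and not part of libxml2; the typedef merely stands in for xmlChar so the sketch compiles without the library headers. The library has its own conversion facilities, so treat this hand-rolled version purely as an illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for libxml2's xmlChar (an unsigned char),
   so this sketch builds without the library headers. */
typedef unsigned char xml_char_sketch;

/* Hypothetical helper, not a libxml2 function: convert a NUL-terminated
   ISO-8859-1 string into a newly allocated UTF-8 string.
   Returns NULL if allocation fails; the caller releases it with free(). */
xml_char_sketch *latin1_to_utf8(const char *in)
{
    assert(in != NULL);

    size_t len = strlen(in);
    /* Each Latin-1 byte expands to at most two UTF-8 bytes. */
    xml_char_sketch *out = malloc(2 * len + 1);
    if (out == NULL)
        return NULL;

    xml_char_sketch *p = out;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)in[i];
        if (c < 0x80) {
            /* 7-bit ASCII is encoded identically in UTF-8. */
            *p++ = c;
        } else {
            /* U+0080..U+00FF become a two-byte sequence: 110xxxxx 10xxxxxx. */
            *p++ = (xml_char_sketch)(0xC0 | (c >> 6));
            *p++ = (xml_char_sketch)(0x80 | (c & 0x3F));
        }
    }
    *p = '\0';
    return out;
}
```

The returned buffer is what would be passed to the library in place of the original Latin-1 string; data fetched back out of the tree would need the reverse conversion if the application cannot handle UTF-8 directly.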

OK, so since my app uses only tag/attribute names that consist entirely of the ASCII-compatible portion of UTF-8, I do not need to xmlCharStrdup my hardcoded strings within the app before passing them to the library API calls. The converse is also true: when I ask the API for the name of an element, I only need to check for the ones I'm expecting. If a user supplies an XML document with a non-ASCII (but valid UTF-8) tag in it, it will be treated no differently than an ASCII-compatible UTF-8 tag that I don't understand (meaning it would have failed DTD validation had they used the DTD).
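
The reasoning above rests on the fact that a string made only of 7-bit ASCII bytes is already valid UTF-8, byte for byte. A small sketch of a check for that property follows; `is_pure_ascii` is a hypothetical helper name, not a libxml2 function. Strings that pass can be cast to `const xmlChar *` as they are (libxml2 provides the `BAD_CAST` macro for that cast), while anything that fails would need a real conversion first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of libxml2: returns 1 if every byte of the
   NUL-terminated string is 7-bit ASCII (and therefore already valid UTF-8),
   or 0 if any byte has its high bit set. */
int is_pure_ascii(const char *s)
{
    assert(s != NULL);

    for (; *s != '\0'; s++) {
        if ((unsigned char)*s >= 0x80)
            return 0;
    }
    return 1;
}
```

For hardcoded element and attribute names such a check is mostly useful as a debug-time assertion, since the answer is known when the code is written.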

Thanks for the help, I can now proceed with cleaning up some code with a better understanding of what it's supposed to be doing.



