[xml] Follow-up: Tips for correct encoding handling when building documents from scratch?



Hello,

-----Original Message-----
From: Daniel Veillard [mailto:veillard redhat com]
Sent: Tuesday, January 15, 2002 3:44 PM
To: Henke, Markus
Cc: 'xml gnome org'
Subject: Re: [xml] Encoding problems building document from scratch

<violently_cut/>

  it can be done with existing interfaces.

<violently_cut/>

Well, I was rather glad to read that statement in Daniel's answer
to an earlier posting of mine about my problems with correctly handling
the character encoding in newly created docs.
But I have to confess that I'm stranded while trying
to figure out how to do it (as neatly as possible... 8)
So let me describe again what I'm trying to do:

I want to build a new document (using libxml 2.4.12)

  docPtr = xmlNewDoc(BAD_CAST XML_DEFAULT_VERSION);

and set the character encoding (which is either "iso-8859-1" or "roman-8")

  docPtr->encoding = xmlStrdup(BAD_CAST "iso-8859-1");  /* or "roman-8" */

Clearly "roman-8" is not supported by libxml by default, so
I have to register an encoding handler to extend libxml's
encoding facilities.
Now I want to populate that doc with new nodes (content and attributes)
defined by application data that arrives in the corresponding
encoding.
Additionally I want to use xmlEncodeEntitiesReentrant() to escape
special characters and possible entities.
If I got it right, xmlEncodeEntitiesReentrant() consumes a UTF-8 encoded
character buffer (like all libxml internals), although it seems to work
for iso-8859-1 as well, but I'm not sure about that.
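
For the "roman-8" handler mentioned above, a table-driven converter is one
possible shape for the input side. The following is only a sketch under
stated assumptions: the parameter layout is modeled on isolat1ToUTF8(), the
names sbcsToUTF8, fillLatin1Table and roman8High are hypothetical, the
return convention is this sketch's own choice, and the roman-8 table is
deliberately left empty because it has to be filled from the HP Roman-8
charset definition.

```c
#include <stddef.h>

/*
 * Table-driven single-byte -> UTF-8 converter (illustrative sketch).
 * Parameter layout modeled on libxml2's isolat1ToUTF8():
 *   out/outlen: output buffer, capacity in, bytes written out
 *   in/inlen:   input buffer, length in, bytes consumed out
 * high[] maps bytes 0x80..0xFF to code points (0 = unmapped).
 * Return convention (this sketch's own): bytes written, -1 on error.
 */
int sbcsToUTF8(const unsigned short high[128],
               unsigned char *out, int *outlen,
               const unsigned char *in, int *inlen)
{
    int o = 0, i = 0;

    if (out == NULL || outlen == NULL || in == NULL || inlen == NULL)
        return -1;

    while (i < *inlen) {
        unsigned int cp = in[i];
        int need;

        if (cp >= 0x80) {
            cp = high[cp - 0x80];
            if (cp == 0) {              /* byte has no mapping in the table */
                *outlen = o;
                *inlen = i;
                return -1;
            }
        }
        need = (cp < 0x80) ? 1 : (cp < 0x800) ? 2 : 3;
        if (o + need > *outlen)         /* output full: stop, report progress */
            break;

        if (need == 1) {
            out[o++] = (unsigned char)cp;
        } else if (need == 2) {
            out[o++] = (unsigned char)(0xC0 | (cp >> 6));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (unsigned char)(0xE0 | (cp >> 12));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
        i++;
    }
    *outlen = o;
    *inlen = i;
    return o;
}

/* Demo table: ISO-8859-1, where every byte value equals its code point. */
void fillLatin1Table(unsigned short high[128])
{
    int i;
    for (i = 0; i < 128; i++)
        high[i] = (unsigned short)(0x80 + i);
}

/* Placeholder: all zero (unmapped) until filled from the charset definition. */
unsigned short roman8High[128];

/* Same parameter layout as isolat1ToUTF8(), so it can stand in for it. */
int myRoman8ToUTF8(unsigned char *out, int *outlen,
                   const unsigned char *in, int *inlen)
{
    return sbcsToUTF8(roman8High, out, outlen, in, inlen);
}
```

With the Latin-1 demo table, the three input bytes 'a', 0xE9, 'b' come out
as the four UTF-8 bytes 'a', 0xC3, 0xA9, 'b'. A function of this shape,
together with a matching UTF-8 to roman-8 counterpart, is what one would
presumably hand to xmlNewCharEncodingHandler() to register the encoding.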

Now we come to the root of the matter: Is it mandatory to do the
character conversion to UTF-8 at the application level, something along
the lines of
  nodePtr = xmlNewNode(NULL, nodeNameBuff);
  /* compare the encoding names by content, not by pointer */
  /* assuming outlen/inlen are ints: sizes in, byte counts out */
  if (!xmlStrcasecmp(docPtr->encoding, BAD_CAST "iso-8859-1")) {
    isolat1ToUTF8(outBuff, &outlen, inBuff, &inlen);
  }
  else if (!xmlStrcasecmp(docPtr->encoding, BAD_CAST "roman-8")) {
    myRoman8ToUTF8(outBuff, &outlen, inBuff, &inlen);
  }
  else ...
  encBuff = xmlEncodeEntitiesReentrant(docPtr, outBuff);
  xmlNodeSetContent(nodePtr, encBuff);
  ...
  xmlSetProp(nodePtr, attrNameBuff, anotherEncBuff);
  ...

?
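
A dispatch on the encoding name like the one above has to compare the
strings by content and, since encoding names are conventionally
case-insensitive, without regard to case. A minimal standalone sketch of
that selection step follows; the names AppEncoding, encNameEquals and
classifyEncoding are hypothetical and not part of libxml.

```c
#include <ctype.h>
#include <stddef.h>

typedef enum { APP_ENC_UNKNOWN, APP_ENC_LATIN1, APP_ENC_ROMAN8 } AppEncoding;

/* Case-insensitive comparison of two ASCII strings; returns 1 when equal. */
int encNameEquals(const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return 0;
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;    /* equal only if both strings end here */
}

/* Map an encoding name to the converter the application should use. */
AppEncoding classifyEncoding(const char *name)
{
    if (encNameEquals(name, "iso-8859-1"))
        return APP_ENC_LATIN1;
    if (encNameEquals(name, "roman-8"))
        return APP_ENC_ROMAN8;
    return APP_ENC_UNKNOWN;
}
```

Called as classifyEncoding((const char *)docPtr->encoding), the result can
drive a switch that picks the matching conversion routine. Inside libxml
code, xmlStrcasecmp() does the same comparison directly on xmlChar strings.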

Or is it possible to use the (default/extended) libxml encoding support,
ideally while constructing the new nodes?


Any help/hints/tips on this subject would be welcome!
  
Thanx in advance & Ciao, Markus


