RE: [xml] Possible bug with Byte Order Marks



Returning to my original query, which was that xmlDocDumpMemory and
xmlDocFormatDump don't work correctly for "UTF-16", and having looked more
closely at the code for those functions, I think that my proposed changes
have too broad a scope. I can see a different solution that can easily be
applied to those two functions, but I am confused by what seems to me to be
an inconsistency, as follows:

A call to xmlFindCharEncodingHandler for "UTF-16" fails.
A call to xmlParseCharEncoding for "UTF-16" followed by a call to
xmlGetCharEncodingHandler returns the handler for
XML_CHAR_ENCODING_UTF16LE.

  the problem is that you add some state information, if you can keep this
in the local variables of the serialization routine then that's fine.

The two Dump functions call xmlParseCharEncoding followed by
xmlFindCharEncodingHandler. I propose calling xmlGetCharEncodingHandler
first (using the result from the call to xmlParseCharEncoding), and only
calling xmlFindCharEncodingHandler if that returns NULL. This is hopefully
a safe change.

  Hum, sounds better, could you give a patch ?


This is the patch:

In xmlDocFormatDump in tree.c:
Was:
        if (enc != XML_CHAR_ENCODING_UTF8) {
            handler = xmlFindCharEncodingHandler(encoding);
            if (handler == NULL) {

Now:
        if (enc != XML_CHAR_ENCODING_UTF8) {
            handler = xmlGetCharEncodingHandler(enc);
            if (handler == NULL)
                handler = xmlFindCharEncodingHandler(encoding);
            if (handler == NULL) {

In xmlDocDumpFormatMemoryEnc in tree.c:
Was:
        } else if (doc_charset != XML_CHAR_ENCODING_UTF8) {
            conv_hdlr = xmlFindCharEncodingHandler(txt_encoding);
            if ( conv_hdlr == NULL ) {

Now:
        } else if (doc_charset != XML_CHAR_ENCODING_UTF8) {
            conv_hdlr = xmlGetCharEncodingHandler(doc_charset);
            if (conv_hdlr == NULL)
                conv_hdlr = xmlFindCharEncodingHandler(txt_encoding);
            if ( conv_hdlr == NULL ) {
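
To show the lookup order in isolation, here is a small stand-alone model of
it. It does not call libxml at all: every demo_* name, the handler table and
the "X-CUSTOM" entry are hypothetical and exist only for this sketch. It
assumes, as described above, that handlers are registered under exact names
with no "UTF-16" entry, and that parsing "UTF-16" yields the little-endian
value.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for this sketch only; these are not libxml types. */
typedef enum {
    DEMO_ENC_NONE = 0,
    DEMO_ENC_UTF8,
    DEMO_ENC_UTF16LE,
    DEMO_ENC_UTF16BE
} demo_encoding;

typedef struct {
    const char *name;
    demo_encoding enc;
} demo_handler;

/* Handlers registered under exact names. There is deliberately no "UTF-16"
 * entry, and "X-CUSTOM" has a name but no enum value. */
static const demo_handler demo_handlers[] = {
    { "UTF-8",    DEMO_ENC_UTF8 },
    { "UTF-16LE", DEMO_ENC_UTF16LE },
    { "UTF-16BE", DEMO_ENC_UTF16BE },
    { "X-CUSTOM", DEMO_ENC_NONE }
};
#define DEMO_COUNT (sizeof(demo_handlers) / sizeof(demo_handlers[0]))

/* Name to enum. The generic "UTF-16" is mapped to little-endian. */
demo_encoding demo_parse(const char *name)
{
    if (name == NULL) return DEMO_ENC_NONE;
    if (strcmp(name, "UTF-8") == 0) return DEMO_ENC_UTF8;
    if (strcmp(name, "UTF-16") == 0) return DEMO_ENC_UTF16LE;
    if (strcmp(name, "UTF-16LE") == 0) return DEMO_ENC_UTF16LE;
    if (strcmp(name, "UTF-16BE") == 0) return DEMO_ENC_UTF16BE;
    return DEMO_ENC_NONE;
}

/* Lookup by enum value; fails for DEMO_ENC_NONE. */
const demo_handler *demo_get(demo_encoding enc)
{
    size_t i;
    if (enc == DEMO_ENC_NONE) return NULL;
    for (i = 0; i < DEMO_COUNT; i++)
        if (demo_handlers[i].enc == enc) return &demo_handlers[i];
    return NULL;
}

/* Lookup by exact registered name. */
const demo_handler *demo_find(const char *name)
{
    size_t i;
    if (name == NULL) return NULL;
    for (i = 0; i < DEMO_COUNT; i++)
        if (strcmp(demo_handlers[i].name, name) == 0) return &demo_handlers[i];
    return NULL;
}

/* The order used in the patch: try the enum first, then fall back to the name. */
const demo_handler *demo_lookup(const char *name)
{
    const demo_handler *h = demo_get(demo_parse(name));
    if (h == NULL)
        h = demo_find(name);
    return h;
}
```

In this model demo_find("UTF-16") returns NULL while demo_lookup("UTF-16")
returns the UTF-16LE entry, and a name with no enum value such as "X-CUSTOM"
is still found through the fallback, which is why I keep the Find call
rather than replacing it.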


I'm still not 100% convinced that this is the correct way to go, but at
least it hopefully won't break anything. An alternative would be, in
xmlInitCharEncodingHandlers, to register a handler for UTF-16 that is the
same as the handler for UTF-16LE, or to create an alias of UTF-16 for
UTF-16LE, but I don't have enough experience with libxml to know what
implications this might have.



