libxml - utf8 / 8bit charsets.



Hi Daniel,

        When I looked into this issue, it seemed to me that libxml was
being too clever for its own good :-) but first - let me assume that the
only significant user of libxml1 is now the GNOME project - is that fair?

        So - there are a lot of possible charsets that we could support;
however, looking at parser.c (xmlSwitchEncoding), it seems that we flag
errors on all encodings except ENCODING_NONE and ENCODING_UTF8.

        So - given that mixed-charset XML files exist, why can we not get
libxml to simply return an exact representation of what was in the input
string, regardless of encoding? And similarly on write, we would just
assume the application is going to get it correct.

        I think what screwed me up when using 8-bit data was code that
started examining a byte stream as chars, assuming that it was UTF-8, and
trying to validate it to ensure that no chars in a certain range were
present. If this is the breakage, that validation is of very limited use
to us.
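
        To illustrate the kind of check I mean, here is a rough sketch in
C. It is not libxml's code, and looks_like_utf8 is a made-up name; it only
tests the lead/continuation byte pattern (no overlong or surrogate checks),
which is already enough to reject ordinary 8-bit (e.g. Latin-1) text.

```c
#include <stddef.h>

/* Hypothetical helper, NOT part of libxml: returns 1 if buf[0..len) has
 * the byte structure of UTF-8 (valid lead bytes, each followed by the
 * right number of 10xxxxxx continuation bytes), and 0 otherwise. */
int looks_like_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t extra;   /* continuation bytes expected after c */
        size_t k;

        if (c < 0x80)
            extra = 0;                  /* plain ASCII */
        else if ((c & 0xE0) == 0xC0)
            extra = 1;                  /* 110xxxxx: 2-byte sequence */
        else if ((c & 0xF0) == 0xE0)
            extra = 2;                  /* 1110xxxx: 3-byte sequence */
        else if ((c & 0xF8) == 0xF0)
            extra = 3;                  /* 11110xxx: 4-byte sequence */
        else
            return 0;                   /* stray continuation or bad lead */

        if (extra > len - i - 1)
            return 0;                   /* sequence runs off the buffer */

        for (k = 1; k <= extra; k++)
            if ((buf[i + k] & 0xC0) != 0x80)
                return 0;               /* not a continuation byte */

        i += extra + 1;
    }
    return 1;
}
```

        Under these assumptions, the Latin-1 bytes "caf\xe9" fail the
check (0xE9 looks like the start of a 3-byte sequence that never arrives),
while the UTF-8 bytes "caf\xc3\xa9" pass. A parser that runs this sort of
test on every input will refuse 8-bit data unless it is told to pass the
bytes through untouched.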
  
        Of course, it's entirely possible that I just mis-remembered
everything.
  
        Regards,
  
                Michael.

-- 
 mmeeks gnu org  <><, Pseudo Engineer, itinerant idiot




