Re: [xml] A possible problem with libxml2

> What I have done is apply the
> attached changes to encoding.c. Probably similar changes should be
> applied to xmlCharEncFirstLine, but so far I have not (in my app. I make
> the first call with only 4 bytes, so I don't hit any problems with
> xmlCharEncFirstLine). Several things come to mind, that might be
>
> - It might well be that this processing should only be applied to, say,
> GB2312 and Big5 conversions where these quirky character set problems
> are common

I think you have done a great job in tracing / locating where your problems
occur, but I'm not comfortable with your proposed solution.  Part of the aim
of libxml is to remain "generic" enough to be used by most, and to avoid
"locale-specific" behaviour whenever possible.  The potential problem with
your proposed solution is that other character-set encodings (or other
applications) may not want/deserve the same treatment.

Basically, when libxml encounters a "generic" character set such as the one
you are working with, and there is no other specific user instruction, libxml
turns the data over to iconv to handle.  Your problem arises because iconv
doesn't like your data, and you want it to be handled in a different manner.
So, I would suggest a better solution would be to implement your own input
handler.  Within that handler (which should be pretty simple to write), you
can use iconv to take care of everything which doesn't make it "choke", then
(still within that handler) gently perform a Heimlich maneuver to remove any
remaining obstructions.
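
A minimal sketch of the core loop such a handler might use, assuming a
POSIX iconv. The function name lenient_to_utf8 and the choice of '?' as the
replacement byte are hypothetical and not part of libxml2. The parameter
list follows the shape of libxml2's xmlCharEncodingInputFunc, with an extra
source-charset argument added for illustration; a real handler would fix
the charset in a small wrapper and hand that wrapper to
xmlNewCharEncodingHandler.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper (not a libxml2 function): convert `in` to UTF-8 with
 * iconv, replacing every byte iconv rejects with '?'.
 * On return *inlen holds the bytes consumed and *outlen the bytes written.
 * Returns the number of bytes written, or -1 if iconv cannot be opened. */
static int lenient_to_utf8(const char *from_charset,
                           unsigned char *out, int *outlen,
                           const unsigned char *in, int *inlen)
{
    iconv_t cd = iconv_open("UTF-8", from_charset);
    if (cd == (iconv_t)-1)
        return -1;

    char *src = (char *)in;          /* iconv wants non-const pointers */
    char *dst = (char *)out;
    size_t src_left = (size_t)*inlen;
    size_t dst_left = (size_t)*outlen;

    while (src_left > 0) {
        /* Let iconv take care of everything it can digest. */
        if (iconv(cd, &src, &src_left, &dst, &dst_left) != (size_t)-1)
            break;                   /* all input converted */

        if (errno != EILSEQ || dst_left == 0)
            break;                   /* output full or input truncated: stop */

        /* The "Heimlich maneuver": drop the offending byte, emit a
         * placeholder, and let iconv carry on with the rest. */
        *dst++ = '?';
        dst_left--;
        src++;
        src_left--;
    }

    iconv_close(cd);
    *inlen -= (int)src_left;
    *outlen -= (int)dst_left;
    return *outlen;
}
```

Skipping a single byte per failure is the simplest policy; for multi-byte
sets such as GB2312 or Big5 it may be preferable to skip the whole suspect
pair, which is a decision the application is in a better position to make
than the library.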

Bill Brack
ABC QuickSilver
Hong Kong
