Re: [xml] Handling minor errors in XML and continuing

On Sun, 2004-01-04 at 19:38, stephen wrote:
> Some time ago I wrote a tool to parse XML using libxml's SAX interface.
> Unfortunately I now have to deal with some badly behaved clients which
> send unquoted angle brackets etc. which cause the XML parsing to choke.
>
> My question is, are there any options in any of libxml's parsers to
> gracefully handle minor errors of non-conformance and continue (like a
> browser) rather than just give up?
>
> I appreciate that in an ideal world the buggy clients would be fixed,
> but that's just not an option for me.  What can I do?

You are pretty much doomed. Daniel's approach (and I fully support this)
has always been that libxml only handles well-formed input. If it is not
well-formed, it is not XML and so you want libnotxml or something
similar to handle your data.

You can work in "recovery" mode (see, for example, the implementation of
xmllint --recover), but that only outputs as much as it can up to the
first well-formedness problem; it has no real way to recover from

Finally, the approach that a few of us have tried at various times is
parsing until you see an error like this and then trying to work out
what the problem was, fixing it in the input stream and re-parsing. It
is extremely fiddly, though, and not 100% reliable (I have never come up
with anything that was able to work around more than a few simple types
of errors). Things have been made a bit easier by the new error handling
calls in the 2.6.x releases of libxml.
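
For what it's worth, here is a minimal sketch of that parse, patch and
re-parse loop. It is not libxml code: to keep it self-contained it uses
the expat parser from Python's standard library, and with libxml you
would presumably take the error position from its own error reporting
instead. The names parse_with_repairs, _is_stray_lt and max_repairs are
made up for this illustration. It handles exactly one class of error, a
'<' that cannot start markup because of the character that follows it,
and it assumes ASCII element names. Anything else is re-raised untouched.

```python
import xml.parsers.expat


def _is_stray_lt(data, i):
    """True if data[i] is '<' and the next byte cannot begin markup.

    Assumption for this sketch: element names start with an ASCII letter,
    '_' or ':'; '/', '!' and '?' begin end tags, comments/CDATA and PIs.
    """
    if data[i:i + 1] != b"<":
        return False
    nxt = data[i + 1:i + 2]
    return not (nxt.isalpha() or nxt in (b"/", b"!", b"?", b"_", b":"))


def parse_with_repairs(text, max_repairs=10):
    """Parse text; on each error, escape one stray '<' and try again.

    Returns the (possibly patched) text once it parses cleanly.
    Re-raises the parser error if the problem is not a stray '<'.
    """
    data = text.encode("utf-8")
    for _ in range(max_repairs + 1):
        parser = xml.parsers.expat.ParserCreate()
        try:
            parser.Parse(data, True)
            return data.decode("utf-8")
        except xml.parsers.expat.ExpatError:
            idx = parser.ErrorByteIndex
            # The reported position may be the '<' itself or the byte
            # right after it, so check both candidates.
            for i in (idx - 1, idx):
                if _is_stray_lt(data, i):
                    data = data[:i] + b"&lt;" + data[i + 1:]
                    break
            else:
                raise  # not an error this sketch knows how to fix
    raise ValueError("gave up after %d repairs" % max_repairs)


if __name__ == "__main__":
    print(parse_with_repairs("<msg>1 < 2 and 3 < 4</msg>"))
```

Even this tiny case shows why the approach is fiddly: a '<' followed by
a letter looks like the start of a tag, so the parser only complains
some way further on, and guessing which character to patch gets
unreliable quickly.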

