Re: [xml] Recovering from errors in an XML "stream"

From: Webb Scales <webb ursasecure com>
To: Liam R E Quin <liam holoweb net>, xml gnome org
Subject: Re: [xml] Recovering from errors in an XML "stream"
Date: Tue, 10 Sep 2019 00:29:40 -0400

I'm OK with making small on-the-fly "edits" to the input (such as removing the initial comment, or removing all comments), but trying to make my code discern the overall structure (such as picking out the boundaries between the documents) is starting to step over into actually parsing it, which defeats the purpose of using LibXML2.

If the TextReader didn't insist upon reading beyond the root end-tag, that would enable me to solve my problem, I think. (I don't understand why it does that.) In the absence of any other options, I'm going to experiment with the SAX interface and see if that will allow me to stop the parse at the right spot.
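For illustration only, here is a minimal sketch of the "stop at the root end-tag" idea. It does not use LibXML2 or its SAX interface. It swaps in Python's standard-library expat binding, and the helper name split_documents is hypothetical. It assumes the stream is a run of otherwise well-formed documents placed back to back, with no XML declaration on the second and later documents. Expat reports "junk after document element" when it meets the start of the next document, and the parser's ErrorByteIndex marks where that happened, so the boundary comes from the parser rather than from hand-written scanning.

```python
from xml.parsers import expat
from xml.parsers.expat import errors

# Numeric code expat reports when content follows the root end-tag.
JUNK_AFTER_ROOT = errors.codes[errors.XML_ERROR_JUNK_AFTER_DOC_ELEMENT]


def split_documents(data: bytes) -> list[bytes]:
    """Split back-to-back XML documents, letting the parser find each boundary.

    Hypothetical helper: assumes each document is well-formed on its own and
    that later documents do not start with an XML declaration.
    """
    docs = []
    while data.strip():
        parser = expat.ParserCreate()
        try:
            parser.Parse(data, True)
        except expat.ExpatError as err:
            if err.code != JUNK_AFTER_ROOT:
                raise  # a genuine error inside a document, not a boundary
            # Byte offset of the first token after the finished root element.
            cut = parser.ErrorByteIndex
            docs.append(data[:cut].strip())
            data = data[cut:]
        else:
            # Parsed cleanly to the end: this was the last document.
            docs.append(data.strip())
            data = b""
    return docs


if __name__ == "__main__":
    for doc in split_documents(b"<a><b/></a>\n<c>text</c><d/>"):
        print(doc)
```

Each boundary costs a fresh parse of the remaining input, so this is only reasonable for small streams. Whether a LibXML2 SAX handler can stop as cleanly at the same spot is exactly what remains to be tested.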

Anyway, thanks for your replies, Liam.

Webb

On 9/10/19 12:19 AM, Liam R E Quin wrote:

On Mon, 2019-09-09 at 22:41 -0400, Webb Scales wrote:

the fact remains that I don't control the text that I'm trying to parse,
and I still need to parse it, even though it's not "well-formed".

You may need to write some form of pre-processor that fixes the
problems. As you say, that may reduce the need for an XML parser.

I haven't investigated error recovery with libxml, so someone else
might have better ideas.

Liam

Webb Scales
Principal Software Architect
603-673-2306
www.ursasecure.com
webb ursasecure com
