Re: [xml] Unexpected behavior with XML_PARSE_NOBLANKS

On Thu, Dec 10, 2009 at 1:23 AM, Daniel Veillard <veillard redhat com> wrote:
On Wed, Dec 09, 2009 at 08:36:59AM -0800, Aaron Patterson wrote:
On Wed, Dec 9, 2009 at 7:54 AM, Daniel Veillard <veillard redhat com> wrote:
On Sat, Dec 05, 2009 at 11:03:26AM -0800, Aaron Patterson wrote:
Hey everyone,

It looks like sometimes there is unexpected behavior when parsing with
XML_PARSE_NOBLANKS.  It seems that sometimes blank nodes will get
included in the resulting tree.  I don't think this is expected

 If libxml2 has detected a non-blank text node at the same level, it
keeps all further text nodes, on the assumption that the element has
mixed content.

Understood.  Thank you!

 This tends to surprise people, but since blank node elimination without
having read the DTD is purely heuristic, the parser tries to be as safe as
possible (though it cannot go back on nodes it has already parsed).
XML_PARSE_NOBLANKS is a deviation from the normal parsing behaviour, so in
general I suggest avoiding it and simply ignoring the nodes you know are
purely formatting; the parser can't guess that with 100% accuracy.
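
As a rough illustration of ignoring formatting-only nodes at traversal time
instead of parsing with XML_PARSE_NOBLANKS, here is a minimal sketch. It uses
Python's standard-library xml.dom.minidom rather than libxml2 itself, the
helper name significant_children is hypothetical, and treating any text node
made up only of XML whitespace as "purely formatting" is an assumption the
application has to make for its own documents.

```python
from xml.dom import Node, minidom

XML_WHITESPACE = " \t\r\n"  # the four whitespace characters defined by XML 1.0


def significant_children(node):
    """Yield the children of `node`, skipping whitespace-only text nodes.

    The tree itself is left untouched; blank nodes are only skipped while
    iterating. Whether such nodes really are "just formatting" is an
    assumption made by the caller, not something the parser can know.
    """
    for child in node.childNodes:
        if child.nodeType == Node.TEXT_NODE and not child.data.strip(XML_WHITESPACE):
            continue  # indentation or line breaks between elements
        yield child


# Mixed content: indentation around <a/> and <b/>, plus one real text node.
doc = minidom.parseString("<root>\n  <a/>\n  text\n  <b/>\n</root>")
names = [child.nodeName for child in significant_children(doc.documentElement)]
```

Because the filtering happens in the application, text nodes that carry real
content are never dropped, which is the safety the parser-side heuristic
cannot fully guarantee.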

Excellent.  Thanks for the tips.  I suspect the person using this was
trying to save memory with the tree style processor.  Can you think of
any other techniques that might help save on memory, but keep tree
style interface?  Maybe the Reader Parser?  I'm just curious.


Aaron Patterson
