Re: [xml] Why do the HTML SAX Parsers build trees?



Larry Stamper wrote:

Since one of the main motivations for using SAX-style parsing is to
avoid the expense of building a tree representation of the markup
language, I am wondering why the libxml HTML parser builds a tree
anyway. It seems to at least partially defeat the purpose of having a
SAX parser.

A SAX parser cannot be a pure stream parser, with no memory of earlier
parts of the stream. To check that the XML is well formed, it needs to
store the start tags on the path from the root node to the current node,
so that it can check they match the end tags. This is the only sense
in which a SAX parser builds a tree. It holds only a fragment of the
tree at any time, and only minimal information about each node. If
you had an extremely deep hierarchy of nodes you might see some impact
on storage requirements, but normally the storage required is lost in
the noise.
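
To make the point concrete, here is a rough sketch in Python of the
bookkeeping I mean. It is not libxml code. The function name
check_nesting and the simplified tag regex are my own, and it ignores
comments, CDATA and the like. The only state it keeps is the list of
currently open tag names, so memory grows with nesting depth rather
than with document size.

```python
import re

# Simplified tag pattern (an assumption for illustration only):
# group 1 = "/" for an end tag, group 2 = tag name,
# group 3 = "/" for a self-closing tag.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)[^<>]*?(/?)>")


def check_nesting(markup):
    """Check that start and end tags match; return the deepest nesting seen.

    Raises ValueError on a mismatched or unclosed tag.
    """
    open_tags = []  # path from the root element to the current element
    max_depth = 0
    for match in TAG.finditer(markup):
        closing, name, self_closing = match.groups()
        if self_closing:
            # <br/> opens and closes at once, so nothing needs storing
            continue
        if closing:
            if not open_tags or open_tags[-1] != name:
                raise ValueError(f"unexpected </{name}>")
            open_tags.pop()
        else:
            open_tags.append(name)
            max_depth = max(max_depth, len(open_tags))
    if open_tags:
        raise ValueError(f"unclosed <{open_tags[-1]}>")
    return max_depth
```

Calling check_nesting("<a><b/><c>text</c></a>") never holds more than
two names at once, however much text sits between the tags.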

Regards,
Steve



