Re: [libxml++] Blunt Q: what C++ bindings for XML?



andy glew amd com wrote:

> Common misconception: parsers drop the original text.
> An artifact of the early lexers, like lex.
>
> Much recent work, e.g. in error handling and macro processing
> and refactoring, has provided access to the original text
> even in the parsed stream.

The question to ask is what content has semantic value. Are
linebreaks, whitespace, etc. to be preserved?

These questions boil down to what level you are operating on.
Are you interested in the infoset, or do you want to know
the details of how things are encoded in a file?
As Murray suggests, we are talking about XML here, which is at
least one level on top of raw file data. For example a DTD may
implicitly declare an attribute value for a node, even though
the attribute was not present in the 'raw' input. When looking
at the DOM, you won't see that distinction. That's a feature, not a bug.

If you want to have access to lower levels, use lower levels,
i.e. raw I/O.

> E.g. consider error handling... it should be possible
> to trace any error back to any and all files, lines, characters
> which are associated with the error. In the presence of macros,
> you should be able to say where the error occurred,
> in every phase of macro expansion. Yet most parsers that
> drop the original text lose this ability.

Well, no. The question is what kind of error we are talking about.
Is it a syntax error that makes the file content not well-formed XML?
The XML parser can clearly catch that and report it to you.
If we are talking about structural errors, such as a document
that doesn't validate against a DTD/schema, that, too, can be
detected in a generic way, i.e. by telling the parser to match
the document against a known DTD/schema.
Everything else is up to the application.
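
To make the first case concrete, here is a minimal sketch of a
syntax-level check that reports where in the raw text a problem was
found. It is not libxml++ code and not a real XML parser: the names
SyntaxError and check_tag_balance are made up for this illustration, and
the check only covers tag nesting (no XML declaration, comments, CDATA,
DTDs, or '>' inside attribute values). It only shows that a parser can
keep line and column information while it reads, so a syntax error can
be traced back to the input.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical type for this sketch: a problem plus its position in the raw text.
struct SyntaxError {
    std::size_t line;    // 1-based line of the offending tag
    std::size_t column;  // 1-based column of the tag's '<'
    std::string message;
};

// Checks only that element tags are balanced and properly nested.
// Returns std::nullopt if no nesting problem was found.
std::optional<SyntaxError> check_tag_balance(const std::string& text) {
    struct OpenTag {
        std::string name;
        std::size_t line;
        std::size_t column;
    };
    std::vector<OpenTag> open;  // elements opened but not yet closed
    std::size_t line = 1;
    std::size_t column = 1;

    // Keep the line/column counters in step with every character consumed.
    auto advance = [&](char c) {
        if (c == '\n') { ++line; column = 1; } else { ++column; }
    };

    std::size_t i = 0;
    while (i < text.size()) {
        if (text[i] != '<') {
            advance(text[i]);
            ++i;
            continue;
        }
        const std::size_t tag_line = line;
        const std::size_t tag_column = column;
        const std::size_t end = text.find('>', i);
        if (end == std::string::npos) {
            return SyntaxError{tag_line, tag_column, "unterminated tag"};
        }
        const std::string body = text.substr(i + 1, end - i - 1);
        const bool closing = !body.empty() && body.front() == '/';
        const bool self_closing = !closing && !body.empty() && body.back() == '/';

        // The tag name runs from after an optional '/' to whitespace or '/'.
        const std::size_t start = closing ? 1 : 0;
        const std::size_t stop = body.find_first_of(" \t\r\n/", start);
        const std::string name = body.substr(
            start, stop == std::string::npos ? std::string::npos : stop - start);
        if (name.empty()) {
            return SyntaxError{tag_line, tag_column, "tag without a name"};
        }

        if (closing) {
            if (open.empty() || open.back().name != name) {
                return SyntaxError{tag_line, tag_column,
                                   "unexpected closing tag </" + name + ">"};
            }
            open.pop_back();
        } else if (!self_closing) {
            open.push_back({name, tag_line, tag_column});
        }

        // Consume the whole tag, including the closing '>'.
        for (; i <= end; ++i) {
            advance(text[i]);
        }
    }

    if (!open.empty()) {
        const OpenTag& t = open.back();
        return SyntaxError{t.line, t.column,
                           "element <" + t.name + "> is never closed"};
    }
    return std::nullopt;
}
```

For example, check_tag_balance("<a><b></a>") reports an unexpected
closing tag at line 1, column 7. Validation against a DTD/schema is a
separate, later step that a sketch like this does not attempt.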

Stefan




