Re: [xml] --loaddtd bug or feature?

On Fri, Feb 08, 2008 at 05:26:15PM +0100, Florent Guiliani wrote:
Hi all,

I'm wondering why xmllint --loaddtd (aka XML_PARSE_DTDLOAD) combined with
the --format option produces a different result in these 2 test cases:

  --format is a heuristic; do not expect 100% correct behaviour, because
correctness is not definable (otherwise the wise people who defined XML
after nearly 20 years of SGML would have explained when a space is indentation
or not; the problem is that it's just not possible).

Note the space char inserted between </p> and </body>. Why has this single
space char broken the reindent process?

  Because the libxml2 serializer is being careful while not being
exhaustive. It saw a text node as a child of body, so it refused to
add more text nodes under body, because those might be significant.
The code is in xmlsave.c around line 1264.
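
  To make that rule concrete, here is a rough sketch in plain Python using
the standard library's minidom. It is not the code from xmlsave.c; the names
has_text_child and serialize are made up for illustration, and emitting the
whole subtree unchanged once a text child is seen is a simplifying assumption
of this sketch.

```python
from xml.dom import minidom
from xml.dom.minidom import Node
from xml.sax.saxutils import quoteattr


def has_text_child(element):
    """True if any direct child is a text or CDATA node."""
    return any(
        child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
        for child in element.childNodes
    )


def serialize(node, level=0, indent="  "):
    """Indent children only when the element has no text child at all."""
    if (
        node.nodeType != Node.ELEMENT_NODE
        or not node.childNodes
        or has_text_child(node)
    ):
        # Careful path: any text child might be significant, so the
        # subtree is written out exactly as it is, with nothing added.
        return node.toxml()

    attrs = "".join(
        " %s=%s" % (name, quoteattr(value))
        for name, value in node.attributes.items()
    )
    pad = indent * (level + 1)
    lines = ["<%s%s>" % (node.tagName, attrs)]
    for child in node.childNodes:
        lines.append(pad + serialize(child, level + 1, indent))
    lines.append(indent * level + "</%s>" % node.tagName)
    return "\n".join(lines)


if __name__ == "__main__":
    for source in ("<body><p>a</p><p>b</p></body>", "<body><p>a</p> </body>"):
        doc = minidom.parseString(source)
        print(serialize(doc.documentElement))
        print("---")
```

  In this sketch the first document is reindented, while the second, which
only differs by the single space before </body>, is left untouched.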

Do you think that test case 2 is getting the expected result, or do you think
that test case 2 is revealing a bug?

The indent process isn't broken if you remove the --loaddtd option.

  If you don't load the DTD, then --format at parsing time tries to drop
what seem to be formatting blank spaces. Without a DTD it assumes blank
text nodes (if there hasn't been a non-blank text node sibling before)
are just formatting and that the content of body isn't mixed content. If
you load the DTD, it knows body is mixed content (text and elements, as
defined in the DTD), so it keeps the 'single space char' text node.
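
  The parse-time side can be sketched the same way. This is again plain
Python with minidom, not libxml2's parser code: drop_formatting_blanks is a
hypothetical helper, and the mixed argument stands in for what a loaded DTD
would tell the parser about which elements have mixed content.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

XML_WHITESPACE = " \t\r\n"


def drop_formatting_blanks(element, mixed=frozenset()):
    """Remove blank text nodes that look like pure formatting.

    `mixed` is the set of element names known to have mixed content
    (empty when no DTD information is available).
    """
    seen_text = False
    for child in list(element.childNodes):
        if child.nodeType == Node.ELEMENT_NODE:
            drop_formatting_blanks(child, mixed)
        elif child.nodeType == Node.TEXT_NODE:
            if child.data.strip(XML_WHITESPACE):
                # A real text sibling: later blanks may be significant.
                seen_text = True
            elif element.tagName not in mixed and not seen_text:
                element.removeChild(child)


if __name__ == "__main__":
    source = "<body><p>a</p> </body>"

    no_dtd = minidom.parseString(source)
    drop_formatting_blanks(no_dtd.documentElement)
    print(no_dtd.documentElement.toxml())

    with_dtd = minidom.parseString(source)
    drop_formatting_blanks(with_dtd.documentElement, mixed={"body"})
    print(with_dtd.documentElement.toxml())
```

  With no mixed-content knowledge the blank text node is dropped, so a later
reindent has nothing in its way; once body is declared mixed, the space stays.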

  Again, --format is a heuristic; one can't guess what the language designer
and the document writers had in mind. Only associated logic at the
application level can distinguish absolutely between pure formatting and
document content. Libxml2 plays safe and avoids changing the document if its
heuristic raises any doubt about whether removing or changing a text node
is safe.
  You have the API to override any heuristic and modify the documents if
you know the logic.
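
  As an illustration of what such application-level logic could look like,
here is a small sketch, once more in plain Python with minidom rather than
the libxml2 API. The ELEMENT_ONLY set and the strip_known_blanks helper are
assumptions made up for this example: they encode the application's own
knowledge that blanks directly under those elements are never content.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

# Hypothetical application knowledge: these elements only contain elements.
ELEMENT_ONLY = {"html", "body"}


def strip_known_blanks(element, element_only=ELEMENT_ONLY):
    """Delete whitespace-only text nodes under elements known to be element-only."""
    for child in list(element.childNodes):
        if child.nodeType == Node.ELEMENT_NODE:
            strip_known_blanks(child, element_only)
        elif (
            child.nodeType == Node.TEXT_NODE
            and element.tagName in element_only
            and not child.data.strip(" \t\r\n")
        ):
            element.removeChild(child)


if __name__ == "__main__":
    doc = minidom.parseString("<body><p>a</p> </body>")
    strip_known_blanks(doc.documentElement)
    # With the blank node gone, a generic pretty-printer can reindent.
    print(doc.documentElement.toprettyxml(indent="  "))
```

  The point of the design is that the decision is made by code that knows the
vocabulary, not guessed by a general-purpose formatter.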


Red Hat Virtualization group
Daniel Veillard      | virtualization library
veillard redhat com  | libxml GNOME XML XSLT toolkit | Rpmfind RPM search engine
