Re: [xslt] Structural difference between html and xml output?

Daniel Veillard writes:
> On Wed, Nov 14, 2001 at 10:04:36AM +0100, Morus Walter wrote:
> > Hi Daniel,
> > 
> > any comment on that problem?
>    Hum, each time I look at the HTML parser/serializer I end up with
> a big headache :-\
> > The effect seems to come either from the saveEndTag column in the
> > html40ElementTable (HTMLparser.c line 358 f), where li and p get a 1
> > instead of the 0 that other non-empty elements have, or from its usage
> > in HTMLtree.c (lines 541 and 1043), where end tags are omitted if
> > saveEndTag != 0 for this element.
> > So it might be easily fixed by either changing the entries in
> > html40ElementTable to 0 or changing the test to `saveEndTag == 2'...
>   It seems the right solution is to get the <li> and <p> items' saveEndTag
> to 1. I'm just wondering how much other stuff this is gonna break :-\
> Since in the HTML area there are no rules (or nobody respects them),
> I don't see any other methodology than trial and error.

Hmm. I'm surprised that explicit end tags might be a problem.
I mean, yes, you can omit them, and this is sometimes done in a way where
the meaning as defined by the SGML rules and the meaning intended by the
author differ. But here it is clear from the internal tree structure where
a given element ends, and explicit end tags just make this structure
visible in the serialised document.
This may sometimes differ from what the user wanted, but then the internal tree
structure was wrong in the first place and the problem does not arise
from the serialiser. The most probable cause is malformed HTML that is read
by the HTML parser, which has to make certain decisions where end tags are
missing, and is then serialised again. But in that case it's a problem of
the initial HTML (or of the parser).
In my case the input was XML and a stylesheet, where element starts and ends
are all explicit, and the problem I see is that the end tag omission by the
HTML serializer leads to unwanted results.

>   I assume your software processes quite a bit of HTML, could you
> try to set up saveEndTag to 1 for <li> and <p> and report if this breaks
> something else. Others using HTML are invited to do so too.
It's the other way around: the table currently has a 1, and I suggest 0 to
force end tag generation for p and li. Sorry if I wasn't clear about that.

As far as I can see from the source, this does not change anything else, except
creating these end tags. For me that's ok. I cannot speak for others though.

tnx & greetings
Th. Morus WALTER · Manager Content & Data Development GmbH & Co. KG
Schellingstraße 35 · 80799 München
