Re: [xml] Apparently incorrect paragraph wrapping in HTML parser



  Yes, thanks! That sounds like the right approach to me. I would just
merge that with a new htmlParserOption, HTML_PARSE_STRICT, which could
either be passed by the user to maintain the current behaviour or be activated
by default when the DOCTYPE is read, if it happens to be a Strict HTML one.

Yes, checking the DTD is indeed an option, though I'm not sure how it would handle the case in which I link a DTD myself.
E.g.:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://very.silly/html401-like/but/not/exactly/strict.dtd">
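
To make the concern concrete, here is a minimal sketch in Python (not libxml2 code; the helper names and the idea of keying the decision on the public identifier are my assumptions). If strictness were decided from the public identifier alone, the DOCTYPE above would count as Strict even though its system identifier points at a different DTD:

```python
import re

# Public identifiers of the HTML 4 Strict DTDs. Assumption: a parser
# would decide "strict or not" from these alone.
STRICT_PUBLIC_IDS = {
    "-//W3C//DTD HTML 4.01//EN",
    "-//W3C//DTD HTML 4.0//EN",
}

# Matches <!DOCTYPE html PUBLIC "public-id" "system-id"> with an
# optional system identifier.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]*)"(?:\s+"([^"]*)")?\s*>',
    re.IGNORECASE,
)


def parse_doctype(text):
    """Return (public_id, system_id), or None if no PUBLIC DOCTYPE is found."""
    match = DOCTYPE_RE.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


def looks_strict(text):
    """Hypothetical check: strict if the public identifier is a Strict one."""
    parsed = parse_doctype(text)
    return parsed is not None and parsed[0] in STRICT_PUBLIC_IDS


doctype = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://very.silly/html401-like/but/not/exactly/strict.dtd">'
)
print(looks_strict(doctype), parse_doctype(doctype)[1])
```

Whether the real parser should also compare the system identifier, or actually load the linked DTD, is exactly the open question.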


Anyway, I do not see any reason why the parser should mess with the document in the first place; it is supposed to parse it, not deliberately alter it according to what it thinks the right solution may be. Could someone please explain to me why the document is altered?

And please do not say "to be compliant with standards", because to the best of my knowledge the standards do not require the parser to "fix" a document it considers incorrect by adding tags (I may be wrong, but I doubt the standards would require such a thing).

 -- iSteve

PS: The <p> tag injection is not correct anyway. The <img> element is inline, yet it is not wrapped in <p>. Do you still want to keep it?

For details, see: http://www.w3.org/TR/REC-html40/sgml/dtd.html#inline
<!ENTITY % special "A | IMG | OBJECT | BR | SCRIPT | MAP | Q | SUB | SUP | SPAN | BDO">
<!ENTITY % inline "#PCDATA | %fontstyle; | %phrase; | %special; | %formctrl;">
<!ENTITY % block "P | %heading; | %list; | %preformatted; | DL | DIV | NOSCRIPT | BLOCKQUOTE | FORM | HR | TABLE | FIELDSET | ADDRESS">
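
As a rough illustration of what these declarations imply, the sketch below (Python, hypothetical helper names) turns them into lookup sets. The %special; and %block; members come from the declarations quoted above; the expansions of %fontstyle;, %phrase;, %formctrl;, %heading;, %list; and %preformatted; are filled in from my reading of the HTML 4 Strict DTD and are not part of the quote:

```python
# %special; as quoted above.
SPECIAL = {"A", "IMG", "OBJECT", "BR", "SCRIPT", "MAP", "Q",
           "SUB", "SUP", "SPAN", "BDO"}

# Expansions not quoted above (assumed from the HTML 4 Strict DTD).
FONTSTYLE = {"TT", "I", "B", "BIG", "SMALL"}
PHRASE = {"EM", "STRONG", "DFN", "CODE", "SAMP", "KBD", "VAR",
          "CITE", "ABBR", "ACRONYM"}
FORMCTRL = {"INPUT", "SELECT", "TEXTAREA", "LABEL", "BUTTON"}
HEADING = {"H1", "H2", "H3", "H4", "H5", "H6"}
LIST = {"UL", "OL"}
PREFORMATTED = {"PRE"}

# %inline; element names (#PCDATA is text, not an element, so it is left out).
INLINE = FONTSTYLE | PHRASE | SPECIAL | FORMCTRL

# %block; as quoted above, with the nested entities expanded.
BLOCK = {"P"} | HEADING | LIST | PREFORMATTED | {
    "DL", "DIV", "NOSCRIPT", "BLOCKQUOTE", "FORM", "HR", "TABLE",
    "FIELDSET", "ADDRESS",
}


def classify(tag):
    """Return 'inline', 'block' or 'other' for a tag name such as 'img' or '<p>'."""
    name = tag.strip("<>/ ").upper()
    if name in INLINE:
        return "inline"
    if name in BLOCK:
        return "block"
    return "other"


for tag in ("img", "span", "p", "div", "body"):
    print(tag, classify(tag))
```

By this classification IMG sits in the same inline group as the other elements, so a rule that wraps inline content in <p> would have to apply to it as well.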


