Re: [xml] Apparently incorrect paragraph wrapping in HTML parser
- From: iSteve <isteve deadcd org>
- To: xml gnome org
- Subject: Re: [xml] Apparently incorrect paragraph wrapping in HTML parser
- Date: Thu, 12 Jan 2006 14:35:10 +0100
> Yes, thanks! That sounds like the right approach to me. I would just
> merge that with a new htmlParserOption HTML_PARSE_STRICT, which could
> either be passed by the user to maintain the current behaviour or be
> activated by default when the DOCTYPE is read, if it happens to be a
> Strict HTML one.
Yes, checking the DTD is indeed an option, though I'm not sure how it
would handle the case in which I link a DTD myself.
E.g.:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://very.silly/html401-like/but/not/exactly/strict.dtd">
Anyway, I do not see any reason why the parser should mess with the
document in the first place; it's supposed to parse it, not deliberately
alter it according to what it thinks the right solution may be. Could
someone please explain to me why the document should be altered?
And please, do not say "to be compliant with standards", because the
standards, to the best of my knowledge, do not require the parser to
"fix" the document by adding tags when it is not considered correct
(though I may be wrong, I doubt standards would require such a thing).
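To make the custom-DTD concern above concrete, here is a rough sketch of what a DOCTYPE-based check might have to decide. It is not libxml2 code: the function names and the "trust the public identifier only if the system identifier is absent or points at www.w3.org" policy are my own assumptions, purely for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of libxml2: reports whether a DOCTYPE
 * public identifier names one of the W3C Strict DTDs. */
int is_strict_public_id(const char *public_id)
{
    static const char *const strict_ids[] = {
        "-//W3C//DTD HTML 4.01//EN",
        "-//W3C//DTD XHTML 1.0 Strict//EN",
    };
    size_t i;

    if (public_id == NULL)
        return 0;
    for (i = 0; i < sizeof strict_ids / sizeof strict_ids[0]; i++) {
        if (strcmp(public_id, strict_ids[i]) == 0)
            return 1;
    }
    return 0;
}

/* Hypothetical policy (an assumption, not an existing rule): treat the
 * document as Strict only when the public identifier is a Strict one
 * AND the system identifier is either missing or hosted at w3.org. */
int doctype_is_strict(const char *public_id, const char *system_id)
{
    static const char w3c_prefix[] = "http://www.w3.org/";

    if (!is_strict_public_id(public_id))
        return 0;
    if (system_id == NULL)
        return 1;
    return strncmp(system_id, w3c_prefix, strlen(w3c_prefix)) == 0;
}
```

With this policy, the DOCTYPE from my example (Strict public identifier, but a system identifier at very.silly) would not be treated as Strict, while the same public identifier with http://www.w3.org/TR/html4/strict.dtd would. Whether that is the right call is exactly the open question.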
-- iSteve
PS: The <p> tag injection is not correct anyway. The <img> tag is inline,
yet it is not wrapped in <p>. Still want to keep it?
For details, see: http://www.w3.org/TR/REC-html40/sgml/dtd.html#inline
'<!ENTITY % special "A | IMG | OBJECT | BR | SCRIPT | MAP | Q | SUB |
SUP | SPAN | BDO">
<!ENTITY % inline "#PCDATA | %fontstyle; | %phrase; | %special; |
%formctrl;">
<!ENTITY % block "P | %heading; | %list; | %preformatted; | DL | DIV |
NOSCRIPT | BLOCKQUOTE | FORM | HR | TABLE | FIELDSET | ADDRESS">'
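The entity lists quoted above can be turned into a small lookup, just to show that IMG falls on the inline side. This is only a sketch: the function names are made up, and the block list contains only the element names spelled out literally in the quoted %block; declaration (the %heading;, %list; and %preformatted; entities are not expanded here).

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive comparison of two ASCII element names. */
int names_equal(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Returns 1 if name appears in the given NULL-terminated list. */
int name_in_list(const char *name, const char *const *list)
{
    size_t i;

    if (name == NULL)
        return 0;
    for (i = 0; list[i] != NULL; i++) {
        if (names_equal(name, list[i]))
            return 1;
    }
    return 0;
}

/* Elements listed in the %special; entity, which is part of %inline;. */
int is_special_inline(const char *name)
{
    static const char *const special[] = {
        "a", "img", "object", "br", "script", "map",
        "q", "sub", "sup", "span", "bdo", NULL
    };
    return name_in_list(name, special);
}

/* Elements named literally in the quoted %block; declaration. */
int is_listed_block(const char *name)
{
    static const char *const block[] = {
        "p", "dl", "div", "noscript", "blockquote", "form",
        "hr", "table", "fieldset", "address", NULL
    };
    return name_in_list(name, block);
}
```

Here is_special_inline("IMG") is true and is_listed_block("IMG") is false, so a rule that wraps stray inline content in <p> would have to cover <img> as well to be consistent.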