Re: [xml] htmlDocDumpMemory() vs xmlDocDumpMemory()
- From: Daniel Veillard <veillard redhat com>
- To: Rush Manbert <rush manbert com>
- Cc: xml gnome org
- Subject: Re: [xml] htmlDocDumpMemory() vs xmlDocDumpMemory()
- Date: Mon, 23 Feb 2009 09:31:14 +0100
On Tue, Feb 17, 2009 at 12:29:02PM -0800, Rush Manbert wrote:
I am processing XHTML source files, rendering them to HTML strings, then
loading the HTML string into a browser control (Webkit).
Originally I was generating the string by calling xmlDocDumpMemory(),
but I kept reading articles that suggested you render as HTML if the
result is being displayed by a browser. I changed to use
htmlDocDumpMemory(), and my application still worked with no problems.
Recently, however, we were developing a new set of web pages, and I had
occasion to load the HTML string output into a real browser (Safari), by
first writing the HTML string to a file, then opening the file in the
browser. To my surprise, the JavaScript error console displayed quite a
few errors. Many of them were complaints that the HTML contained element
pairs such as "<br></br>" or "<p></p>". Someone had asked me why we had
extra blank lines in the browser display, and I finally realized it was
because Safari was treating <br></br> as <br><br> (which is what the
error message said it would do).
The source code in these cases contains <br />, <p />, etc. and I just
verified that if I call xmlDocDumpMemory() that is what ends up in the
output string. How can I achieve the same result using
htmlDocDumpMemory? Or is there some other way I should be doing this?
  From an XML parser's point of view, <br /> and <br></br> are strictly
equivalent (well, except for the Microsoft reader API, which distinguishes
the two but should not), so if your browser is loading the file with an
XML parser, then the two forms are equivalent (BTW, Safari uses libxml2 for
XML parsing, so maybe someone can comment on this in more detail ;-)
  Now, an HTML parser should make no difference between <br /> and
<br>; that's why it's suggested to serialize XHTML that way.
  The behaviour you mention sounds like a bug in my opinion: <br />
should be safe for both kinds of parsing. Unless Safari internally loads
the file as XML, reserializes it as <br></br>, and then hands that to the
HTML parser, I don't see any other logical way to get what you got.
  Also note that by serializing to a file, you lose the MIME type
information, and the browser probably has to guess whether it should
process the file as XML or HTML; this probably doesn't help.
  For serialization, use the new xmlSave* operations; they give you far more
flexibility than the old APIs you're using, see
http://xmlsoft.org/html/libxml-xmlsave.html#xmlSaveOption
Daniel
--
Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
daniel veillard com | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library http://libvirt.org/