Re: [xml] htmlDocDumpMemory() vs xmlDocDumpMemory()

From: Daniel Veillard <veillard redhat com>
To: Rush Manbert <rush manbert com>
Cc: xml gnome org
Subject: Re: [xml] htmlDocDumpMemory() vs xmlDocDumpMemory()
Date: Mon, 23 Feb 2009 09:31:14 +0100

On Tue, Feb 17, 2009 at 12:29:02PM -0800, Rush Manbert wrote:

I am processing XHTML source files, rendering them to HTML strings, then 
loading the HTML string into a browser control (Webkit).

Originally I was generating the string by calling xmlDocDumpMemory(),  
but I kept reading articles that suggested you render as HTML if the  
result is being displayed by a browser. I changed to use  
htmlDocDumpMemory(), and my application still worked with no problems.

Recently, however, we were developing a new set of web pages, and I had 
occasion to load the HTML string output into a real browser (Safari), by 
first writing the HTML string to a file, then opening the file in the 
browser. To my surprise, the JavaScript error console displayed quite a 
few errors. Many of them were complaints that the HTML contained element 
pairs such as "<br></br>", or "<p></p>". Someone had asked me why we had 
extra blank lines in the browser display, and I finally realized it was 
because Safari was treating <br></br> as <br><br> (which is what the 
error message said it would do).

The source code in these cases contains <br />, <p />, etc. and I just  
verified that if I call xmlDocDumpMemory() that is what ends up in the  
output string. How can I achieve the same result using  
htmlDocDumpMemory? Or is there some other way I should be doing this?


  For an XML parser, <br /> and <br></br> are strictly equivalent (well,
except for the Microsoft reader API, which distinguishes the two but
should not), so if your browser is loading the file with an XML parser
then the two forms are equivalent. (BTW Safari is using libxml2 for XML
parsing, so maybe someone can comment on this in more detail ;-)

  Now an HTML parser should make no distinction between <br /> and
<br>; that's why it's suggested to serialize XHTML that way.

  The behaviour you mention sounds like a bug in my opinion: <br />
should be safe for both kinds of parsing. Unless Safari internally
loads the file as XML, reserializes it as <br></br>, and then hands this
to the HTML parser, I don't see any other logical way to get what you got.

  Also note that by serializing to a file you lose the MIME type
information, so the browser probably has to guess whether it should
process this as XML or HTML, which probably doesn't help.

  For serialization, use the new xmlSave* operations; you have far more
flexibility than with the old APIs you're using, see
  http://xmlsoft.org/html/libxml-xmlsave.html#xmlSaveOption

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel veillard com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/

Follow-Ups:
- Re: [xml] htmlDocDumpMemory() vs xmlDocDumpMemory()
  - From: Julien Chaffraix

References:
- [xml] htmlDocDumpMemory() vs xmlDocDumpMemory()
  - From: Rush Manbert