Re: [xml] Possible to get XHTML output from HTMLparser?



On Sun, 2009-01-04 at 15:39 +0100, Daniel Veillard wrote:
On Sat, Dec 20, 2008 at 12:25:57PM +0000, Martin (gzlist) wrote:
On 19/12/2008, R. Steven Rainwater <srainwater ncc com> wrote:
I'm using libxml2 for an application that generates XHTML output. I've
recently needed to parse some nasty HTML tag soup input and incorporate
it into some of my pages. Libxml2's HTMLparser does a great job of
fixing up the bad HTML, but it outputs HTML v4 markup. Is there any
existing function that will output XHTML markup from the HTMLparser?

In libxml2, htmlDoc objects *are* xmlDoc objects, so if you just care
about well-formedness, any of the normal XML functions will do. You will
need to walk the tree to set the correct namespace on all the nodes,
however.

As Martin said. If you want the extra XHTML1 serialization rules
to be applied by the libxml2 serializer, add one of the XHTML1 DTDs to
the document after parsing and before calling the saving function.
See http://www.w3.org/TR/xhtml1/#normative, section 3.1.1, criterion 4.

Ok, thanks, that makes sense. 

Do I need to write the code to walk the tree and apply the new namespace
to each node, or is there an existing function I'm missing, along the
lines of xmlSetTreeDoc(), that will do it for me? If not, would you like
xmlSetTreeNs() and xmlSetListNs() functions, or is this too obscure to
be generally useful?

-Steve




