Re: [xml] Possible to get XHTML output from HTMLparser?
- From: Daniel Veillard <veillard redhat com>
- To: "Martin (gzlist)" <gzlist googlemail com>
- Cc: xml gnome org
- Subject: Re: [xml] Possible to get XHTML output from HTMLparser?
- Date: Sun, 4 Jan 2009 15:39:44 +0100
On Sat, Dec 20, 2008 at 12:25:57PM +0000, Martin (gzlist) wrote:
> On 19/12/2008, R. Steven Rainwater <srainwater ncc com> wrote:
> > I'm using libxml2 for an application that generates XHTML output. I've
> > recently needed to parse some nasty HTML tag soup input and incorporate
> > it into some of my pages. Libxml2's HTMLparser does a great job of
> > fixing up the bad HTML but it outputs HTML v4 markup. Is there any
> > existing function that will output XHTML markup from the HTMLparser?
> >
> > ... I'm assuming I'd just need to walk the HTMLparser output
> > tree, closing empty elements, expanding stand-alone attributes, and
> > such. Looks like HTMLparser already fixes some things like making sure
> > attribute values are quoted.
>
> Those are serialisation details that the tree doesn't care about.
>
> In libxml2 htmlDoc objects *are* xmlDoc objects, so if you just care
> about well-formedness any of the normal XML functions will do. Will
> need to walk the tree to set the correct namespace on all the
> nodes however.
>
> If you also care about validity according to a particular XHTML DTD,
> you'd have to do considerable tree modifications to turn arbitrary tag
> soup into something correct. Browsers have complex heuristics to, for
> instance, make sanity out of form elements inside tables.
As Martin said. If you want the extra XHTML1 serialization rules
to be applied by the libxml2 serializer, add one of the XHTML1 DTDs to
the document after parsing and before calling the saving function.
See http://www.w3.org/TR/xhtml1/#normative, section 3.1.1, item 4.
Daniel
--
Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
daniel veillard com | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library http://libvirt.org/