Re: [xml] xmllint --html --xmlout

On Mon, Feb 12, 2007 at 08:38:55AM -0500, Daniel Veillard wrote:
> On Mon, Feb 12, 2007 at 07:42:13AM -0500, Elliotte Harold wrote:
> > I'm working on a book about converting messy old HTML to clean XHTML, 
> > and I'm trying to decide exactly how much of each tool to recommend when.
>
>   The libxml2 HTML parser has been used in many real-world tools, like HTML
> indexers. It will consume mostly anything, but it doesn't try to add many
> correcting recipes on top of that. This was discussed on the list a couple
> of years ago, and that's where the libxml2 HTML parsing error handling
> principles were set up.

  BTW, now that I think about it, I have been doing that for years, but
slightly differently. Most of the content is kept in HTML files, edited
with whatever tool is preferred, and the web site is then generated as
XHTML1 using xsltproc with the --html option. That option parses the HTML
input and feeds it to a stylesheet, which splits and formats the content,
adds presentation, generates indexes and dumps the result as XHTML1. The
next rule in the Makefile runs xmllint --valid --noout on the resulting
.html files to check well-formedness and validity against the XHTML1 DTDs.
This is all in the doc subdir, starting from the initial xml.html file.
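
  As a rough sketch, the two steps could be wrapped in shell functions like
the ones below. This is not the actual Makefile from the doc subdir: the
stylesheet name site.xsl, the out/ directory, the function names and the
DRY_RUN switch are all assumptions made up for illustration. Only the
xsltproc --html and xmllint --valid --noout invocations come from the
description above.

```shell
#!/bin/sh
# Hypothetical stylesheet name; override with STYLESHEET=... in the environment.
STYLESHEET=${STYLESHEET:-site.xsl}

# Run a command, or just print it when DRY_RUN is non-empty (illustrative helper).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Step 1: parse the HTML source with the libxml2 HTML parser (--html),
# apply the stylesheet, and write the result to the given output file (-o).
build_page() {
    run xsltproc --html -o "$2" "$STYLESHEET" "$1"
}

# Step 2: check the generated file. --valid validates against the DTD named
# in its DOCTYPE; --noout suppresses printing of the parsed document.
validate_page() {
    run xmllint --valid --noout "$1"
}

# Possible usage, assuming xml.html and site.xsl exist in the current directory:
#   build_page xml.html out/xml.html && validate_page out/xml.html
```

  Keeping validation as a separate step after generation means a broken
stylesheet or unexpected input shows up as a failed build rather than as bad
pages on the site.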


Red Hat Virtualization group
Daniel Veillard      | virtualization library
veillard redhat com  | libxml GNOME XML XSLT toolkit | Rpmfind RPM search engine
