Re: [xml] HTMLparser enhancements
- From: Daniel Veillard <veillard redhat com>
- To: Nick Kew <nick webthing com>
- Cc: xml gnome org
- Subject: Re: [xml] HTMLparser enhancements
- Date: Tue, 14 Jan 2003 07:31:24 -0500
Hi Nick,
On Tue, Jan 14, 2003 at 12:48:57AM +0000, Nick Kew wrote:
> I've been using libxml2/libxslt for some time, though I'm new to
> this list.
>
> Some of my applications use the HTMLparser, and need to know more
> about HTML than is provided in the htmlElemDesc. Specifically,
> I have added lookup tables of HTML structure:
>   * what attributes are allowed in an element
>   * what subelements can be contained in an element
>   * default repair-attempt information when elements are misplaced -
>     for example, if loose text is found in <ul>, it will insert <li>,
>     <table> -> <tr>, <body> -> <div>, etc.
>
> I've done this using my own lookup table (hash), but I could easily
> integrate this into libxml2, by extending the htmlElemDesc structure
> and the associated declaration in HTMLparser.c. This could then be
> accompanied by a set of simple accessor functions too.
>
> Are you interested in this as a patch?
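
For readers of the archive, the kind of table and accessors described above
might look roughly like the sketch below. It is only an illustration under
stated assumptions: the ElemInfo type, the elem_* function names and the
abbreviated attribute and child lists are all hypothetical, and none of this
is libxml2's actual htmlElemDesc layout or API. The lookups only report what
is allowed; nothing is enforced while parsing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-element structure info. This is NOT libxml2's
 * htmlElemDesc; field names and contents are illustrative only. */
typedef struct {
    const char *name;            /* element name, lowercase */
    const char *const *attrs;    /* NULL-terminated allowed attributes */
    const char *const *subelts;  /* NULL-terminated allowed child elements */
    const char *defaultsubelt;   /* child to wrap misplaced content in, or NULL */
} ElemInfo;

/* Deliberately abbreviated lists, just enough to show the idea. */
static const char *const ul_attrs[]    = { "id", "class", "type", NULL };
static const char *const ul_subs[]     = { "li", NULL };
static const char *const table_attrs[] = { "id", "class", "border", "summary", NULL };
static const char *const table_subs[]  = { "caption", "thead", "tfoot", "tbody", "tr", NULL };
static const char *const body_attrs[]  = { "id", "class", "onload", NULL };
static const char *const body_subs[]   = { "div", "p", "ul", "table", NULL };

static const ElemInfo elem_table[] = {
    { "ul",    ul_attrs,    ul_subs,    "li"  },
    { "table", table_attrs, table_subs, "tr"  },
    { "body",  body_attrs,  body_subs,  "div" },
};

/* Return 1 if `item` occurs in the NULL-terminated `list`, else 0. */
static int list_contains(const char *const *list, const char *item)
{
    for (; *list != NULL; list++)
        if (strcmp(*list, item) == 0)
            return 1;
    return 0;
}

/* Linear lookup by element name; a real table would likely be hashed
 * or sorted. Returns NULL for elements the table does not know. */
const ElemInfo *elem_lookup(const char *name)
{
    size_t i;
    assert(name != NULL);
    for (i = 0; i < sizeof(elem_table) / sizeof(elem_table[0]); i++)
        if (strcmp(elem_table[i].name, name) == 0)
            return &elem_table[i];
    return NULL;
}

/* Is attribute `attr` allowed on element `elem`? Unknown elements: no. */
int elem_allows_attr(const char *elem, const char *attr)
{
    const ElemInfo *e = elem_lookup(elem);
    return e != NULL && list_contains(e->attrs, attr);
}

/* May `child` appear directly inside `parent`? Unknown parents: no. */
int elem_allows_child(const char *parent, const char *child)
{
    const ElemInfo *e = elem_lookup(parent);
    return e != NULL && list_contains(e->subelts, child);
}

/* Element to insert around misplaced content in `parent`, or NULL. */
const char *elem_default_child(const char *parent)
{
    const ElemInfo *e = elem_lookup(parent);
    return e != NULL ? e->defaultsubelt : NULL;
}
```

A caller that finds loose text inside <ul> could ask elem_default_child("ul")
and wrap the text in the returned element, while a caller that only wants
validation can use elem_allows_child() and elem_allows_attr() and leave the
tree alone. Names are compared case-sensitively in lowercase here, which is
a simplification.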
Thanks for the offer. Last year we discussed the possibility of providing
tidying support for the HTML parser. It wasn't clear whether providing it
directly in the library was a good idea. Maybe what you suggest (assuming
I understood correctly), providing the data and the APIs to make the
validity checks without enforcing them at the parser level, is
the right approach. Another factor is size: would that inflate the
HTMLparser a lot? The last point is binary and API compatibility...

Basically, yes, this sounds good, but I can't promise to integrate it
without a good idea of the expected code. Can you send a sample, a description,
or the patch if you already have it handy?
Daniel
--
Daniel Veillard | Red Hat Network https://rhn.redhat.com/
veillard redhat com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/