Re: [xml] (X)HTML related proposals



On Thu, Mar 04, 2004 at 12:16:14PM +0900, Mike Hommey wrote:
Hi

I have encountered some (I think) annoying things in libxml2's (X)HTML
handling. I could propose patches, but I would like to discuss the
issues first, to be sure not to go in a bad direction.

So here we are :
- the htmlSetMetaEncoding() function just removes the existing
corresponding meta tag, if there was any, to add its own. Problem is
that text/html is not the only allowed content-type for (x)html files.
It could be application/xhtml+xml, for example. My proposal would be to
only touch the charset part of the meta tag if it already exists, and
not the mime type by itself, or provide some work around to set the
default mime type.
On the other hand, one may also want NOT to have the meta tag, if, for
example, he provides this information on the HTTP (or some other
protocol) level...

- By the way, the meta tag is not added if there is no head element.

- Output of XHTML follows the compatibility guidelines from the spec,
which is fine, but that should be disablable, I think.

- XHTML 1.1 is treated as pure XML, so that it doesn't get the special
treatment (meta tag addition, <script/> cdata-ing, etc.). Well, okay,
most of the special treatment is not needed for XHTML 1.1 (like adding
the lang attribute), but one may expect at least the meta tag to be
added, like in other XHTML data.

- In the crazy case of the presence of non namespaced html elements
inside a document, a namespace will be applied on it. Okay, this is a
VERY crazy case [1], but well...

  Hi Mike,

I didn't reply to your initial message because I didn't have a good
way to suggest changing things. Now that 2.6.8 is out, there is room
for implementing custom serialization rules without using yet another
set of global variables. Check the include/libxml/xmlsave.h header and
you will get an idea of where I'm heading.
Basically, the new APIs introduce an xmlSaveCtxt structure, and the underlying
code base for XML serialization now uses it. It is possible to pass
saving options when building such a context, and what you suggested
could become such options. I also want to merge all the XHTML and HTML code so
that it reuses the saving context too, and expose the changes as saving options.
A reasonable amount of work is needed to make the switch, but it
can be done in an API- and ABI-compatible way, while:
  - reducing the amount of code
  - making serialization more flexible
I'm sure some significant optimizations can also be done: some are trivial,
like indentation handling; others may be more subtle, but having a context
in which to store precomputed values will help.
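
To make the idea of "suggestions as saving options" more concrete, here is a
minimal sketch in plain C. It is not libxml2 code: the sketchSaveCtxt type, the
SKETCH_SAVE_* flags and sketchMetaContent() are hypothetical names invented
for this illustration, and the real options may well end up looking different.
It only shows how a per-context option could decide whether the meta tag is
emitted and whether an existing MIME type is kept when the charset changes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical option flags, NOT part of libxml2. They only illustrate how
 * the proposals in this thread could be expressed as per-context options. */
typedef enum {
    SKETCH_SAVE_NO_META   = 1 << 0, /* never emit the Content-Type meta tag */
    SKETCH_SAVE_KEEP_MIME = 1 << 1  /* keep an existing MIME type, replace only the charset */
} sketchSaveOption;

/* Hypothetical stand-in for a saving context: options travel with the
 * context instead of living in global variables. */
typedef struct {
    int options;          /* bitwise OR of sketchSaveOption values */
    const char *encoding; /* output encoding name, e.g. "UTF-8" */
} sketchSaveCtxt;

/*
 * Build the value of the content attribute for the Content-Type meta tag.
 * `existing` is the current content value, or NULL if there is no meta tag.
 * Returns 0 when `out` was filled, 1 when the meta tag should be omitted,
 * and -1 when `out` is too small.
 */
int sketchMetaContent(const sketchSaveCtxt *ctxt, const char *existing,
                      char *out, size_t outlen)
{
    const char *mime = "text/html"; /* default when nothing is preserved */
    size_t mimelen = strlen(mime);
    int n;

    if (ctxt->options & SKETCH_SAVE_NO_META)
        return 1;

    if ((ctxt->options & SKETCH_SAVE_KEEP_MIME) && existing != NULL) {
        /* The MIME type is everything before the first ';' or blank. */
        size_t len = strcspn(existing, "; \t");
        if (len > 0) {
            mime = existing;
            mimelen = len;
        }
    }

    n = snprintf(out, outlen, "%.*s; charset=%s",
                 (int) mimelen, mime, ctxt->encoding);
    return (n < 0 || (size_t) n >= outlen) ? -1 : 0;
}
```

With SKETCH_SAVE_KEEP_MIME set and an existing value of
"application/xhtml+xml; charset=ISO-8859-1", the sketch keeps the MIME type
and swaps only the charset. Without the flag it falls back to text/html, which
is the behaviour you described for htmlSetMetaEncoding().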

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
