Re: [xml] Changes which might be required "Not converting names to lower-case in HTML parsing"

From: Daniel Veillard <veillard redhat com>
To: GPN <gpn libxml gmail com>
Cc: xml gnome org
Subject: Re: [xml] Changes which might be required "Not converting names to lower-case in HTML parsing"
Date: Wed, 28 Sep 2005 11:06:33 -0400

On Wed, Sep 28, 2005 at 07:25:52PM +0530, GPN wrote:

Hello,
I am working off release 2.6.22, and I am proposing the
following changes to the code.
- A new function xmlStrcaseEqual() might be required in
  xmlstring.c, which can check if the current character
  being parsed is between 'A' and 'Z', and if so compares
  using casemap array as is done in xmlStrcasecmp().


  Not needed, just use !xmlStrcasecmp(); the API is too large already.
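
  To illustrate the point, a minimal sketch: an "equal ignoring case"
  helper is only the negation of a case-insensitive compare. The names
  name_casecmp() and name_case_equal() are hypothetical stand-ins written
  against the C standard library so the example compiles on its own; in
  libxml2 the comparison would be xmlStrcasecmp() on xmlChar strings.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for xmlStrcasecmp(): ASCII case-insensitive
 * compare that returns 0 when both names match ignoring case. */
int name_casecmp(const char *a, const char *b) {
    assert(a != NULL && b != NULL);
    while (*a != '\0' &&
           tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
        a++;
        b++;
    }
    return tolower((unsigned char)*a) - tolower((unsigned char)*b);
}

/* The proposed "case equal" helper reduces to a negated compare, which
 * is why a separate public function adds little. */
int name_case_equal(const char *a, const char *b) {
    return !name_casecmp(a, b);
}
```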

- In htmlParseName(), the condition which checks if the
  current character is upper-case, and which transforms
  it needs to be removed. Name can be stored as it is.


  No. That would have to be made conditional on a special parsing
option. There are also a number of tables indexed by the lowercase
name, and those will need to be preserved.
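
  A rough sketch of what such a conditional could look like, under
  stated assumptions: SKETCH_PARSE_KEEP_CASE is a hypothetical option
  bit (not an existing libxml2 flag), and sketch_known_elements is a
  made-up stand-in for the tables keyed by lowercase name. The stored
  name keeps its case only when the option is set, while table lookups
  still use a lower-cased key.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parser option; not an actual libxml2 flag. */
#define SKETCH_PARSE_KEEP_CASE (1 << 0)

/* Hypothetical stand-in for a table indexed by lowercase element name. */
static const char *const sketch_known_elements[] = {
    "body", "div", "p", "table"
};

/* Copy src into dst (truncating to dst_size - 1 characters), lower-casing
 * ASCII letters unless SKETCH_PARSE_KEEP_CASE is set in options. */
void sketch_store_name(const char *src, char *dst, size_t dst_size,
                       int options) {
    size_t i;

    assert(src != NULL && dst != NULL && dst_size > 0);
    for (i = 0; src[i] != '\0' && i + 1 < dst_size; i++) {
        if (options & SKETCH_PARSE_KEEP_CASE)
            dst[i] = src[i];
        else
            dst[i] = (char)tolower((unsigned char)src[i]);
    }
    dst[i] = '\0';
}

/* Table lookups always go through a lower-cased key, so the table stays
 * valid whatever case the stored name has. Returns the index or -1. */
int sketch_lookup_element(const char *name) {
    char key[32];
    size_t i;
    size_t count = sizeof(sketch_known_elements) /
                   sizeof(sketch_known_elements[0]);

    sketch_store_name(name, key, sizeof(key), 0);
    for (i = 0; i < count; i++) {
        if (strcmp(key, sketch_known_elements[i]) == 0)
            return (int)i;
    }
    return -1;
}
```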

- In other parts of the code (only in HTMLparser.c), the
  comparisons using xmlStrEqual() for names need to be
  replaced by xmlStrcaseEqual().


  I.e. this makes a lot of costly calls instead of one costly call and
a number of cheap ones. I disagree with this approach.
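
  The cost argument can be sketched as follows, assuming plain C strings
  and hypothetical helper names: the name is lower-cased once when it is
  parsed (the one costly step), and every later check is an exact
  strcmp(), which needs no per-character case mapping.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* One-time cost at parse time: lower-case ASCII letters in place. */
void sketch_lower_in_place(char *name) {
    assert(name != NULL);
    for (; *name != '\0'; name++)
        *name = (char)tolower((unsigned char)*name);
}

/* Cheap checks afterwards: exact comparison against lowercase
 * candidates, repeated as often as the parser needs. */
int sketch_name_is_one_of(const char *name,
                          const char *const *candidates, size_t count) {
    size_t i;

    assert(name != NULL && candidates != NULL);
    for (i = 0; i < count; i++) {
        if (strcmp(name, candidates[i]) == 0)
            return 1;
    }
    return 0;
}
```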

I am assuming that pretty much all HTML related functionality
is contained within HTMLparser.c, and the core xml functions
need not change to accommodate this enhancement.


  This enhancement will have to be conditional on a parsing option.
Using the HTML parser to parse XML is basically wrong and I don't
want that to be a default behaviour of the HTML parser.

Daniel

-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
