Re: [xml] Minor patch



On Wed, May 01, 2002 at 02:28:18PM +0100, Anthony Jones wrote:
This only fixes a very minor problem, but I'd thought I'd point it out
anyway:

The problem is that the hash table in hash.c grows when the length of a
chain in the table is over MAX_HASH_LEN. But growing the table does *not*
necessarily mean an individual chain in the table will get smaller, because
the elements in that chain may still have the same hash value.

For example, if we parse the following (very contrived ;) ) document, the
hash table will grow to a size of 16384, even though it only has 12
elements in it, and it will continue growing exponentially if we add any
more elements to the chain that is already too long.

<?xml version="1.0"?>
<!DOCTYPE root [
    <!ENTITY abcde "abcde">
    <!ENTITY abced "abced">
    <!ENTITY abdec "abdec">
    <!ENTITY abdce "abdce">
    <!ENTITY abedc "abedc">
    <!ENTITY abecd "abecd">
    <!ENTITY acbde "acbde">
    <!ENTITY acbed "acbed">
    <!ENTITY adbec "adbec">
    <!ENTITY adbce "adbce">
    <!ENTITY aebdc "aebdc">
    <!ENTITY aebcd "aebcd">
]>
<root/>

The attached patch should provide more sensible behaviour in this
pathological case -- we only grow the hash table if the *average* chain
length is greater then MAX_HASH_LEN (perhaps MAX_HASH_LEN should also be
reduced if this patch is applied?)

  
  Hum, While I understand the specific case, I really wonder what is the
associated penalty in term of speed for the whole class of applications.
For example timing impact on DocBook XSLT transformations, or the
XSLTMark result before and after.
  I'm not tempted to apply this patch until I have some report on this.

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]