Re: [xml] Better hash function for dict.c
- From: Daniel Veillard <veillard redhat com>
- To: Stefan Behnel <stefan_ml behnel de>
- Cc: xml gnome org
- Subject: Re: [xml] Better hash function for dict.c
- Date: Wed, 6 Aug 2008 04:11:13 -0400
On Wed, Aug 06, 2008 at 08:25:14AM +0200, Stefan Behnel wrote:
Hmm, was that in my patch? Out of the top of my head, shouldn't the last line read
xmlDictComputeBigKey(prefix, -1, xmlDictComputeBigKey(name, len, len))))
or something in that line? This looks like a copy&paste error to me...
err, yes, actually we have
#define xmlDictComputeKey(dict, name, len) \
(((dict)->size == MIN_DICT_SIZE) ? \
xmlDictComputeFastKey(name, len) : \
xmlDictComputeBigKey(name, len, len)) \
#define xmlDictComputeQKey(dict, prefix, name, len) \
(((prefix) == NULL) ? \
(xmlDictComputeKey(dict, name, len)) : \
(((dict)->size == MIN_DICT_SIZE) ? \
xmlDictComputeFastQKey(prefix, name, len) : \
xmlDictComputeBigKey(prefix, xmlStrlen(prefix), \
xmlDictComputeBigKey(name, len, len))))
Anyway:
the problem is that basically if you compute the key for a QName
as "a:b" you can get 2 different answers, one if you accessed it using
"a:b" directly and hence xmlDictComputeKey() or if using "a" prefix and
"b" name, given the algorithm the key are not the same, and it is a key
property of the dictionary to always return the same exact pointer for
the same string. This breaks that property.
True, I didn't know about this property. And the 4-byte-at-once property will
really make this very hard to achieve.
yes, that's the core of the problem
A way I see to fix this is to search the string for the first ':' and always
calculate the hash separately for the part before and after the ':', not
including the ':' itself. That should not break hashing namespace URIs either
(AFAIR, at the least the XML namespace gets hashed at some point). Something like
int len = strlen(s)
char* prefix_end = strchr(s, ':')
if (prefix_end)
h = xmlDictComputeBigKey(s, prefix_end-s,
xmlDictComputeBigKey(prefix_end+1, len-(prefix_end-s-1),
len-(prefix_end-s-1)))
else
h = xmlDictComputeBigKey(s, len, len)
Another option I looked at is the 'One-at-a-Time Hash' from
http://burtleburtle.net/bob/hash/doobs.html , looking at the criterias
and the results it looks like a good hash too, not too expensive and
should work well. I will try to make a patch using this this morning,
if you have a bit of time then, maybe you can rerun your initial tests
with that one, is that possible ?
Daniel
--
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard | virtualization library http://libvirt.org/
veillard redhat com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
[
Date Prev][
Date Next] [
Thread Prev][
Thread Next]
[
Thread Index]
[
Date Index]
[
Author Index]