Re: [xml] Better hash function for dict.c
- From: Stefan Behnel <stefan_ml behnel de>
- To: veillard redhat com
- Cc: xml gnome org
- Subject: Re: [xml] Better hash function for dict.c
- Date: Wed, 06 Aug 2008 08:25:14 +0200
Salut Daniel,
Daniel Veillard wrote:
- the second one is unfortunately not fixable as is; it comes from the
key hash definitions themselves:
-#define xmlDictComputeKey(dict, name, len)              \
-    (((dict)->size == MIN_DICT_SIZE) ?                  \
-     xmlDictComputeFastKey(name, len) :                 \
-     xmlDictComputeBigKey(name, len, len))              \
-
-#define xmlDictComputeQKey(dict, prefix, name, len)     \
-    (((prefix) == NULL) ?                               \
-      (xmlDictComputeKey(dict, name, len)) :            \
-      (((dict)->size == MIN_DICT_SIZE) ?                \
-       xmlDictComputeFastQKey(prefix, name, len) :      \
-       xmlDictComputeBigKey(name, len, len))))
Hmm, was that in my patch? Off the top of my head, shouldn't the last line read

    xmlDictComputeBigKey(prefix, -1, xmlDictComputeBigKey(name, len, len))))

or something along those lines? This looks like a copy&paste error to me...
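A minimal sketch of what the quoted branch and the suggested last line each compute. `big_key()` is a hypothetical stand-in for `xmlDictComputeBigKey()` (a simple seeded multiply-and-add hash, not the libxml2 code), and passing an explicit prefix length instead of `-1` is an assumption of this sketch. The point is only that in the quoted macro the prefix never reaches the hash, while the suggestion seeds the prefix hash with the local-name hash.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for xmlDictComputeBigKey(): a seeded
 * multiply-and-add hash over the first len bytes of data.
 * It is NOT the hash used by libxml2. */
static unsigned long big_key(const char *data, size_t len, unsigned long seed)
{
    unsigned long h = seed;
    size_t i;

    assert(data != NULL || len == 0);
    for (i = 0; i < len; i++)
        h = h * 31 + (unsigned char) data[i];
    return h;
}

/* Mirrors the quoted macro's non-NULL-prefix branch:
 * the prefix never reaches the hash. */
static unsigned long qkey_as_quoted(const char *prefix, size_t plen,
                                    const char *name, size_t nlen)
{
    (void) prefix;
    (void) plen;
    return big_key(name, nlen, nlen);
}

/* Mirrors the suggested last line: hash the local name, then use
 * that value as the seed when hashing the prefix. */
static unsigned long qkey_suggested(const char *prefix, size_t plen,
                                    const char *name, size_t nlen)
{
    return big_key(prefix, plen, big_key(name, nlen, nlen));
}
```

With the quoted branch, `a:b` and `c:b` get the same key; with the suggested line they differ.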
Anyway:
the problem is that if you compute the key for a QName such as "a:b",
you can get 2 different answers: one if you access it as "a:b" directly,
and hence through xmlDictComputeKey(), and another if you use the "a"
prefix and "b" name. Given the algorithm, the keys are not the same, and
it is a key property of the dictionary to always return the exact same
pointer for the same string. This breaks that property.
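To make the property concrete, here is a sketch using the same kind of hypothetical stand-in hash (`big_key()`, not libxml2's code). Even with the prefix taken into account, hashing "a:b" as one string and hashing prefix "a" plus name "b" give different keys. Assuming, as a simplification, that the bucket is the key modulo the table size (128 here is an arbitrary example), the two lookups probe different buckets, so one can miss the existing entry and insert a second copy of the same string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for xmlDictComputeBigKey(): a seeded
 * multiply-and-add hash, not the libxml2 implementation. */
static unsigned long big_key(const char *data, size_t len, unsigned long seed)
{
    unsigned long h = seed;
    size_t i;

    assert(data != NULL || len == 0);
    for (i = 0; i < len; i++)
        h = h * 31 + (unsigned char) data[i];
    return h;
}

/* Key for a whole string, as xmlDictComputeKey() does for big dicts. */
static unsigned long whole_key(const char *s)
{
    size_t len = strlen(s);
    return big_key(s, len, len);
}

/* Key for a prefix + local name pair, prefix included in the hash. */
static unsigned long qname_key(const char *prefix, const char *name)
{
    size_t nlen = strlen(name);

    if (prefix == NULL)
        return big_key(name, nlen, nlen);
    return big_key(prefix, strlen(prefix), big_key(name, nlen, nlen));
}

/* Simplified bucket selection: key modulo table size. */
static size_t bucket(unsigned long key, size_t size)
{
    return (size_t) (key % size);
}
```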
True, I didn't know about this property. And the 4-byte-at-once property will
really make this very hard to achieve.
One way I see to fix this is to search the string for the first ':' and always
calculate the hash separately for the parts before and after the ':', not
including the ':' itself. That should not break hashing namespace URIs either
(AFAIR, at least the XML namespace gets hashed at some point). Something like
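    int len = strlen(s);
    const char *prefix_end = strchr(s, ':');
    if (prefix_end != NULL) {
        int prefix_len = prefix_end - s;       /* bytes before the ':' */
        int name_len = len - prefix_len - 1;   /* bytes after the ':' */
        h = xmlDictComputeBigKey(s, prefix_len,
                xmlDictComputeBigKey(prefix_end + 1, name_len, name_len));
    } else {
        h = xmlDictComputeBigKey(s, len, len);
    }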
(expect an off-by-1 error somewhere above ;)
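A self-contained sketch of the split-at-first-':' idea, again with a hypothetical stand-in `big_key()` instead of the real `xmlDictComputeBigKey()`. `split_key()` hashes a whole string by splitting it at its first ':', and `split_qkey()` hashes a (prefix, name) pair the same way, so both routes to "a:b" end up with the same key.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for xmlDictComputeBigKey(): a seeded
 * multiply-and-add hash, not the libxml2 implementation. */
static unsigned long big_key(const char *data, size_t len, unsigned long seed)
{
    unsigned long h = seed;
    size_t i;

    assert(data != NULL || len == 0);
    for (i = 0; i < len; i++)
        h = h * 31 + (unsigned char) data[i];
    return h;
}

/* Whole-string key: if s contains ':', hash the part after the first
 * ':' and use that as the seed for hashing the part before it; the
 * ':' itself is left out. Otherwise hash s as is. */
static unsigned long split_key(const char *s)
{
    size_t len = strlen(s);
    const char *colon = strchr(s, ':');

    if (colon != NULL) {
        size_t plen = (size_t) (colon - s);   /* bytes before ':' */
        size_t nlen = len - plen - 1;         /* bytes after ':'  */
        return big_key(s, plen, big_key(colon + 1, nlen, nlen));
    }
    return big_key(s, len, len);
}

/* (prefix, name) key computed the same way, so it agrees with
 * split_key() on the joined "prefix:name" string. */
static unsigned long split_qkey(const char *prefix, const char *name)
{
    size_t nlen = strlen(name);

    if (prefix == NULL)
        return big_key(name, nlen, nlen);
    return big_key(prefix, strlen(prefix), big_key(name, nlen, nlen));
}
```

This only keeps the property if every QName code path splits at the same (first) ':'; presumably the fast-key variants used for small dictionaries would need the same treatment.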
Stefan