Re: [xml] Better hash function for dict.c



Stefan Behnel wrote:
Daniel Veillard wrote:
if you have a bit of time then, maybe you can rerun your initial tests
with that one, is that possible ?
I can try, sure. Just send me a patch that removes the current hash
function from SVN and adds the new one, and I will find a way to compare
the two.
Here's a little test script that runs "xmllint --noout" on a generated XML
file with varying numbers of distinct tag names, together with the numbers I
get. It looks like the new hash is a little slower than the one from my
original patch. At least, I get slightly lower throughput, but it's less than
10% difference throughout, so I guess it's within the usual margin. This is
likely due to the 4-byte reads of the other hash.

The distribution seems to be about comparable, and the timings stay more or
less constant over the range I tested (up to 1000 entries). Even with 2000
entries in the dict, the timings are only 15% lower than with 8, so I would
say this hash works just as well as the other one.

I did a quick check with lxml's benchmarks and they give me comparable
results: slightly slower, but about the same behavioural improvement.

Given that the new hash gives correct results, which the other one didn't, I'm
fine with the change. The price is definitely low enough.

Stefan
import sys, os
from time import time

def gen_xml(tag_count):
    data = ["<root>"]

    append = data.append
    for i in range(100000):
        append( "<a%04dbc />" % (i%tag_count) )

    data.append("</root>")
    return '\n '.join(data)

for i in range(8, 2001, 8):
    xml = gen_xml(i)
    f = open("file.xml", 'wb')
    f.write(xml)
    f.close()

    
    t = time()
    os.system("xmllint --noout file.xml")
    t = time() - t

    print "%4d %5.2f %8.1f" % (i, t, len(xml)/(t*1000.0))
benchmarking libxml2 2.6.32: xmllint --noout file.xml

Columns:

 #distincs tags | seconds | kbytes/sec

      original hash      one-at-a-time hash
--------------------------------------------
   8  0.10  13292.6        8  0.10  13647.7
  16  0.10  12943.7       16  0.10  13352.1
  24  0.10  12748.5       24  0.10  12856.2
  32  0.11  12300.0       32  0.10  12623.1
  40  0.11  12344.8       40  0.10  12461.8
  48  0.11  12060.3       48  0.11  12246.8
  56  0.12  10967.4       56  0.10  12956.2
  64  0.12  10807.2       64  0.10  13011.0
  72  0.12  10446.1       72  0.10  12779.8
  80  0.12  10824.6       80  0.10  12727.4
  88  0.12  10601.9       88  0.10  12812.4
  96  0.12  10632.7       96  0.10  12926.0
 104  0.13  10398.9      104  0.10  12909.6
 112  0.12  10528.6      112  0.10  13047.2
 120  0.13  10389.8      120  0.10  12824.4
 128  0.13  10249.2      128  0.10  12976.6
 136  0.13  10191.4      136  0.11  12141.6
 144  0.13   9958.5      144  0.10  12692.1
 152  0.13   9672.7      152  0.10  12806.0
 160  0.13   9740.6      160  0.10  12735.6
 168  0.13   9680.3      168  0.10  12705.4
 176  0.13   9664.8      176  0.11  12009.0
 184  0.14   9061.4      184  0.10  12624.2
 192  0.14   9558.7      192  0.10  12779.9
 200  0.14   9223.1      200  0.10  12772.7
 208  0.14   9409.3      208  0.10  12506.9
 216  0.14   9238.8      216  0.10  12461.3
 224  0.14   9239.2      224  0.10  12691.9
 232  0.14   8991.2      232  0.10  12540.5
 240  0.14   8981.5      240  0.10  12658.3
 248  0.15   8875.0      248  0.10  12621.1
 256  0.15   8813.3      256  0.10  12754.4
 264  0.15   8719.4      264  0.10  12575.4
 272  0.15   8638.8      272  0.10  12598.2
 280  0.15   8587.4      280  0.10  12493.8
 288  0.15   8594.9      288  0.10  12773.3
 296  0.15   8531.2      296  0.10  12630.3
 304  0.15   8559.2      304  0.10  12668.1
 312  0.15   8439.1      312  0.10  12836.0
 320  0.16   8354.5      320  0.10  12563.9
 328  0.16   8177.6      328  0.10  12526.3
 336  0.16   8154.6      336  0.10  12559.2
 344  0.16   8135.8      344  0.10  12629.0
 352  0.16   8014.6      352  0.10  12608.3
 360  0.17   7791.0      360  0.10  12762.5
 368  0.16   7905.4      368  0.10  12736.4
 376  0.17   7817.3      376  0.10  12472.8
 384  0.17   7752.3      384  0.10  12526.0
 392  0.17   7739.4      392  0.10  12611.3
 400  0.17   7762.5      400  0.10  12619.7
 408  0.17   7688.5      408  0.10  12530.9
 416  0.17   7625.4      416  0.10  12606.7
 424  0.17   7526.3      424  0.10  12463.5
 432  0.17   7443.6      432  0.10  12702.9
 440  0.18   7151.1      440  0.10  12605.0
 448  0.18   7321.2      448  0.10  12436.2
 456  0.18   7213.5      456  0.11  12277.0
 464  0.18   7141.5      464  0.11  11654.0
 472  0.18   7109.2      472  0.10  12552.4
 480  0.18   7118.3      480  0.10  12437.0
 488  0.18   7078.0      488  0.10  12632.3
 496  0.18   7095.4      496  0.10  12657.4
 504  0.18   7039.9      504  0.10  12609.9
 512  0.19   6983.3      512  0.10  12565.8
 520  0.19   6897.1      520  0.11  11936.7
 528  0.20   6585.0      528  0.10  12668.5
 536  0.19   6765.7      536  0.10  12682.3
 544  0.19   6781.3      544  0.10  12434.1
 552  0.20   6470.6      552  0.10  12423.7
 560  0.20   6642.9      560  0.10  12640.5
 568  0.20   6553.0      568  0.11  12168.3
 576  0.20   6596.0      576  0.11  12357.9
 584  0.20   6544.2      584  0.10  12497.8
 592  0.20   6603.2      592  0.10  12492.1
 600  0.20   6609.9      600  0.11  12178.8
 608  0.20   6541.3      608  0.11  12317.2
 616  0.20   6488.2      616  0.10  12520.7
 624  0.20   6387.7      624  0.10  12477.7
 632  0.20   6378.9      632  0.10  12488.5
 640  0.21   6309.3      640  0.10  12500.9
 648  0.21   6299.9      648  0.11  11417.7
 656  0.22   6035.8      656  0.11  11398.9
 664  0.21   6185.9      664  0.11  11601.5
 672  0.21   6160.5      672  0.12  11036.0
 680  0.21   6133.9      680  0.11  11474.2
 688  0.21   6133.6      688  0.12  11266.4
 696  0.21   6142.9      696  0.10  12389.5
 704  0.21   6118.7      704  0.10  12484.0
 712  0.21   6158.5      712  0.10  12548.2
 720  0.21   6058.7      720  0.10  12653.2
 728  0.22   6008.4      728  0.11  12250.0
 736  0.22   5926.5      736  0.10  12483.8
 744  0.22   5945.8      744  0.10  12381.6
 752  0.22   5896.4      752  0.10  12493.3
 760  0.22   5885.5      760  0.10  12504.5
 768  0.22   5781.5      768  0.10  12530.9
 776  0.22   5845.1      776  0.10  12481.0
 784  0.22   5839.0      784  0.11  12330.0
 792  0.22   5826.5      792  0.11  12250.1
 800  0.22   5858.1      800  0.10  12451.3
 808  0.23   5741.0      808  0.10  12455.6
 816  0.23   5770.9      816  0.10  12490.1
 824  0.23   5713.3      824  0.10  12516.4
 832  0.23   5667.6      832  0.11  12337.9
 840  0.23   5641.0      840  0.10  12401.3
 848  0.23   5596.2      848  0.10  12439.2
 856  0.23   5577.5      856  0.10  12396.1
 864  0.23   5540.0      864  0.11  12307.0
 872  0.24   5530.2      872  0.11  12228.5
 880  0.24   5521.1      880  0.10  12394.8
 888  0.23   5567.8      888  0.11  12375.2
 896  0.24   5531.3      896  0.10  12426.5
 904  0.23   5567.7      904  0.10  12421.9
 912  0.24   5406.3      912  0.10  12419.7
 920  0.24   5508.4      920  0.10  12449.6
 928  0.24   5409.7      928  0.10  12385.3
 936  0.24   5428.5      936  0.11  12376.8
 944  0.24   5321.1      944  0.10  12430.6
 952  0.24   5345.0      952  0.11  12243.5
 960  0.25   5290.6      960  0.10  12424.3
 968  0.24   5335.1      968  0.11  12351.3
 976  0.25   5238.3      976  0.10  12445.9
 984  0.25   5288.2      984  0.11  12163.3
 992  0.24   5336.2      992  0.11  12107.0
1000  0.25   5166.2     1000  0.11  12292.3


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]