[Tracker] Tracker performance testing



Hello everyone,

As an exercise in time-wasting I undertook to complete some performance
tests on the Tracker RDF database (Tracker store).

The standard tests I decided to use were those of the Berlin SPARQL
benchmarks. The specification for these tests can be found at:

http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/spec/index.html

Results
-------

The headline result is that Tracker is roughly 9x faster than Virtuoso
at running the query mixes in these tests (2637 vs. 298 query mixes per hour).

Tracker:

Queries 9 and 12 were not run because they use SPARQL features that
Tracker does not support.

Scale factor:           1000
Number of warmup runs:  128
Number of clients:      4
Seed:                   808080
Number of query mix runs (without warmups): 1024 times
min/max Querymix runtime: 3.8705s / 7.4565s
Total runtime (sum):    5573.465 seconds
Total actual runtime:   1397.849 seconds
QMpH:                   2637.19 query mixes per hour
CQET:                   5.44284 seconds average runtime of query mix
CQET (geom.):           5.42221 seconds geometric mean runtime of query
mix

Metrics for Query:      1
Count:                  1024 times executed in whole run
AQET:                   0.166173 seconds (arithmetic mean)
AQET(geom.):            0.144697 seconds (geometric mean)
QPS:                    23.99 Queries per second
minQET/maxQET:          0.06346957s / 1.97053821s
Average result count:   0.57
min/max result count:   0 / 5
Number of timeouts:     0

Metrics for Query:      2
Count:                  6144 times executed in whole run
AQET:                   0.207744 seconds (arithmetic mean)
AQET(geom.):            0.134743 seconds (geometric mean)
QPS:                    19.19 Queries per second
minQET/maxQET:          0.04699750s / 2.55192438s
Average result count:   20.47
min/max result count:   8 / 41
Number of timeouts:     0

Metrics for Query:      3
Count:                  1024 times executed in whole run
AQET:                   0.241528 seconds (arithmetic mean)
AQET(geom.):            0.162570 seconds (geometric mean)
QPS:                    16.51 Queries per second
minQET/maxQET:          0.06215394s / 2.08519456s
Average result count:   0.34
min/max result count:   0 / 4
Number of timeouts:     0

Metrics for Query:      4
Count:                  1024 times executed in whole run
AQET:                   0.252148 seconds (arithmetic mean)
AQET(geom.):            0.192841 seconds (geometric mean)
QPS:                    15.81 Queries per second
minQET/maxQET:          0.10444709s / 2.21926131s
Average result count:   0.00
min/max result count:   0 / 1
Number of timeouts:     0

Metrics for Query:      5
Count:                  1024 times executed in whole run
AQET:                   1.238432 seconds (arithmetic mean)
AQET(geom.):            1.124482 seconds (geometric mean)
QPS:                    3.22 Queries per second
minQET/maxQET:          0.36639510s / 4.12935953s
Average result count:   3.25
min/max result count:   0 / 5
Number of timeouts:     0

Metrics for Query:      6
Count:                  1024 times executed in whole run
AQET:                   0.230515 seconds (arithmetic mean)
AQET(geom.):            0.167703 seconds (geometric mean)
QPS:                    17.30 Queries per second
minQET/maxQET:          0.07234605s / 2.18394154s
Average result count:   1.00
min/max result count:   1 / 1
Number of timeouts:     0

Metrics for Query:      7
Count:                  4096 times executed in whole run
AQET:                   0.286743 seconds (arithmetic mean)
AQET(geom.):            0.176770 seconds (geometric mean)
QPS:                    13.91 Queries per second
minQET/maxQET:          0.05408344s / 2.96657009s
Average result count:   10.15
min/max result count:   1 / 28
Number of timeouts:     0

Metrics for Query:      8
Count:                  2048 times executed in whole run
AQET:                   0.224936 seconds (arithmetic mean)
AQET(geom.):            0.119174 seconds (geometric mean)
QPS:                    17.73 Queries per second
minQET/maxQET:          0.01116535s / 2.23874045s
Average result count:   0.00
min/max result count:   0 / 0
Number of timeouts:     0

Metrics for Query:      9
Count:                  0 times executed in whole run
AQET:                   0.000000 seconds (arithmetic mean)
AQET(geom.):            NaN seconds (geometric mean)
QPS:                    Infinity Queries per second
minQET/maxQET:          17976931348623157s
Average result (Bytes): 0.00
min/max result (Bytes): 2147483647 / -2147483648
Number of timeouts:     0

Metrics for Query:      10
Count:                  2048 times executed in whole run
AQET:                   0.173706 seconds (arithmetic mean)
AQET(geom.):            0.124314 seconds (geometric mean)
QPS:                    22.95 Queries per second
minQET/maxQET:          0.02448088s / 1.99127881s
Average result count:   1.18
min/max result count:   0 / 6
Number of timeouts:     0

Metrics for Query:      11
Count:                  1024 times executed in whole run
AQET:                   0.123324 seconds (arithmetic mean)
AQET(geom.):            0.097755 seconds (geometric mean)
QPS:                    32.33 Queries per second
minQET/maxQET:          0.01224387s / 1.33136896s
Average result count:   13.00
min/max result count:   13 / 13
Number of timeouts:     0

Metrics for Query:      12
Count:                  0 times executed in whole run
AQET:                   0.000000 seconds (arithmetic mean)
AQET(geom.):            NaN seconds (geometric mean)
QPS:                    Infinity Queries per second
minQET/maxQET:          17976931348623157s
Average result (Bytes): 0.00
min/max result (Bytes): 2147483647 / -2147483648
Number of timeouts:     0

Virtuoso:

Benchmark run completed in 12366.220356159s

Scale factor:           1000
Number of warmup runs:  128
Number of clients:      4
Seed:                   808080
Number of query mix runs (without warmups): 1024 times
min/max Querymix runtime: 25.0356s / 60.9157s
Total runtime (sum):    49303.512 seconds
Total actual runtime:   12366.220 seconds
QMpH:                   298.10 query mixes per hour
CQET:                   48.14796 seconds average runtime of query mix
CQET (geom.):           47.96361 seconds geometric mean runtime of query
mix

Metrics for Query:      1
Count:                  1024 times executed in whole run
AQET:                   0.050511 seconds (arithmetic mean)
AQET(geom.):            0.029894 seconds (geometric mean)
QPS:                    78.93 Queries per second
minQET/maxQET:          0.00594442s / 0.35991040s
Average result count:   0.57
min/max result count:   0 / 5
Number of timeouts:     0

Metrics for Query:      2
Count:                  6144 times executed in whole run
AQET:                   5.591946 seconds (arithmetic mean)
AQET(geom.):            5.438305 seconds (geometric mean)
QPS:                    0.71 Queries per second
minQET/maxQET:          2.04874958s / 10.81485338s
Average result count:   20.47
min/max result count:   8 / 41
Number of timeouts:     0

Metrics for Query:      3
Count:                  1024 times executed in whole run
AQET:                   0.052267 seconds (arithmetic mean)
AQET(geom.):            0.031281 seconds (geometric mean)
QPS:                    76.28 Queries per second
minQET/maxQET:          0.00697979s / 1.19080248s
Average result count:   0.34
min/max result count:   0 / 4
Number of timeouts:     0

Metrics for Query:      4
Count:                  1024 times executed in whole run
AQET:                   0.106704 seconds (arithmetic mean)
AQET(geom.):            0.061361 seconds (geometric mean)
QPS:                    37.36 Queries per second
minQET/maxQET:          0.00904308s / 0.99024453s
Average result count:   0.00
min/max result count:   0 / 1
Number of timeouts:     0

Metrics for Query:      5
Count:                  1024 times executed in whole run
AQET:                   3.650117 seconds (arithmetic mean)
AQET(geom.):            3.250134 seconds (geometric mean)
QPS:                    1.09 Queries per second
minQET/maxQET:          0.63302203s / 11.57777184s
Average result count:   3.25
min/max result count:   0 / 5
Number of timeouts:     0

Metrics for Query:      6
Count:                  1024 times executed in whole run
AQET:                   0.175229 seconds (arithmetic mean)
AQET(geom.):            0.159130 seconds (geometric mean)
QPS:                    22.75 Queries per second
minQET/maxQET:          0.03786253s / 0.49990789s
Average result count:   1.05
min/max result count:   1 / 8
Number of timeouts:     0

Metrics for Query:      7
Count:                  4096 times executed in whole run
AQET:                   1.301037 seconds (arithmetic mean)
AQET(geom.):            1.074876 seconds (geometric mean)
QPS:                    3.06 Queries per second
minQET/maxQET:          0.03227636s / 5.98356504s
Average result count:   10.15
min/max result count:   1 / 28
Number of timeouts:     0

Metrics for Query:      8
Count:                  2048 times executed in whole run
AQET:                   1.472874 seconds (arithmetic mean)
AQET(geom.):            1.167677 seconds (geometric mean)
QPS:                    2.71 Queries per second
minQET/maxQET:          0.01117960s / 5.30360235s
Average result count:   4.75
min/max result count:   0 / 15
Number of timeouts:     0

Metrics for Query:      9
Count:                  4096 times executed in whole run
AQET:                   0.064159 seconds (arithmetic mean)
AQET(geom.):            0.050390 seconds (geometric mean)
QPS:                    62.14 Queries per second
minQET/maxQET:          0.01071517s / 0.73873880s
Average result (Bytes): 8299.27
min/max result (Bytes): 2578 / 13300
Number of timeouts:     0

Metrics for Query:      10
Count:                  2048 times executed in whole run
AQET:                   0.800822 seconds (arithmetic mean)
AQET(geom.):            0.513786 seconds (geometric mean)
QPS:                    4.98 Queries per second
minQET/maxQET:          0.02532729s / 2.70724267s
Average result count:   1.18
min/max result count:   0 / 6
Number of timeouts:     0

Metrics for Query:      11
Count:                  1024 times executed in whole run
AQET:                   0.082581 seconds (arithmetic mean)
AQET(geom.):            0.073997 seconds (geometric mean)
QPS:                    48.28 Queries per second
minQET/maxQET:          0.02482279s / 0.23430865s
Average result count:   10.00
min/max result count:   10 / 10
Number of timeouts:     0

Metrics for Query:      12
Count:                  1024 times executed in whole run
AQET:                   0.470700 seconds (arithmetic mean)
AQET(geom.):            0.454558 seconds (geometric mean)
QPS:                    8.47 Queries per second
minQET/maxQET:          0.17751580s / 1.10537926s
Average result (Bytes): 2608.32
min/max result (Bytes): 2565 / 2650
Number of timeouts:     0


Method
------

Benchmark Data:

A number of changes had to be made to the benchmark data before it
could be run on top of Tracker. Unfortunately the benchmark does not
provide a machine-readable ontology, so I had to create one myself.

The form of the data for the tests was also difficult for Tracker to
handle. If you look at the specification you will see that it defines a
class called ProductType.

ProductType, they say, forms an 'irregular subsumption hierarchy',
which I think is a posh way of saying that the product types themselves
form a class hierarchy. Put another way, the test data contains
resources of the form:

bsbm-inst:ProductType011432
 rdf:type bsbm:ProductType ;
 rdfs:label "Digital Camera" ;
 rdfs:subClassOf bsbm-inst:ProductType011000 ;
 dc:publisher bsbm-inst:StandardizationInstitution01 ;
 dc:date "2008-02-13"^^xsd:date .

This is fundamentally incompatible with Tracker's lack of runtime
ontology definition: the data itself declares new classes. To get
around this I modified the dataset generator to output the product
types as a separate file to be placed in the ontology folder.

The other issue here is that, although the RDFS specification says that
the triple 'bsbm-inst:ProductType011432 rdfs:subClassOf
bsbm-inst:ProductType011000' implicitly declares the subject to be an
rdfs:Class, Tracker does not make this inference. The triple
'bsbm-inst:ProductType011432 a rdfs:Class' had to be added explicitly
for the ontology to function.
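
To make this concrete, here is a sketch of what an entry in the
generated ontology file ends up looking like (prefixes omitted); the
explicit rdfs:Class statement is the line Tracker needs on top of what
the generator normally emits:

# Product type moved into the ontology folder; the first statement is
# the explicit class declaration Tracker requires.
bsbm-inst:ProductType011432
 a rdfs:Class ;
 rdfs:subClassOf bsbm-inst:ProductType011000 ;
 rdfs:label "Digital Camera" .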

Linked external data is also used heavily in this data set, and Tracker
could not handle it. The external URIs were converted into plain
strings and the queries were changed accordingly (see the sketch
below).
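
As a purely illustrative sketch of the kind of change involved (the
resource and property names here are made up, not taken from the
benchmark data):

# Before: the object is a link to an external resource
bsbm-inst:SomeResource ex:linkedTo <http://example.org/external/Thing42> .

# After: the external URI is stored as a plain string literal
bsbm-inst:SomeResource ex:linkedTo "http://example.org/external/Thing42" .

# Queries then compare against the string rather than the URI, e.g.
#   FILTER (?thing = <http://example.org/external/Thing42>)
# becomes
#   FILTER (?thing = "http://example.org/external/Thing42")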

foaf was another major issue: it is not one of the ontologies included
with Tracker, and the data contains foaf terms. Ontologies were created
that somewhat represent foaf and geo, just enough to get the BSBM data
into Tracker.
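
For illustration, the foaf stand-in amounts to little more than plain
RDFS declarations along these lines (the exact set of classes and
properties was dictated by what the data uses, and any Tracker-specific
annotations are left out of this sketch):

# Minimal foaf stand-in: only the terms the benchmark data needs
foaf:Person
 a rdfs:Class ;
 rdfs:subClassOf rdfs:Resource .

foaf:name
 a rdf:Property ;
 rdfs:domain foaf:Person ;
 rdfs:range xsd:string .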

Other, smaller mismatches between the format of the data and Tracker were:

1) dc:publisher is not well defined. The dataset uses URIs for the
publisher; this should possibly be allowed, as the dc specifications
say nothing about the data type of the property's range.

2) xsd:date is not a supported data type in Tracker. These values,
including the values of dc:date, had to be converted to xsd:dateTime
(see the sketch after this list).

3) Instances of rdfs:Class are not implicitly rdfs:Resources. Looking
at the example Turtle above, there is no 'rdf:type rdfs:Resource'
statement. I believe this is implicit in the type that is given, but I
may be wrong; I could find nothing in the RDFS specifications to
confirm it. I simply added rdfs:Resource triples to the data set (also
covered in the sketch below).
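
A sketch of what points 2) and 3) amount to, using the example resource
from above (the midnight time component is an arbitrary choice made for
the conversion):

# As generated by the benchmark
bsbm-inst:ProductType011432 dc:date "2008-02-13"^^xsd:date .

# As loaded into Tracker: explicit rdfs:Resource typing and xsd:dateTime
bsbm-inst:ProductType011432
 a rdfs:Resource ;
 dc:date "2008-02-13T00:00:00Z"^^xsd:dateTime .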


SPARQL endpoint:

The tests require an HTTP SPARQL endpoint. A very simple one was
written for Tracker using django, with django's basic built-in HTTP
server serving the requests. Apache and mod_python were also tried, but
that led to extremely irregular results and some failures.


Conclusion
----------

Tracker, despite being a much more lightweight RDF database than
Virtuoso, has much better query performance in the Berlin SPARQL
benchmark. This is somewhat to be expected, as Tracker uses decomposed
tables whereas Virtuoso, I believe, does not.

The downside is that Virtuoso is much more flexible and much better
suited to general-purpose RDF storage. Tracker's data type limits,
non-schema-free operation, and limited ontology support made it very
difficult to use with the data set provided. This is obviously not an
issue for ontologies and data created specifically for Tracker.

The results for Virtuoso are far more variable than those for Tracker.
This may be intrinsic to Virtuoso, or it could indicate that some
significant per-query overhead is preventing Tracker from performing
well on the very simple queries. I have not attempted to measure how
much overhead the HTTP SPARQL endpoint adds; doing so is one possible
way to improve these tests. There are also some queries that report
slightly different numbers of results between Virtuoso and Tracker.
This is unlikely to be an error in Tracker itself: the translation of
the ontology and data set for use with Tracker was extensive, and has
likely introduced some errors.


Thanks

Mark








