[xml] libxml2 RelaxNG validation performance
- From: Brandon Ehle <azverkan yahoo com>
- To: xml gnome org
- Subject: [xml] libxml2 RelaxNG validation performance
- Date: Fri, 26 Jan 2007 23:06:02 -0800
Hello,
I am fairly new to libxml2 and had a question about the RelaxNG validation performance.
I am using trang to generate a RelaxNG schema for some XML and then hand editing that to create the real
specification to use for validation. When running xmllint to validate the document against the schema, the
performance is extremely slow. It takes over 30 minutes to validate a 60KB XML document against a 17KB
RelaxNG schema.
I put together a small test case that reproduces the problem. Notice how the XML Schema validation finishes
pretty quickly, but the RelaxNG validation takes over 7 seconds.
The sample script that generates this output is below. Below that I attached some OProfile performance
captures. Most of the time is spent inside xmlRelaxNGValidateAttribute(), which isn't suprising considering
the XML document that gets fed to it.
Is this a known problem with libxml2 or a problem on my end?
Sample output from my script:
======================================== non validating
Parsing took 1 ms
Freeing took 0 ms
======================================== xsd
Compiling the schemas took 13 ms
Parsing took 1 ms
slow.xml validates
Validating took 8 ms
Freeing took 0 ms
======================================== rng
Compiling the schemas took 91 ms
Parsing took 1 ms
slow.xml validates
Validating took 7309 ms
Freeing took 0 ms
This is on a Debian Pentium 4 machine with the following libxml2 packages.
ii libxml2 2.6.27.dfsg-1 GNOME XML library
ii libxml2-dbg 2.6.27.dfsg-1 Debugging symbols for the GNOME XML
library
ii libxml2-dev 2.6.27.dfsg-1 Development files for the GNOME XML
library
ii libxml2-doc 2.6.27.dfsg-1 Documentation for the GNOME XML
library
ii libxml2-utils 2.6.27.dfsg-1 XML utilities
============================================================
Here is the sample script that reproduces this problem.
============================================================
#!/usr/bin/env python
import os
import random
NODES = 100
ATTRS = 10
def randword():
word = ''
for w in range(0, random.randint(2, 10)):
word += random.choice('abcdefghijklmnopqrstuvwxyz')
return word
f = open("slow.xml", 'w')
f.write('<?xml version="1.0"?>\n')
f.write('<slowroot>\n')
random.seed(0)
for i in range(0, NODES):
f.write('<slownode')
keys = {}
for j in range(0, ATTRS):
key = randword()
# Drop duplicate keys
if key in keys:
continue
keys[key] = None
value = randword()
f.write(' ' + key + '="' + value + '"')
f.write('/>\n')
f.write('</slowroot>\n')
del f
os.system('trang slow.xml slow.xsd')
os.system('trang slow.xml slow.rng')
print '='*40, 'non validating'
os.system('xmllint --noout --timing slow.xml')
print '='*40, 'xsd'
os.system('xmllint --noout --timing --schema slow.xsd slow.xml')
print '='*40, 'rng'
os.system('xmllint --noout --timing --relaxng slow.rng slow.xml')
============================================================
And here is an OProfile trace of xmllint running:
============================================================
CPU: P4 / Xeon with 2 hyper-threads, speed 2793.14 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01
(mandatory) count 100000
GLOBAL_POWER_E...|
samples| %|
------------------
110614 100.000 xmllint
GLOBAL_POWER_E...|
samples| %|
------------------
109842 99.3021 libxml2.so.2.6.27
748 0.6762 libc-2.3.6.so
21 0.0190 ld-2.3.6.so
1 9.0e-04 xmllint
1 9.0e-04 anon (tgid:26130 range:0x8055000-0x81ab000)
1 9.0e-04 libreadline.so.5.2
============================================================
Here is a flat list of the top 1% in OProfile:
============================================================
CPU: P4 / Xeon with 2 hyper-threads, speed 2793.14 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01
(mandatory) count 100000
samples % image name symbol name
33608 30.3831 libxml2.so.2.6.27 xmlRelaxNGValidateAttribute
19206 17.3631 libxml2.so.2.6.27 xmlRelaxNGCopyValidState
16978 15.3489 libxml2.so.2.6.27 xmlRelaxNGValidateState
13741 12.4225 libxml2.so.2.6.27 xmlRelaxNGAddStates
10863 9.8206 libxml2.so.2.6.27 xmlRelaxNGValidateDefinition
7734 6.9919 libxml2.so.2.6.27 xmlStrEqual
2538 2.2945 libxml2.so.2.6.27 xmlRelaxNGValidateDefinitionList
1678 1.5170 libxml2.so.2.6.27 xmlRelaxNGFreeValidState
1554 1.4049 libxml2.so.2.6.27 __i686.get_pc_thunk.bx
============================================================
And an OProfile callstack dump of the top 1%:
============================================================
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01
(mandatory) count 100000
samples % image name symbol name
-------------------------------------------------------------------------------
39102 99.5088 libxml2.so.2.6.27 xmlRelaxNGValidateState
33608 30.3831 libxml2.so.2.6.27 xmlRelaxNGValidateAttribute
33608 85.3298 libxml2.so.2.6.27 xmlRelaxNGValidateAttribute [self]
5471 13.8907 libxml2.so.2.6.27 xmlStrEqual
-------------------------------------------------------------------------------
468 2.4360 libxml2.so.2.6.27 xmlRelaxNGValidateDefinition
18744 97.5640 libxml2.so.2.6.27 xmlRelaxNGValidateState
19206 17.3631 libxml2.so.2.6.27 xmlRelaxNGCopyValidState
19206 99.9688 libxml2.so.2.6.27 xmlRelaxNGCopyValidState [self]
-------------------------------------------------------------------------------
124256 99.5250 libxml2.so.2.6.27 xmlRelaxNGValidateDefinition
16978 15.3489 libxml2.so.2.6.27 xmlRelaxNGValidateState
58970 32.0815 libxml2.so.2.6.27 xmlRelaxNGValidateAttributeList
45574 24.7937 libxml2.so.2.6.27 xmlRelaxNGValidateDefinitionList
39102 21.2727 libxml2.so.2.6.27 xmlRelaxNGValidateAttribute
18744 10.1973 libxml2.so.2.6.27 xmlRelaxNGCopyValidState
16978 9.2366 libxml2.so.2.6.27 xmlRelaxNGValidateState [self]
1848 1.0054 libxml2.so.2.6.27 xmlStrEqual
-------------------------------------------------------------------------------
13666 99.4542 libxml2.so.2.6.27 xmlRelaxNGValidateDefinition
13741 12.4225 libxml2.so.2.6.27 xmlRelaxNGAddStates
13741 100.000 libxml2.so.2.6.27 xmlRelaxNGAddStates [self]
-------------------------------------------------------------------------------
8573 7.2250 libxml2.so.2.6.27 xmlRelaxNGValidateCompiledCallback
48267 40.6774 libxml2.so.2.6.27 xmlRelaxNGValidateDefinitionList
61594 51.9088 libxml2.so.2.6.27 xmlRelaxNGValidateAttributeList
10863 9.8206 libxml2.so.2.6.27 xmlRelaxNGValidateDefinition
124256 82.5989 libxml2.so.2.6.27 xmlRelaxNGValidateState
13666 9.0844 libxml2.so.2.6.27 xmlRelaxNGAddStates
10863 7.2212 libxml2.so.2.6.27 xmlRelaxNGValidateDefinition [self]
-------------------------------------------------------------------------------
187 2.4179 libxml2.so.2.6.27 xmlHashLookup3
1848 23.8945 libxml2.so.2.6.27 xmlRelaxNGValidateState
5471 70.7396 libxml2.so.2.6.27 xmlRelaxNGValidateAttribute
7734 6.9919 libxml2.so.2.6.27 xmlStrEqual
7734 100.000 libxml2.so.2.6.27 xmlStrEqual [self]
-------------------------------------------------------------------------------
45574 99.5870 libxml2.so.2.6.27 xmlRelaxNGValidateState
2538 2.2945 libxml2.so.2.6.27 xmlRelaxNGValidateDefinitionList
48267 94.0932 libxml2.so.2.6.27 xmlRelaxNGValidateDefinition
2538 4.9477 libxml2.so.2.6.27 xmlRelaxNGValidateDefinitionList [self]
-------------------------------------------------------------------------------
332 19.7855 libxml2.so.2.6.27 xmlRelaxNGValidateDefinition
1346 80.2145 libxml2.so.2.6.27 xmlRelaxNGValidateState
1678 1.5170 libxml2.so.2.6.27 xmlRelaxNGFreeValidState
1678 100.000 libxml2.so.2.6.27 xmlRelaxNGFreeValidState [self]
-------------------------------------------------------------------------------
186 11.9691 libxml2.so.2.6.27 xmlRelaxNGValidateDefinitionList
452 29.0862 libxml2.so.2.6.27 xmlRelaxNGValidateDefinition
874 56.2420 libxml2.so.2.6.27 xmlRelaxNGValidateState
1554 1.4049 libxml2.so.2.6.27 __i686.get_pc_thunk.bx
1554 100.000 libxml2.so.2.6.27 __i686.get_pc_thunk.bx [self]
-------------------------------------------------------------------------------
[
Date Prev][
Date Next] [
Thread Prev][
Thread Next]
[
Thread Index]
[
Date Index]
[
Author Index]