[xml] libxml2 RelaxNG validation performance



Hello,

I am fairly new to libxml2 and had a question about the RelaxNG validation performance.

I am using trang to generate a RelaxNG schema for some XML and then hand editing that to create the real 
specification to use for validation.  When running xmllint to validate the document against the schema, the 
performance is extremely slow.  It takes over 30 minutes to validate a 60KB XML document against a 17KB 
RelaxNG schema.

I put together a small test case that reproduces the problem.  Notice how the XML Schema validation finishes 
pretty quickly, but the RelaxNG validation takes over 7 seconds.

The sample script that generates this output is below.  Below that I attached some OProfile performance 
captures.  Most of the time is spent inside xmlRelaxNGValidateAttribute(), which isn't suprising considering 
the XML document that gets fed to it.

Is this a known problem with libxml2 or a problem on my end?


Sample output from my script:

  ======================================== non validating
  Parsing took 1 ms
  Freeing took 0 ms

  ======================================== xsd
  Compiling the schemas took 13 ms
  Parsing took 1 ms
  slow.xml validates
  Validating took 8 ms
  Freeing took 0 ms

  ======================================== rng
  Compiling the schemas took 91 ms
  Parsing took 1 ms
  slow.xml validates
  Validating took 7309 ms
  Freeing took 0 ms

This is on a Debian Pentium 4 machine with the following libxml2 packages.

ii  libxml2                           2.6.27.dfsg-1                       GNOME XML library
ii  libxml2-dbg                       2.6.27.dfsg-1                       Debugging symbols for the GNOME XML 
library
ii  libxml2-dev                       2.6.27.dfsg-1                       Development files for the GNOME XML 
library
ii  libxml2-doc                       2.6.27.dfsg-1                       Documentation for the GNOME XML 
library
ii  libxml2-utils                     2.6.27.dfsg-1                       XML utilities

============================================================

Here is the sample script that reproduces this problem.

============================================================

#!/usr/bin/env python

import os
import random

NODES = 100
ATTRS = 10

def randword():
    word = ''
    for w in range(0, random.randint(2, 10)):
        word += random.choice('abcdefghijklmnopqrstuvwxyz')
    return word

f = open("slow.xml", 'w')
f.write('<?xml version="1.0"?>\n')
f.write('<slowroot>\n')

random.seed(0)
for i in range(0, NODES):
    f.write('<slownode')

    keys = {}
    for j in range(0, ATTRS):
        key = randword()

        # Drop duplicate keys
        if key in keys:
            continue
        keys[key] = None

        value = randword()

        f.write(' ' + key + '="' + value + '"')
    f.write('/>\n')

f.write('</slowroot>\n')
del f

os.system('trang slow.xml slow.xsd')
os.system('trang slow.xml slow.rng')

print '='*40, 'non validating'
os.system('xmllint --noout --timing slow.xml')

print '='*40, 'xsd'
os.system('xmllint --noout --timing --schema slow.xsd slow.xml')

print '='*40, 'rng'
os.system('xmllint --noout --timing --relaxng slow.rng slow.xml')

============================================================

And here is an OProfile trace of xmllint running:

============================================================

CPU: P4 / Xeon with 2 hyper-threads, speed 2793.14 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 
(mandatory) count 100000
GLOBAL_POWER_E...|
  samples|      %|
------------------
   110614 100.000 xmllint
        GLOBAL_POWER_E...|
          samples|      %|
        ------------------
           109842 99.3021 libxml2.so.2.6.27
              748  0.6762 libc-2.3.6.so
               21  0.0190 ld-2.3.6.so
                1 9.0e-04 xmllint
                1 9.0e-04 anon (tgid:26130 range:0x8055000-0x81ab000)
                1 9.0e-04 libreadline.so.5.2

============================================================

Here is a flat list of the top 1% in OProfile:

============================================================

CPU: P4 / Xeon with 2 hyper-threads, speed 2793.14 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 
(mandatory) count 100000
samples  %        image name               symbol name
33608    30.3831  libxml2.so.2.6.27        xmlRelaxNGValidateAttribute
19206    17.3631  libxml2.so.2.6.27        xmlRelaxNGCopyValidState
16978    15.3489  libxml2.so.2.6.27        xmlRelaxNGValidateState
13741    12.4225  libxml2.so.2.6.27        xmlRelaxNGAddStates
10863     9.8206  libxml2.so.2.6.27        xmlRelaxNGValidateDefinition
7734      6.9919  libxml2.so.2.6.27        xmlStrEqual
2538      2.2945  libxml2.so.2.6.27        xmlRelaxNGValidateDefinitionList
1678      1.5170  libxml2.so.2.6.27        xmlRelaxNGFreeValidState
1554      1.4049  libxml2.so.2.6.27        __i686.get_pc_thunk.bx

============================================================

And an OProfile callstack dump of the top 1%:

============================================================

Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 
(mandatory) count 100000
samples  %        image name               symbol name
-------------------------------------------------------------------------------
  39102    99.5088  libxml2.so.2.6.27        xmlRelaxNGValidateState
33608    30.3831  libxml2.so.2.6.27        xmlRelaxNGValidateAttribute
  33608    85.3298  libxml2.so.2.6.27        xmlRelaxNGValidateAttribute [self]
  5471     13.8907  libxml2.so.2.6.27        xmlStrEqual
-------------------------------------------------------------------------------
  468       2.4360  libxml2.so.2.6.27        xmlRelaxNGValidateDefinition
  18744    97.5640  libxml2.so.2.6.27        xmlRelaxNGValidateState
19206    17.3631  libxml2.so.2.6.27        xmlRelaxNGCopyValidState
  19206    99.9688  libxml2.so.2.6.27        xmlRelaxNGCopyValidState [self]
-------------------------------------------------------------------------------
  124256   99.5250  libxml2.so.2.6.27        xmlRelaxNGValidateDefinition
16978    15.3489  libxml2.so.2.6.27        xmlRelaxNGValidateState
  58970    32.0815  libxml2.so.2.6.27        xmlRelaxNGValidateAttributeList
  45574    24.7937  libxml2.so.2.6.27        xmlRelaxNGValidateDefinitionList
  39102    21.2727  libxml2.so.2.6.27        xmlRelaxNGValidateAttribute
  18744    10.1973  libxml2.so.2.6.27        xmlRelaxNGCopyValidState
  16978     9.2366  libxml2.so.2.6.27        xmlRelaxNGValidateState [self]
  1848      1.0054  libxml2.so.2.6.27        xmlStrEqual
-------------------------------------------------------------------------------
  13666    99.4542  libxml2.so.2.6.27        xmlRelaxNGValidateDefinition
13741    12.4225  libxml2.so.2.6.27        xmlRelaxNGAddStates
  13741    100.000  libxml2.so.2.6.27        xmlRelaxNGAddStates [self]
-------------------------------------------------------------------------------
  8573      7.2250  libxml2.so.2.6.27        xmlRelaxNGValidateCompiledCallback
  48267    40.6774  libxml2.so.2.6.27        xmlRelaxNGValidateDefinitionList
  61594    51.9088  libxml2.so.2.6.27        xmlRelaxNGValidateAttributeList
10863     9.8206  libxml2.so.2.6.27        xmlRelaxNGValidateDefinition
  124256   82.5989  libxml2.so.2.6.27        xmlRelaxNGValidateState
  13666     9.0844  libxml2.so.2.6.27        xmlRelaxNGAddStates
  10863     7.2212  libxml2.so.2.6.27        xmlRelaxNGValidateDefinition [self]
-------------------------------------------------------------------------------
  187       2.4179  libxml2.so.2.6.27        xmlHashLookup3
  1848     23.8945  libxml2.so.2.6.27        xmlRelaxNGValidateState
  5471     70.7396  libxml2.so.2.6.27        xmlRelaxNGValidateAttribute
7734      6.9919  libxml2.so.2.6.27        xmlStrEqual
  7734     100.000  libxml2.so.2.6.27        xmlStrEqual [self]
-------------------------------------------------------------------------------
  45574    99.5870  libxml2.so.2.6.27        xmlRelaxNGValidateState
2538      2.2945  libxml2.so.2.6.27        xmlRelaxNGValidateDefinitionList
  48267    94.0932  libxml2.so.2.6.27        xmlRelaxNGValidateDefinition
  2538      4.9477  libxml2.so.2.6.27        xmlRelaxNGValidateDefinitionList [self]
-------------------------------------------------------------------------------
  332      19.7855  libxml2.so.2.6.27        xmlRelaxNGValidateDefinition
  1346     80.2145  libxml2.so.2.6.27        xmlRelaxNGValidateState
1678      1.5170  libxml2.so.2.6.27        xmlRelaxNGFreeValidState
  1678     100.000  libxml2.so.2.6.27        xmlRelaxNGFreeValidState [self]
-------------------------------------------------------------------------------
  186      11.9691  libxml2.so.2.6.27        xmlRelaxNGValidateDefinitionList
  452      29.0862  libxml2.so.2.6.27        xmlRelaxNGValidateDefinition
  874      56.2420  libxml2.so.2.6.27        xmlRelaxNGValidateState
1554      1.4049  libxml2.so.2.6.27        __i686.get_pc_thunk.bx
  1554     100.000  libxml2.so.2.6.27        __i686.get_pc_thunk.bx [self]
-------------------------------------------------------------------------------






[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]