Re: [xml] Exponential running time in RELAX NG matching of optional attributes



On Mon, Jul 22, 2019 at 11:59:42AM +0200, Jan Pokorný via xml wrote:
On 27/06/19 14:15 +0200, Nikolai Weibull via xml wrote:
The following RELAX NG schema requires exponential running time when
matching attributes (15 attributes is where my computer begins to
show the symptoms, this may vary with your set-up):

a.rng:

<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0";>
  <start>
    <element>
      <name>a</name>
      <group>
        <optional><attribute><name ns="">a</name><text/></attribute></optional>
        <optional><attribute><name ns="">b</name><text/></attribute></optional>
        <optional><attribute><name ns="">c</name><text/></attribute></optional>
        <optional><attribute><name ns="">d</name><text/></attribute></optional>
        <optional><attribute><name ns="">e</name><text/></attribute></optional>
        <optional><attribute><name ns="">f</name><text/></attribute></optional>
        <optional><attribute><name ns="">g</name><text/></attribute></optional>
        <optional><attribute><name ns="">h</name><text/></attribute></optional>
        <optional><attribute><name ns="">i</name><text/></attribute></optional>
        <optional><attribute><name ns="">j</name><text/></attribute></optional>
        <optional><attribute><name ns="">k</name><text/></attribute></optional>
        <optional><attribute><name ns="">l</name><text/></attribute></optional>
        <optional><attribute><name ns="">m</name><text/></attribute></optional>
        <optional><attribute><name ns="">n</name><text/></attribute></optional>
        <optional><attribute><name ns="">o</name><text/></attribute></optional>
      </group>
    </element>
  </start>
</grammar>

a.xml:

<?xml version="1.0" encoding="UTF-8"?>
<a a="1" b="2" c="3" d="4" e="5" f="6" g="7" h="8" i="9" j="10" k="11" l="12" m="13" n="14" o="15"/>

% time xmllint --noout --relaxng a.rng a.xml
real: 3.175, user: 3.147, system: 0.014 (99%)

Changing the group to an interleave, that is,

b.rng:

<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0";>
  <start>
    <element>
      <name>a</name>
      <interleave>
        <optional><attribute><name ns="">a</name><text/></attribute></optional>
        <optional><attribute><name ns="">b</name><text/></attribute></optional>
        <optional><attribute><name ns="">c</name><text/></attribute></optional>
        <optional><attribute><name ns="">d</name><text/></attribute></optional>
        <optional><attribute><name ns="">e</name><text/></attribute></optional>
        <optional><attribute><name ns="">f</name><text/></attribute></optional>
        <optional><attribute><name ns="">g</name><text/></attribute></optional>
        <optional><attribute><name ns="">h</name><text/></attribute></optional>
        <optional><attribute><name ns="">i</name><text/></attribute></optional>
        <optional><attribute><name ns="">j</name><text/></attribute></optional>
        <optional><attribute><name ns="">k</name><text/></attribute></optional>
        <optional><attribute><name ns="">l</name><text/></attribute></optional>
        <optional><attribute><name ns="">m</name><text/></attribute></optional>
        <optional><attribute><name ns="">n</name><text/></attribute></optional>
        <optional><attribute><name ns="">o</name><text/></attribute></optional>
      </interleave>
    </element>
  </start>
</grammar>

gives

% time xmllint --noout --relaxng b.rng a.xml
real: 0.008, user: 0.003, system: 0.002 (54%)

Making the attributes required instead of optional also fixes the issue:

c.rng:

<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0";>
  <start>
    <element>
      <name>a</name>
      <group>
        <attribute><name ns="">a</name><text/></attribute>
        <attribute><name ns="">b</name><text/></attribute>
        <attribute><name ns="">c</name><text/></attribute>
        <attribute><name ns="">d</name><text/></attribute>
        <attribute><name ns="">e</name><text/></attribute>
        <attribute><name ns="">f</name><text/></attribute>
        <attribute><name ns="">g</name><text/></attribute>
        <attribute><name ns="">h</name><text/></attribute>
        <attribute><name ns="">i</name><text/></attribute>
        <attribute><name ns="">j</name><text/></attribute>
        <attribute><name ns="">k</name><text/></attribute>
        <attribute><name ns="">l</name><text/></attribute>
        <attribute><name ns="">m</name><text/></attribute>
        <attribute><name ns="">n</name><text/></attribute>
        <attribute><name ns="">o</name><text/></attribute>
      </group>
    </element>
  </start>
</grammar>

% time xmllint --noout --relaxng c.rng a.xml
real: 0.008, user: 0.003, system: 0.002 (55%)

The problem seems to be that xmlRelaxNGValidateDefinition keeps
adding states that I feel don’t need checking, but I can’t figure
out how to avoid this from happening.

  Hum, I'm way behind .... sorry

yes that sounds like there is a graph generation issue, best is to compile
the regexp support with things like DEBUG_REGEXP_GRAPH enabled and trace down
with pencil and paper what kind of graph it generates. That's how most of that code
ended up being debugged <grin/>


I can only nod to your analysis, ran into similar observations
in the past, and they were likewise XML attributes (that parameters to
fence device instances boil down to) related:

https://bugzilla.redhat.com/show_bug.cgi?id=1224378#c7

Callgrind proved to be a useful tool to get better quantitative numbers.

  yup but that's way down

Daniel, do you have any input on this?  I’d gladly work on fixing
this, but I feel that I need some guidance as to what’s going wrong
and perhaps also how to fix it.

Would be nice to get libxml's RelaxNG validation support closer
to parity with jing (the only other freely available validator I am
aware of that can be directly used for a peer comparison), for sure.

  well jing use an orthogonal approach, derivating the document graph directly
while libxml2 uses validation structure derivation instead (when it can't compile
down to a regexp). In that case it should be regexp based.
  The two will performs massively differently on different problems.

Daniel

_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml gnome org
https://mail.gnome.org/mailman/listinfo/xml


-- 
Daniel Veillard      | Red Hat Developers Tools http://developer.redhat.com/
veillard redhat com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | virtualization library  http://libvirt.org/


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]