Re: [xml] Schema validity failure for valid document



Hi,

Daniel Veillard wrote:
On Mon, Jan 10, 2005 at 06:44:15PM +0100, Kasimier Buchcik wrote:

I tried an initial implementation.
There is a problem with negated namespaces in wildcards:

[...]

P:\libxml2-lab\tests\2005-01-10>xmllint --noout --schema errRep1.xsd errRep1_0.xml
DEBUG terminal: 0
DEBUG nbval: 3
Element 'x': This element is not expected. Expected is one of { ("http://FOO", "b"), ("http://BAR", any), ("http://FOO", "c") }.
errRep1_0.xml fails to validate


  Small suggestion: use {http://FOO}b as in the XPath REC, it's a shorter
and already well-known notation for namespaced names.

OK. I'd better change the format of reported qualified names in other
places of the schema engine as well.

Any preferences in wildcard reports?
  1. Any element from the namespace "http://BAR"
    {http://BAR}* ( WC[http://BAR] in Xerces ) ?

  2. Any element from a namespace other than "http://BAR"
    {##other:http://BAR}* ( WC[##other:"http://FOO"] in Xerces ) ?

  3. Any element from any namespace
    {##any}* ( WC[##any] in Xerces ) ?
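
A minimal sketch of how the three wildcard variants and the {namespace}local
form could be produced by a single helper. The names wild_kind and
format_expected are hypothetical and exist only for this illustration; this
is not code from the schema engine.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical wildcard kinds, for illustration only. */
typedef enum { WILD_NONE, WILD_NS, WILD_OTHER, WILD_ANY } wild_kind;

/*
 * Write a report label for one expected item into buf.
 * Returns the snprintf() result (length that was needed).
 */
static int format_expected(char *buf, size_t size, wild_kind kind,
                           const char *ns, const char *local)
{
    switch (kind) {
    case WILD_NS:    /* any element from namespace ns */
        return snprintf(buf, size, "{%s}*", ns);
    case WILD_OTHER: /* any element from a namespace other than ns */
        return snprintf(buf, size, "{##other:%s}*", ns);
    case WILD_ANY:   /* any element from any namespace */
        return snprintf(buf, size, "{##any}*");
    default:         /* a plain element name */
        if (ns == NULL || ns[0] == '\0')
            return snprintf(buf, size, "%s", local);
        return snprintf(buf, size, "{%s}%s", ns, local);
    }
}
```

Called with WILD_NONE, "http://FOO" and "b", it writes {http://FOO}b; with
WILD_OTHER and "http://BAR" it writes {##other:http://BAR}*.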

(Frans, look, Xerces uses abbreviations ;-))

For comparison, a Xerces report for the last example XML Schema +
instance:

(Error) cvc-complex-type.2.4.a: Invalid content was found starting with
element 'x'. One of '{"http://FOO":b, WC[##other:"http://FOO"],
"http://FOO":c}' is expected.

I wonder if the prefix "WC" as used by Xerces would be useful or
irritating, since we unfold wildcards into multiple transitions if
they consist of multiple namespaces.

  Otherwise looks cool ! How to retrofit those new dynamic error codes
in the __xmlRaiseError framework may be a bit challenging, maybe separate

I don't know how to do it. Until now the string is just painted on the
buffer canvas and spit out with one single error code :-) There's a lot
of xmlStrdup and xmlStrcat in all the error report functions. This is
not very performant, but it made it easy to report what I wanted. I hope
that someone can review the code and suggest changes or give some advice.

the translatable strings from the potential values which are language
independent.

Interesting.

This is due to the fact that negated namespaces are built using
an automaton approach in xmlSchemaBuildAContentModel:

 deadEnd = xmlAutomataNewState(ctxt->am);
 ctxt->state = xmlAutomataNewTransition2(ctxt->am,
   start, deadEnd, BAD_CAST "*", wild->negNsSet->value, type);
 ctxt->state = xmlAutomataNewTransition2(ctxt->am,
   start, NULL, BAD_CAST "*", BAD_CAST "*", type);
 xmlAutomataNewEpsilon(ctxt->am, ctxt->state, end);

The namespace is let through and then caught with a dead-end.

Any ideas?


   The problem is that we reach that "dead state", and from there we can't
extract useful information.
   That sounds like the reverse of a reachable state. Basically, when you
construct the automata you can build a list of dead states, i.e. any state
from which you can't possibly transition to a final state. We could build
that list and then error out (or roll back if not deterministic) earlier.
It might be cheap to do based on the epsilon transition elimination, but
this may not help in getting an accurate error code or message out of the
regexp. For example, we could save "start" as the error state when going
through that transition to the dead state; then xmlRegExecErrInfo() would
extract the values from the state before going through the error transition.
This seems a more "formal" approach to solving that case of error
extraction problems, but would that really work for you ?

I'm only ankle deep into the xmlregexp.c code, so I hope you can somehow
get a grip on such a transition for the error report.

But I noticed that even if we get the value-string of this transition
through xmlRegExecErrInfo, we would still not be able to know that it
was a negated namespace, unless the data field, i.e. the schema type
associated with the value-string, is provided as well.

What about the following:
1. xmlRegExecErrInfo would return an array of automaton transition
  indices (well, if the transitions are accessible through an index).
2. We could then iterate through all the given transitions and extract
   the info using accessor functions like:
   const xmlChar * xmlRegExecGetValue(int transitionId)
   void * xmlRegExecGetData(int transitionId)

One workaround I see would be to add a special negating character to
the namespace name when calling xmlAutomataNewTransition2, and to augment
xmlRegStrEqualWildcard to handle negations:
Example:
 "*|~http://FOO" - the tilde indicates a negation

But this would be a non-automaton approach, plus I don't know if it's
too hackish.
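
A sketch of what such a comparison could look like, assuming the expected
string has the form "local|namespace" as in the example above. This is not
the actual xmlRegStrEqualWildcard; part_matches and match_expected are
hypothetical names used only here.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Compare one pattern part (pat, len bytes) against value:
 *   "*"   matches anything,
 *   "~X"  matches anything except X,
 *   other patterns must be equal to value.
 */
static bool part_matches(const char *pat, size_t len, const char *value)
{
    size_t vlen = strlen(value);

    if (len == 1 && pat[0] == '*')
        return true;
    if (len > 0 && pat[0] == '~')
        return !(vlen == len - 1 && strncmp(pat + 1, value, len - 1) == 0);
    return vlen == len && strncmp(pat, value, len) == 0;
}

/* Match an expected "local|namespace" pattern against an element. */
static bool match_expected(const char *expected,
                           const char *local, const char *ns)
{
    const char *bar = strchr(expected, '|');

    if (bar == NULL)
        return false;  /* this sketch only handles the two-part form */
    return part_matches(expected, (size_t)(bar - expected), local)
        && part_matches(bar + 1, strlen(bar + 1), ns);
}
```

With the pattern "*|~http://FOO", an element from "http://BAR" matches and an
element from "http://FOO" does not, so no dead-end state would be needed.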


   I would prefer keeping an automata-based approach as much as possible.
   Maybe the simplest is to add some informative error information to
such dead states when creating them, keep this in the xmlRegexp and extract
it in the xmlRegExecErrInfo() routine, but it may as well be an ugly
workaround and not formal enough.

OK.

Regards,

Kasimier


