Re: [xml] Schema validity failure for valid document
- From: Daniel Veillard <veillard redhat com>
- To: Kasimier Buchcik <kbuchcik 4commerce de>
- Cc: xml gnome org
- Date: Tue, 11 Jan 2005 05:41:55 -0500
On Tue, Jan 11, 2005 at 11:23:48AM +0100, Kasimier Buchcik wrote:
  Small suggestion: use {http://FOO}b as in the XPath REC; it's a shorter
and already well-known notation for namespaced names.
OK. I'd better change the format of reported qualified names in other
places of the schema engine as well.
Any preferences in wildcard reports?
  1. Any element from the namespace "http://BAR"
    {http://BAR}* ( WC[http://BAR] in Xerces ) ?
  2. Any element from a namespace other than "http://BAR"
    {##other:http://BAR}* ( WC[##other:"http://FOO"] in Xerces ) ?
  3. Any element from any namespace
    {##any}* ( WC[##any] in Xerces ) ?
   All 3 look good to me. Obviously the ## stuff is schema-specific.
(Frans, look, Xerces uses abbreviations ;-))
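A minimal sketch of how the three proposed report formats could be produced, in case it helps compare them side by side. The enum and both function names are hypothetical and exist only for this illustration; they are not part of the libxml2 schema engine.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wildcard kinds, named only for this sketch. */
typedef enum { WILD_NS, WILD_OTHER, WILD_ANY } WildKind;

/* Write an expanded name as {namespace}local into buf. A name without a
 * namespace is written as the bare local name. Returns buf. */
char *format_qname(char *buf, size_t size, const char *ns, const char *local)
{
    assert(buf != NULL && size > 0 && local != NULL);
    if (ns != NULL && strlen(ns) > 0)
        snprintf(buf, size, "{%s}%s", ns, local);
    else
        snprintf(buf, size, "%s", local);
    return buf;
}

/* Write a wildcard in the notation proposed above. ns is ignored for
 * WILD_ANY and must be non-NULL otherwise. Returns buf. */
char *format_wildcard(char *buf, size_t size, WildKind kind, const char *ns)
{
    assert(buf != NULL && size > 0);
    assert(kind == WILD_ANY || ns != NULL);
    switch (kind) {
    case WILD_NS:
        snprintf(buf, size, "{%s}*", ns);
        break;
    case WILD_OTHER:
        snprintf(buf, size, "{##other:%s}*", ns);
        break;
    case WILD_ANY:
        snprintf(buf, size, "{##any}*");
        break;
    }
    return buf;
}
```

Calling format_wildcard(buf, sizeof buf, WILD_OTHER, "http://BAR") fills buf with {##other:http://BAR}*, matching case 2 above.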
For comparison, a Xerces report for the last example XML Schema +
instance:
(Error) cvc-complex-type.2.4.a: Invalid content was found starting with
element 'x'. One of '{"http://FOO":b, WC[##other:"http://FOO"],
"http://FOO":c}' is expected.
I wonder if the prefix "WC" as used by Xerces could be useful or
irritating, since we unfold wildcards into multiple transitions if
they consist of multiple namespaces.
  I honestly don't understand what WC is supposed to mean. If it's
wildcard then I think * is more user friendly.
This is due to the fact that negated namespaces are built using
an automaton approach in xmlSchemaBuildAContentModel:
deadEnd = xmlAutomataNewState(ctxt->am);
ctxt->state = xmlAutomataNewTransition2(ctxt->am,
  start, deadEnd, BAD_CAST "*", wild->negNsSet->value, type);
ctxt->state = xmlAutomataNewTransition2(ctxt->am,
  start, NULL, BAD_CAST "*", BAD_CAST "*", type);
xmlAutomataNewEpsilon(ctxt->am, ctxt->state, end);
The namespace is let through and then caught with a dead-end.
Any ideas?
  The problem is that we reach that "dead state"; from there we can't
extract useful information.
  Sounds like the reverse of a reachable state. Basically when you construct
the automata you can build a list of dead states, i.e. any state from which
you can't possibly transition to a final state. We could build that list
and then error out (or roll back if not deterministic) earlier.
  It might be cheap to do based on the epsilon transition elimination,
but this may not help getting an accurate error code or message out of the
regexp. For example, we could save "start" as the error state when going
through that transition to the dead state; then xmlRegExecErrInfo() would
extract the values from the state before going through the error transition.
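The dead-state list described above can be sketched as a backward reachability pass: mark the final states as live, then keep marking any state that has a transition into a live state, and whatever remains unmarked is dead. The Trans and Automaton types and find_dead_states() below are a toy model assumed for this illustration, not the structures or code in xmlregexp.c.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_STATES 16
#define MAX_TRANS  64

/* Toy automaton for this sketch only; not the xmlregexp.c structures. */
typedef struct { int from, to; } Trans;

typedef struct {
    int   nstates;
    bool  final[MAX_STATES];
    int   ntrans;
    Trans trans[MAX_TRANS];
} Automaton;

/* Set dead[s] to true for every state s from which no final state can be
 * reached. Liveness is propagated backwards from the final states until
 * a full pass over the transitions changes nothing. */
void find_dead_states(const Automaton *am, bool dead[MAX_STATES])
{
    bool live[MAX_STATES];
    bool changed = true;
    int i;

    assert(am->nstates <= MAX_STATES && am->ntrans <= MAX_TRANS);

    /* A final state is trivially live. */
    for (i = 0; i < am->nstates; i++)
        live[i] = am->final[i];

    while (changed) {
        changed = false;
        for (i = 0; i < am->ntrans; i++) {
            const Trans *t = &am->trans[i];
            if (live[t->to] && !live[t->from]) {
                live[t->from] = true;
                changed = true;
            }
        }
    }

    for (i = 0; i < am->nstates; i++)
        dead[i] = !live[i];
}
```

With states 0 (start), 1 (deadEnd), 2 (after the "*" transition) and 3 (end, final), and transitions 0->1, 0->2 and 2->3, only state 1 is reported as dead. The repeated passes are quadratic in the worst case; a worklist over reversed edges would be the usual refinement.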
This seems a more "formal" approach to solving that case of errors
extraction problems, but would that really work for you ?
I'm only ankle-deep into the xmlregexp.c code, so I hope you somehow
get a grip on such a transition for the error report.
But I noticed that even if we get the value-string of this transition
through xmlRegExecErrInfo, we would still not be able to know that it
was a negated namespace, unless the data field, i.e. the schema type
associated with the value-string, is provided as well.
What about the following:
1. xmlRegExecErrInfo would return an array of automaton transition
  indices (well, if the transitions are accessible through an index).
2. We could then iterate through all the given transitions and extract
   the info using accessor functions like:
   const xmlChar * xmlRegExecGetValue(int transitionId)
   void * xmlRegExecGetData(int transitionId)
  Too complex. Depending on the compilation you may address transitions
with integers or with pointers. I don't think exposing that deep a level
is safe. Maybe we can give information when building the automata,
associating an error string to a transition (or a state) and providing
back that optional error string from xmlRegExecErrInfo if reached. That
sounds way simpler to implement and use. When building those "dead" transitions
you know what they represent and what to report if transiting through them.
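A rough sketch of the error-string idea, under the assumption that each transition simply carries an optional message which is handed back when execution goes through it. Everything here (Trans, Automaton, add_trans, step) is a hypothetical toy model; it does not reflect the xmlAutomata or xmlRegExec APIs, and the token is reduced to a plain namespace string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TRANS 32

/* Toy structures for this sketch; not the xmlregexp.c ones. */
typedef struct {
    int from, to;
    const char *token;   /* "*" matches any token */
    const char *errmsg;  /* non-NULL marks an error ("dead") transition */
} Trans;

typedef struct {
    int   ntrans;
    Trans trans[MAX_TRANS];
} Automaton;

/* Add a transition; errmsg is NULL for ordinary transitions. */
void add_trans(Automaton *am, int from, int to,
               const char *token, const char *errmsg)
{
    assert(am->ntrans < MAX_TRANS);
    am->trans[am->ntrans++] = (Trans){ from, to, token, errmsg };
}

/* Take the first transition leaving `state` that matches `token`.
 * Returns the new state, or -1 on failure. On failure *err is the message
 * attached to the error transition that was hit, or NULL if no transition
 * matched at all. */
int step(const Automaton *am, int state, const char *token, const char **err)
{
    int i;

    *err = NULL;
    for (i = 0; i < am->ntrans; i++) {
        const Trans *t = &am->trans[i];
        if (t->from != state)
            continue;
        if (strcmp(t->token, "*") != 0 && strcmp(t->token, token) != 0)
            continue;
        if (t->errmsg != NULL) {
            /* The builder knew what this transition represents. */
            *err = t->errmsg;
            return -1;
        }
        return t->to;
    }
    return -1;
}
```

In this model the negated-namespace transition is added first with a message such as "element from negated namespace http://FOO", followed by the catch-all "*" transition with no message, so the caller gets the message directly without needing access to transition indices.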
  I still think detecting the dead state should be added, it doesn't have
to show up in any API anyway.
Daniel
-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/