Re: [xml] Schema validity failure for valid document
- From: Daniel Veillard <veillard redhat com>
- To: Kasimier Buchcik <kbuchcik 4commerce de>
- Cc: xml gnome org
- Subject: Re: [xml] Schema validity failure for valid document
- Date: Tue, 11 Jan 2005 05:41:55 -0500
On Tue, Jan 11, 2005 at 11:23:48AM +0100, Kasimier Buchcik wrote:
> > Small suggestion: use {http://FOO}b as in the XPath REC; it's a shorter
> >and already well-known notation for namespaced names.
>
> OK. I'd better change the format of reported qualified names in other
> places of the schema engine as well.
>
> Any preferences for wildcard reports?
> 1. Any element from the namespace "http://BAR"
> {http://BAR}* ( WC[http://BAR] in Xerces ) ?
>
> 2. Any element from a namespace other than "http://BAR"
> {##other:http://BAR}* ( WC[##other:"http://FOO"] in Xerces ) ?
>
> 3. Any element from any namespace
> {##any}* ( WC[##any] in Xerces ) ?
All three look good to me. Obviously the ## stuff is schema-specific.
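A minimal sketch of how the notation agreed on above might be produced, assuming plain NUL-terminated `char` strings rather than libxml2's `xmlChar`. The `name_kind` enum and `format_report_name()` are hypothetical names for illustration, not existing schema-engine functions. Wildcards spanning several namespaces are assumed to have been unfolded into one entry per namespace already.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical kinds of names that can appear in an "expected" list. */
typedef enum {
    NAME_QUALIFIED, /* a concrete element: {ns}local, or local if no ns */
    NAME_WC_NS,     /* any element from one namespace:         {ns}*       */
    NAME_WC_OTHER,  /* any element not from that namespace:    {##other:ns}* */
    NAME_WC_ANY     /* any element from any namespace:         {##any}*    */
} name_kind;

/* Writes the notation into buf (size bytes). Returns what snprintf
 * reports, so a result >= size means the output was truncated.
 * local must be non-NULL for NAME_QUALIFIED; ns must be non-NULL for
 * NAME_WC_NS and NAME_WC_OTHER. */
static int format_report_name(char *buf, size_t size, name_kind kind,
                              const char *ns, const char *local)
{
    assert(buf != NULL && size > 0);
    switch (kind) {
    case NAME_QUALIFIED:
        if (ns == NULL || ns[0] == '\0')
            return snprintf(buf, size, "%s", local);
        return snprintf(buf, size, "{%s}%s", ns, local);
    case NAME_WC_NS:
        return snprintf(buf, size, "{%s}*", ns);
    case NAME_WC_OTHER:
        return snprintf(buf, size, "{##other:%s}*", ns);
    case NAME_WC_ANY:
    default:
        return snprintf(buf, size, "{##any}*");
    }
}
```

For example, `format_report_name(buf, sizeof buf, NAME_WC_OTHER, "http://BAR", NULL)` writes `{##other:http://BAR}*`, and a qualified name in `http://FOO` comes out as `{http://FOO}b`.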
> (Frans, look, Xerces uses abbreviations ;-))
>
> For comparison, a Xerces report for the last example XML Schema +
> instance:
>
> (Error) cvc-complex-type.2.4.a: Invalid content was found starting with
> element 'x'. One of '{"http://FOO":b, WC[##other:"http://FOO"],
> "http://FOO":c}' is expected.
>
> I wonder whether the prefix "WC" as used by Xerces would be useful or
> irritating, since we unfold wildcards into multiple transitions if
> they consist of multiple namespaces.
I honestly don't understand what WC is supposed to mean. If it means
wildcard, then I think * is more user-friendly.
> >>This is due to the fact that negated namespaces are built using
> >>an automaton approach in xmlSchemaBuildAContentModel:
> >>
> >> deadEnd = xmlAutomataNewState(ctxt->am);
> >> ctxt->state = xmlAutomataNewTransition2(ctxt->am,
> >> start, deadEnd, BAD_CAST "*", wild->negNsSet->value, type);
> >> ctxt->state = xmlAutomataNewTransition2(ctxt->am,
> >> start, NULL, BAD_CAST "*", BAD_CAST "*", type);
> >> xmlAutomataNewEpsilon(ctxt->am, ctxt->state, end);
> >>
> >>The namespace is let through and then caught by a dead end.
> >>
> >>Any ideas?
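To make the dead-end trick concrete, here is a toy model (not libxml2 code) that mirrors the transition order of the xmlSchemaBuildAContentModel excerpt above. It assumes transitions are tried in insertion order and the first match wins; under that assumption an element in the excluded namespace is routed into a state with no outgoing transitions, so a report built from the current state has nothing left to list as expected. The epsilon transition to `end` is omitted, and all names here are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_TRANS 8

/* A transition labelled with name and namespace patterns; "*" matches
 * anything (a simplification of the real wildcard atoms). */
typedef struct {
    int from, to;
    const char *name, *ns;
} trans;

typedef struct {
    trans t[MAX_TRANS];
    int ntrans;
} automaton;

enum { START = 0, DEAD_END = 1, MATCHED = 2 };

static int pattern_match(const char *pat, const char *value)
{
    return strcmp(pat, "*") == 0 || strcmp(pat, value) == 0;
}

static void add_trans(automaton *a, int from, int to,
                      const char *name, const char *ns)
{
    assert(a->ntrans < MAX_TRANS);
    a->t[a->ntrans++] = (trans){ from, to, name, ns };
}

/* Same order as the excerpt: excluded namespace into a dead end first,
 * then the catch-all "*" / "*" transition. */
static void build_other_wildcard(automaton *a, const char *excluded_ns)
{
    a->ntrans = 0;
    add_trans(a, START, DEAD_END, "*", excluded_ns);
    add_trans(a, START, MATCHED, "*", "*");
}

/* Follows the first matching transition (assumed semantics);
 * returns the target state, or -1 if nothing matches. */
static int step(const automaton *a, int state, const char *name,
                const char *ns)
{
    for (int i = 0; i < a->ntrans; i++)
        if (a->t[i].from == state && pattern_match(a->t[i].name, name)
            && pattern_match(a->t[i].ns, ns))
            return a->t[i].to;
    return -1;
}

/* Outgoing transitions of a state, i.e. how many "expected" entries
 * an error report could list from it. */
static int count_expected(const automaton *a, int state)
{
    int n = 0;
    for (int i = 0; i < a->ntrans; i++)
        if (a->t[i].from == state)
            n++;
    return n;
}
```

With `build_other_wildcard(&a, "http://FOO")`, an element in `http://BAR` reaches `MATCHED`, while an element in `http://FOO` lands in `DEAD_END`, where `count_expected()` is 0.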
> >
> >
> > The problem is that we reach that "dead state"; from there we can't
> >extract useful information.
> > It sounds like the reverse of a reachable state. Basically, when you construct
> >the automaton you can build a list of dead states, i.e. any state from which
> >you can't possibly transition to a final state. We could build that list
> >and then error out (or roll back if not deterministic) earlier.
> > It might be cheap to do based on the epsilon transition elimination,
> >but this may not help get an accurate error code or message out of the
> >regexp. For example, we could save "start" as the error state when going
> >through that transition to the dead state; then xmlRegExecErrInfo() would
> >extract the values from the state before going through the error transition.
> >This seems a more "formal" approach to solving that class of error
> >extraction problems, but would that really work for you?
>
> I'm only ankle-deep into the xmlregexp.c code, so I hope you can somehow
> get a grip on such a transition for the error report.
>
> But I noticed that even if we get the value-string of this transition
> through xmlRegExecErrInfo, we would still not be able to know that it
> was a negated namespace, unless the data field, i.e. the schema type
> associated with the value-string, is provided as well.
>
> What about the following:
> 1. xmlRegExecErrInfo would return an array of automaton transition
> indices (well, if the transitions are accessible through an index).
> 2. We could then iterate through all the given transitions and extract
> the info using accessor functions like:
> const xmlChar * xmlRegExecGetValue(int transitionId)
> void * xmlRegExecGetData(int transitionId)
Too complex. Depending on the compilation, you may address transitions
with integers or with pointers. I don't think exposing that deep a level
is safe. Maybe we can provide information when building the automaton,
associating an error string with a transition (or a state) and handing
back that optional error string from xmlRegExecErrInfo if it is reached. That
sounds much simpler to implement and use. When building those "dead" transitions
you know what they represent and what to report if transiting through them.
I still think detecting dead states should be added; it doesn't have
to show up in any API anyway.
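The annotation approach can be sketched with a toy automaton: the builder attaches an optional error string to the dead-end transition, and the executor hands that string back when the transition would be taken, instead of entering the dead state. Everything here (`exec_ctxt`, `exec_step()`, the `err` field, first-match-wins semantics) is a hypothetical illustration, not how xmlRegExecErrInfo() actually works. Only the error string crosses the API boundary; no transition indices or pointers are exposed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TRANS 8

/* A transition labelled with name/namespace patterns ("*" matches
 * anything) and an optional message to report if it is taken. */
typedef struct {
    int from, to;
    const char *name, *ns;
    const char *err;          /* NULL for ordinary transitions */
} trans;

typedef struct {
    trans t[MAX_TRANS];
    int ntrans;
    int state;                /* current state */
    const char *err;          /* message recorded by the last step, if any */
} exec_ctxt;

static int match(const char *pat, const char *value)
{
    return strcmp(pat, "*") == 0 || strcmp(pat, value) == 0;
}

static void add_trans(exec_ctxt *c, int from, int to, const char *name,
                      const char *ns, const char *err)
{
    assert(c->ntrans < MAX_TRANS);
    c->t[c->ntrans++] = (trans){ from, to, name, ns, err };
}

/* Takes the first matching transition (assumed semantics). Returns 0 on
 * success, -1 on failure; on failure c->err holds the annotation of the
 * offending transition, or NULL if no transition matched at all. */
static int exec_step(exec_ctxt *c, const char *name, const char *ns)
{
    c->err = NULL;
    for (int i = 0; i < c->ntrans; i++) {
        const trans *tr = &c->t[i];
        if (tr->from != c->state || !match(tr->name, name)
            || !match(tr->ns, ns))
            continue;
        if (tr->err != NULL) {
            /* Annotated dead-end transition: stay in the current state
             * (so its alternatives remain listable) and report. */
            c->err = tr->err;
            return -1;
        }
        c->state = tr->to;
        return 0;
    }
    return -1;
}
```

Staying in the source state on an annotated transition is a design choice here; it lets the report carry both the specific message and the alternatives still available from that state.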
Daniel
--
Daniel Veillard | Red Hat Desktop team http://redhat.com/
veillard redhat com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/