[xml] XML validation and error reporting



Hello. I am trying to use libxml to validate XML documents and have run
into some problems. To a newbie, everything in libxml looks fine in this area:

/**
 * xmlValidityErrorFunc:
 * @ctx:  an xmlValidCtxtPtr validity error context
 * @msg:  the string to format *printf like vararg
 * @...:  remaining arguments to the format
 *
 * Callback called when a validity error is found. This is a message
 * oriented function similar to an *printf function.
 */
typedef void (*xmlValidityErrorFunc) (void *ctx,
                             const char *msg,
                             ...);


/*
 * xmlValidCtxt:
 * An xmlValidCtxt is used for error reporting when validating.
 */
typedef struct _xmlValidCtxt xmlValidCtxt;
typedef xmlValidCtxt *xmlValidCtxtPtr;
struct _xmlValidCtxt {
    void *userData;                     /* user specific data block */
    xmlValidityErrorFunc error;         /* the callback in case of errors */
    xmlValidityWarningFunc warning;     /* the callback in case of warning */

    /* Node analysis stack used when validating within entities */
    xmlNodePtr         node;          /* Current parsed Node */
    int                nodeNr;        /* Depth of the parsing stack */
    int                nodeMax;       /* Max depth of the parsing stack */
    xmlNodePtr        *nodeTab;       /* array of nodes */

    int              finishDtd;       /* finished validating the Dtd ? */
    xmlDocPtr              doc;       /* the document */
    int                  valid;       /* temporary validity check result */

    /* state state used for non-determinist content validation */
    xmlValidState     *vstate;        /* current state */
    int                vstateNr;      /* Depth of the validation stack */
    int                vstateMax;     /* Max depth of the validation stack */
    xmlValidState     *vstateTab;     /* array of validation states */

#ifdef LIBXML_REGEXP_ENABLED
    xmlAutomataPtr            am;     /* the automata */
    xmlAutomataStatePtr    state;     /* used to build the automata */
#else
    void                     *am;
    void                  *state;
#endif
};


The user creates a context and fills userData with his own data, say a
GtkTextBuffer where errors should be displayed. He defines the error and
warning functions and starts validation. When an error occurs, the
corresponding error function is called with the context as its first
argument. From the context, the error function finds where the error
occurred and can report it using the userData field of the context. That
is how validation looks in theory, and it is almost all the user needs.


In practice it is not so fine. First, the error and warning functions are
called with ctxt->userData as the first argument instead of ctxt, so the
context is unavailable in the error function. Why? The answer is easy: the
context is almost unusable. Although its fields are called "node",
"finishDtd" and "valid", they actually mean nothing. There is no way to
know where the error occurred; there is no such information in the context
structure. The user asks: if the context is unusable, why expose it to me
in the .h declaration? Why do I really need it?
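
To make the point concrete, here is a minimal sketch of an error callback
with the xmlValidityErrorFunc signature quoted above. It treats its first
argument as the user data, as described in this message, and formats the
printf-style message into it. ErrorSink and sink_error are hypothetical
names, and the plain buffer only stands in for something like a
GtkTextBuffer; the sketch uses nothing but the C standard library.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical user data: a plain buffer standing in for a GtkTextBuffer. */
typedef struct {
    char   text[1024];  /* accumulated messages, always NUL-terminated */
    size_t used;        /* bytes used, excluding the terminating NUL */
    int    count;       /* number of messages stored */
} ErrorSink;

/* Same shape as xmlValidityErrorFunc. The first argument is assumed to be
 * whatever was stored in ctxt->userData, not the validation context. */
static void sink_error(void *ctx, const char *msg, ...)
{
    ErrorSink *sink = ctx;
    size_t avail;
    va_list ap;
    int n;

    assert(sink != NULL);
    if (sink->used >= sizeof sink->text - 1)
        return;  /* buffer full: drop the message */
    avail = sizeof sink->text - sink->used;

    va_start(ap, msg);
    n = vsnprintf(sink->text + sink->used, avail, msg, ap);
    va_end(ap);

    if (n < 0)
        return;  /* formatting failed: keep the buffer as it was */
    if ((size_t)n >= avail)
        n = (int)(avail - 1);  /* message was truncated to fit */
    sink->used += (size_t)n;
    sink->count++;
}
```

With the structure shown above this would presumably be wired up by setting
ctxt->userData to the address of an ErrorSink and ctxt->error to sink_error
before validating. Note that nothing in the callback's arguments says which
node failed, which is exactly the problem.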


The next question then arises: where is the error location reported, and
how can I find it? The user searches the mailing list and asks on IRC, and
gets an answer: he should use the global error handling function, which he
can set with xmlSetStructuredErrorFunc(). The disadvantages of this
approach are obvious; there is no need to explain why global functions are
so bad. But there are more problems with it.

Since that function is used not only in the validation context but in the
parsing context too, the data passed to the function can be changed
arbitrarily without notifying the user. For example, you set the function
my_structured_error and then start to validate a document against a DTD.
When libxml parses the DTD during validation, the data for your function
is replaced, which can cause a segmentation fault. For more details look
at the example attached to bug #144823 in bugzilla, where this error is
illustrated. So, in the on_structure_error function the user should check
that the data provided is actually the data he set for that function, and
he should check that error->ctxt is actually an xmlValidCtxtPtr (notice
that there is no way to do such a check, since there is no way to tell
whether error->ctxt is an xmlValidCtxtPtr or an xmlParserCtxtPtr). As a
result, the user has to create dozens of dirty hacks to make validation
work.
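
One such hack, sketched below, is to remember the pointer you registered and
compare it by identity before touching the data in the handler. MyErrorData,
remember_registered_data and checked_user_data are hypothetical names, not
libxml API. This only guards the user data pointer; it does nothing about
telling what kind of context the error carries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user data type for the structured error handler. */
typedef struct {
    int errors;  /* number of errors accepted by the handler */
} MyErrorData;

/* The one pointer this application registered with the global handler. */
static MyErrorData *registered_data = NULL;

/* Call this at the same place where the global handler is installed. */
static void remember_registered_data(MyErrorData *data)
{
    assert(data != NULL);
    registered_data = data;
}

/* Returns the data only if it is the very pointer we registered.
 * Returns NULL for anything else, so the handler can ignore the call
 * instead of dereferencing memory of an unknown type. */
static MyErrorData *checked_user_data(void *user_data)
{
    if (user_data == NULL || user_data != (void *)registered_data)
        return NULL;
    return registered_data;
}
```

The handler would call checked_user_data() on its first argument and return
immediately when it gets NULL. The comparison never dereferences the unknown
pointer, so it is safe even when libxml has replaced the data with something
else.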

After a quick look at valid.c in the libxml sources, I think that the
ctxt->node field is not used at all. All operations are performed on the
xmlValidState structure. Perhaps, as a quick fix, it could be set to the
invalid node in the functions xmlErrValidNode and xmlErrValidNodeNr? Then
the user can save the context in one of the fields of his userData
structure and use it to get information about the invalid node.

But if it is possible to clean all this up, even with ABI breakage, that
would be much better.

                                                Shmyrev.
                                                


