Re: [xml] validating SAX parser and error line numbers



Thus spake Joel Uckelman:
I'm having two problems with line numbers and error reporting:

1) I'm using xmlSchemaSAXPlug() to validate against a schema when I
parse. When my xmlStructuredError callback is called with a validation
error, the line and int2 (which is supposed to be a column?) are
always zero, regardless of where the error is in the input. E.g., if I
have an element "<blerg/>" on line 12 which isn't permitted by the
schema, and I print the error message like so:

  std::cerr << err->line << ',' << err->int2 << ": " << err->message << '\n';

I get this as output:

  0,0: Element 'blerg': This element is not expected.

So, clearly the validator knows which element is at fault, but its
location is not getting into the xmlErrorPtr my error callback receives.

I believe I've found a solution to problem #1. Run this before parsing
the document:

  xmlSchemaValidateSetLocator(valid_ctx, locator, ctx);

valid_ctx is the xmlSchemaValidCtxt, and ctx is the xmlParserCtxt.
locator is this function, similar to one I spotted in the libxml2
source, which sets the line number for the validator:

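  // Locator callback for xmlSchemaValidateSetLocator(). ctx is whatever
  // was passed as the third argument there, here the xmlParserCtxt.
  int locator(void *ctx, const char **file, unsigned long *line) {
    xmlParserCtxtPtr pctx = static_cast<xmlParserCtxtPtr>(ctx);
    if (pctx) {
      // Default both outputs in case there is no current input.
      if (file) {
        *file = nullptr;
      }

      if (line) {
        *line = 0;
      }

      if (pctx->input) {
        // Report the parser's current file and line to the validator.
        if (file) {
          *file = pctx->input->filename;
        }

        if (line) {
          *line = pctx->input->line;
        }

        return 0;
      }
    }

    // No parser context, or no current input to report on.
    return -1;
  }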

That appears to give me correct line numbers for validation errors.
 
2) If after a call to xmlParseDocument(), I find that !ctx->wellFormed,
is there a way to get an accurate indication of where things went wrong?
As with my example above, if I have "<blerg" on line 12 and call
xmlParseDocument(), after that the ctx->input->line is one more than the
last line of input---i.e., it appears that the parser read to the end in
a desperate attempt to find a closing '>'.

I'm still hunting for a good solution for #2. The best idea I've had
so far is for each SAX callback to update a variable containing the
line number at the time it's run. This way, that variable (call it
"goodline") will contain the number of the last line on which anything
was successfully processed. I would think that libxml must already have
this information, if only I knew where to look for it.
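
A minimal sketch of the "goodline" idea, in case it helps anyone else.
GoodLine is a hypothetical helper of my own, not anything from libxml2,
and it is written against a generic "current line" function so that it
does not depend on the parser headers. The assumption is that every SAX
callback calls touch() before doing anything else:

```cpp
#include <functional>
#include <utility>

// Hypothetical helper (not part of libxml2): remembers the line the
// parser was on the last time any SAX callback ran.
class GoodLine {
public:
  // line_source returns the parser's current line number. How it gets
  // that number is up to the caller (assumption: it reads it from the
  // parser context).
  explicit GoodLine(std::function<long()> line_source)
    : line_source_(std::move(line_source)) {}

  // Call at the top of every SAX callback (startElement, endElement,
  // characters, and so on).
  void touch() { goodline_ = line_source_(); }

  // Line recorded by the most recent touch(); 0 if no callback has
  // run yet.
  long get() const { return goodline_; }

private:
  std::function<long()> line_source_;
  long goodline_ = 0;
};
```

With libxml2, the line source would presumably be a lambda returning
ctx->input->line (after checking that ctx->input is not null). If
ctx->wellFormed is false after xmlParseDocument(), get() then gives the
last line on which a callback fired, rather than the end-of-input line.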
 
-- 
J.

