[xml] Accurate Line Numbers?



Hi.

I've looked at the documentation for xmlTextReader, searched the mail
archives, and reviewed the code to find out how to obtain the correct
line number of the source document during a parse with xmlTextReader.  
I'm using code like this:

    xmlTextReaderLocatorPtr locator = xmlTextReaderGetLocator( reader );
    int line = xmlTextReaderLocatorLineNumber( locator );
    /* the returned URI must be released with xmlFree() by the caller */
    xmlChar* uri = xmlTextReaderLocatorBaseURI( locator );

to get the URI and line number information during a parse.  The URI is
fine, but the line number is *always* the last line of the file.  

What's the *right* way to get line number information?

PLEASE NOTE: I'm using RelaxNG with xmlTextReader. I've noted the caveat
about this:

    While the Relax NG validator can't always work in a streamable mode,
    only subsets which cannot be reduced to regular expressions need to 
    have their subtree expanded for validation. In practice it means 
    that, unless the schemas for the top level element content is not
    expressible as a regexp, only chunk of the document needs to be 
    parsed while validating.

However, this statement doesn't make sense to me, especially the phrase
"only chunk of the document needs to be parsed while validating".

Does this mean that if RelaxNG is used, the xmlTextReader will read
ahead to get enough context to validate the document? And, in my case
it's reading the whole document, so I always get the last line of the
file from xmlTextReaderLocatorLineNumber?
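
To make that guess concrete, here is a toy model in C. It is not libxml2
code: every name in it (ToyParser, toy_next_tag, toy_run) is made up for
illustration, and it only assumes that a locator reports the position of
the parser's input cursor rather than the position of the node being
delivered. Under that assumption, a reader that pulls in the whole
document before handing out nodes would report the last line every time.

```c
#include <stddef.h>

/* Toy model only: none of these names exist in libxml2. */
typedef struct {
    const char *buf;    /* the whole input document */
    size_t pos;         /* input cursor: how much has been consumed */
    int cursor_line;    /* line number at the input cursor */
} ToyParser;

/* Consume input up to and including the next '>' and return the line
 * on which that tag started, or -1 when the input is exhausted. */
static int toy_next_tag(ToyParser *p)
{
    int start_line = -1;
    while (p->buf[p->pos] != '\0') {
        char c = p->buf[p->pos++];
        if (c == '<')
            start_line = p->cursor_line;
        if (c == '\n')
            p->cursor_line++;
        if (c == '>' && start_line != -1)
            return start_line;
    }
    return -1;
}

/* Deliver up to max tags from doc.  node_lines[i] is the line recorded
 * when tag i was parsed; cursor_lines[i] is the line the input cursor
 * is on when tag i is handed to the caller.  With read_ahead != 0 the
 * whole input is parsed before any tag is handed out.
 * Returns the number of tags delivered. */
static int toy_run(const char *doc, int read_ahead,
                   int *node_lines, int *cursor_lines, int max)
{
    ToyParser p = { doc, 0, 1 };
    int n = 0, i, line;

    if (read_ahead) {
        /* Parse everything first, then deliver: the cursor is already
         * at the end of the input for every delivered tag. */
        while (n < max && (line = toy_next_tag(&p)) != -1)
            node_lines[n++] = line;
        for (i = 0; i < n; i++)
            cursor_lines[i] = p.cursor_line;
    } else {
        /* Streaming: deliver each tag as soon as it is parsed. */
        while (n < max && (line = toy_next_tag(&p)) != -1) {
            node_lines[n] = line;
            cursor_lines[n] = p.cursor_line;
            n++;
        }
    }
    return n;
}
```

For the four-line input "<a>\n<b/>\n<c/>\n</a>", calling toy_run with
read_ahead set to 0 gives cursor lines 1, 2, 3, 4, while read_ahead set
to 1 gives 4, 4, 4, 4. The per-node lines are 1, 2, 3, 4 in both cases,
which is the information I'm actually after.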

Is there a way around this?

Answers like "don't use RelaxNG" won't be helpful. My document type
can't easily be specified with a DTD (it's a programming language),
libxml's schema support isn't able to handle my document type yet, and I
prefer RelaxNG anyway.

If the problem is RelaxNG use, I need a little more detail on how to
design the schema so the document can be parsed in chunks. That is, what
does the phrase "unless the schemas for the top level element content is
not expressible as a regexp" mean exactly? What RelaxNG constructs
wouldn't be allowed? The entire schema can be reduced to a regular
expression by definition (albeit a rather complicated one). For what
definition of "regular expression" does the statement above hold true?

Sorry to trouble you if this is an old question; I just couldn't find an
answer looking on my own. If I find a solution, I'll update the
xmlTextReader documentation with the information necessary.


Configuration: configure --with-fexceptions
Compiler: GCC 3.3.3
Platform: Linux 2.4.20-28
libxml2: CVS as of 5am 2/16/2004 GMT


Thanks in advance,

Reid Spencer (reid x10sys com)
eXtensible Systems, Inc.
