[xml] REPOST: xmlTextReader And RelaxNG (was Accurate Line Numbers?)



Hi,

Since my last post (see below), I have discovered that I don't get
accurate line numbers from xmlTextReaderGetLocator because my entire
document is being read ahead so that RelaxNG validation can occur. If
I turn off RelaxNG validation, the line numbers come out correctly.
So, I am faced with either figuring out how to modify my RelaxNG grammar
so that it does not require read-ahead, or using my own validation
mechanism. I'm leaning towards the latter because there was no response
to my original post, but I thought I'd give it one more try.

What I need is for someone to help me understand the following statement
from the xmlTextReader documentation:

   While the RelaxNG validator can't always work in a streamable mode,
   only subsets which cannot be reduced to regular expressions need to 
   have their subtree expanded for validation. In practice it means
   that, unless the schemas for the top level element content is not
   expressible as a regexp, only chunk of the document needs to be
   parsed while validating.
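A toy C sketch of what the quoted statement appears to describe. This is not libxml2's implementation, and the content model and every name in it are hypothetical. The content model `title, para*` is hard-coded as a deterministic finite automaton over child element names, and the children are checked one at a time as their start tags would arrive. Only the current automaton state is kept, so no part of the subtree has to be buffered. Per the documentation, it is the content models that cannot be reduced this way that force the subtree to be expanded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical DFA states for the content model (title, para*). */
enum cm_state { CM_START, CM_AFTER_TITLE, CM_ERROR };

/* Advance the automaton by one child element name. */
enum cm_state cm_step(enum cm_state s, const char *child)
{
    assert(child != NULL);
    switch (s) {
    case CM_START:
        /* The first child must be <title>. */
        return strcmp(child, "title") == 0 ? CM_AFTER_TITLE : CM_ERROR;
    case CM_AFTER_TITLE:
        /* Zero or more <para> children may follow. */
        return strcmp(child, "para") == 0 ? CM_AFTER_TITLE : CM_ERROR;
    default:
        return CM_ERROR;
    }
}

/* Return 1 if the child names, in document order, match (title, para*).
   Each name is looked at once and then dropped; only the current state
   survives, which is what makes this check streamable. */
int cm_accepts(const char *const *children, size_t n)
{
    enum cm_state s = CM_START;
    for (size_t i = 0; i < n; i++) {
        s = cm_step(s, children[i]);
        if (s == CM_ERROR)
            return 0; /* rejected as soon as the offending child arrives */
    }
    return s == CM_AFTER_TITLE; /* accepting only once a title was seen */
}
```

For example, `cm_accepts` returns 1 for the children `title, para, para` and 0 for `para, title`, rejecting at the very first child.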

What I need help with is understanding how to alter the RelaxNG grammar
so that all subsets, including the top level element, can be reduced to
regular expressions. It is my understanding that *any* RelaxNG grammar
can be reduced to a regular expression. 

If there is anyone out there who is familiar with using RelaxNG and
xmlTextReader, I would _greatly_ appreciate some clarification here.
I will contribute anything I learn back to the documentation.

Thanks,

Reid Spencer
CTO, eXtensible Systems, Inc.

-----Forwarded Message-----
From: Reid Spencer <reid x10sys com>
To: xml gnome org
Subject: Accurate Line Numbers?
Date: Mon, 16 Feb 2004 21:52:52 -0800

Hi.

I've looked at the documentation for xmlTextReader, searched the mail
archives, and reviewed the code to find out how to obtain the correct
line number of the source document during a parse with xmlTextReader.  
I'm using code like this:

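    xmlTextReaderLocatorPtr locator = xmlTextReaderGetLocator( reader );
    int line = xmlTextReaderLocatorLineNumber( locator );
    xmlChar* uri = xmlTextReaderLocatorBaseURI( locator );
    /* the base URI is returned as a newly allocated string; free it when done */
    if ( uri != NULL ) xmlFree( uri );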

to get the URI and line number information during a parse.  The URI is
fine, but the line number is *always* the last line of the file.  

What's the *right* way to get line number information?

PLEASE NOTE: I'm using RelaxNG with xmlTextReader. I've noted the caveat
about this:

    While the Relax NG validator can't always work in a streamable mode,
    only subsets which cannot be reduced to regular expressions need to 
    have their subtree expanded for validation. In practice it means 
    that, unless the schemas for the top level element content is not
    expressible as a regexp, only chunk of the document needs to be 
    parsed while validating.

However, this statement doesn't make sense to me, especially the phrase
"only chunk of the document needs to be parsed while validating".

Does this mean that if RelaxNG is used, the xmlTextReader will read
ahead to get enough context to validate the document? And, in my case,
is it reading the whole document, so that I always get the last line of
the file from xmlTextReaderLocatorLineNumber?
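The suspected effect can be shown with a toy model in C (not libxml2 code; `line_at_offset` is a hypothetical helper). A locator that reports the line of the parser's current input position, rather than the line where the returned node starts, will report the last line once the whole document has been read ahead.

```c
#include <assert.h>
#include <stddef.h>

/* Return the 1-based line number of byte offset `off` in `buf`:
   one plus the number of '\n' characters before `off`. */
int line_at_offset(const char *buf, size_t off)
{
    assert(buf != NULL);
    int line = 1;
    for (size_t i = 0; i < off && buf[i] != '\0'; i++) {
        if (buf[i] == '\n')
            line++;
    }
    return line;
}
```

With `"<a>\n  <b/>\n  <c/>\n</a>"` as input, the `<b/>` start tag is at offset 6 and `line_at_offset` gives 2, but at offset 22 (everything consumed) it gives 4, the last line, no matter which node is being reported.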

Is there a way around this?

Answers like "don't use RelaxNG" won't be helpful. My document type
can't easily be specified with a DTD (it's a programming language),
libxml's schema support isn't able to handle my document type yet, and I
prefer RelaxNG anyway.

If the problem is RelaxNG use, I need a little more detail on how to
design the schema so the document can be parsed in chunks. That is, what
does the phrase "unless the schemas for the top level element content is
not expressible as a regexp" mean exactly? What RelaxNG constructs
wouldn't be allowed? The entire schema can be reduced to a regular
expression by definition (albeit a rather complicated one). For what
definition of "regular expression" does the statement above hold true?
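One possible reading (an interpretation, not something the documentation states) is that "regular expression" refers to each element's content model taken on its own, as a sequence of child element names, not to the document as a whole. The toy C sketch below uses a hypothetical four-element schema and invented names; it validates a stream of start/end events while keeping one automaton state per open element, so memory grows with nesting depth rather than document size. Under this reading, only a content model with no such automaton would require its subtree to be buffered.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical schema, one tiny DFA per element:
     doc   := sec*
     sec   := title, para*
     title := (empty)
     para  := (empty)
   State 0 = no child seen yet, 1 = at least one valid child seen. */
enum { MAX_DEPTH = 32 };

struct frame { const char *name; int state; };

/* Advance the DFA of `parent` by one child name; -1 means invalid. */
int tree_step(const char *parent, int state, const char *child)
{
    if (strcmp(parent, "doc") == 0)
        return strcmp(child, "sec") == 0 ? 1 : -1;
    if (strcmp(parent, "sec") == 0) {
        if (state == 0)
            return strcmp(child, "title") == 0 ? 1 : -1;
        return strcmp(child, "para") == 0 ? 1 : -1;
    }
    return -1; /* title and para allow no children */
}

/* Validate events: "name" opens an element, "/" closes the innermost one.
   Only one frame per open element is kept. Returns 1 if valid. */
int validate(const char *const *ev, size_t n)
{
    struct frame stack[MAX_DEPTH];
    size_t depth = 0;
    int seen_root = 0;

    assert(ev != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(ev[i], "/") == 0) {
            if (depth == 0)
                return 0; /* unbalanced end tag */
            depth--;
            /* A sec must have had its title; the other models accept any
               state that survived tree_step. */
            if (strcmp(stack[depth].name, "sec") == 0 && stack[depth].state != 1)
                return 0;
        } else {
            if (depth == MAX_DEPTH)
                return 0;
            if (depth == 0) {
                /* Exactly one root, and it must be doc. */
                if (seen_root || strcmp(ev[i], "doc") != 0)
                    return 0;
                seen_root = 1;
            } else {
                struct frame *parent = &stack[depth - 1];
                parent->state = tree_step(parent->name, parent->state, ev[i]);
                if (parent->state < 0)
                    return 0;
            }
            stack[depth].name = ev[i];
            stack[depth].state = 0;
            depth++;
        }
    }
    return seen_root && depth == 0;
}
```

The event list `doc, sec, title, /, para, /, /, /` is accepted; removing the `title` element (its start and end events) makes validation fail at the `para` start tag, without looking any further ahead.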

Sorry to trouble you if this is an old question; I just couldn't find an
answer on my own. If I find a solution, I'll update the
xmlTextReader documentation with the necessary information.


Configuration: configure --with-fexceptions
Compiler: GCC 3.3.3
Platform: Linux 2.4.20-28
libxml2: CVS as of 5am 2/16/2004 GMT


Thanks in advance,

Reid Spencer (reid x10sys com)
eXtensible Systems, Inc.



