Hi,

Since my last post (see below), I have discovered that I don't get accurate line numbers from xmlTextReaderGetLocator because my entire document is getting read ahead so that RelaxNG validation can occur. If I turn off RelaxNG validation then the line numbers come out correctly. So, I am faced with either figuring out how to modify my RelaxNG grammar to not require read ahead or using my own validation mechanism. I'm leaning towards the latter because there was no response on my original post but I thought I'd give it one more try.

What I need is for someone to help me understand the following statement from the xmlTextReader documentation:

    While the RelaxNG validator can't always work in a streamable mode,
    only subsets which cannot be reduced to regular expressions need to
    have their subtree expanded for validation. In practice it means
    that, unless the schemas for the top level element content is not
    expressible as a regexp, only chunk of the document needs to be
    parsed while validating.

What I need help with is understanding how to alter the RelaxNG grammar so that all subsets, including the top level element, can be reduced to regular expressions. It is my understanding that *any* RelaxNG grammar can be reduced to a regular expression.

If there is anyone out there that is familiar with using RelaxNG and xmlTextReader, I would _greatly_ appreciate some clarification here. Any learning that I uncover I will contribute back to the documentation.

Thanks,

Reid Spencer
CTO, eXtensible Systems, Inc.

-----Forwarded Message-----
From: Reid Spencer <reid x10sys com>
To: xml gnome org
Subject: Accurate Line Numbers?
Date: Mon, 16 Feb 2004 21:52:52 -0800

Hi.

I've looked at the documentation for xmlTextReader, searched the mail archives, and reviewed the code to find out how to obtain the correct line number of the source document during a parse with xmlTextReader. I'm using code like this:

    xmlTextReaderLocatorPtr locator = xmlTextReaderGetLocator( reader );
    int line = xmlTextReaderLocatorLineNumber( locator );
    xmlChar* uri = xmlTextReaderLocatorBaseURI( locator );

to get the URI and line number information during a parse. The URI is fine, but the line number is *always* the last line of the file. What's the *right* way to get line number information?

PLEASE NOTE: I'm using RelaxNG with xmlTextReader. I've noted the caveat about this:

    While the Relax NG validator can't always work in a streamable mode,
    only subsets which cannot be reduced to regular expressions need to
    have their subtree expanded for validation. In practice it means
    that, unless the schemas for the top level element content is not
    expressible as a regexp, only chunk of the document needs to be
    parsed while validating.

However, this statement doesn't make sense to me, especially the phrase "only chunk of the document needs to be parsed while validating". Does this mean that if RelaxNG is used, the xmlTextReader will read ahead to get enough context to validate the document? And that, in my case, it's reading the whole document, so I always get the last line of the file from xmlTextReaderLocatorLineNumber? Is there a way around this?

Answers like "don't use RelaxNG" won't be helpful. My document type can't easily be specified with a DTD (it's a programming language), libxml's schema support isn't able to handle my document type yet, and I prefer RelaxNG anyway.

If the problem is RelaxNG use, I need a little more detail on how to design the schema so the document can be parsed in chunks.
That is, what does the phrase "unless the schemas for the top level element content is not expressible as a regexp" mean exactly? What RelaxNG constructs wouldn't be allowed? The entire schema can be reduced to a regular expression by definition (albeit a rather complicated one). For what definition of "regular expression" does the statement above hold true?

Sorry to trouble you if this is an old question; I just couldn't find an answer looking on my own. If I find a solution, I'll update the xmlTextReader documentation with the necessary information.

Configuration: configure --with-fexceptions
Compiler: GCC 3.3.3
Platform: Linux 2.4.20-28
libxml2: CVS as of 5am 2/16/2004 GMT

Thanks in advance,

Reid Spencer (reid x10sys com)
eXtensible Systems, Inc.