Re: [xml] libxml2 equivalents for expat's XML_GetCurrentByteIndex and XML_GetCurrentByteCount



On 19 Oct 2012, at 5:20 AM, Daniel Veillard <veillard redhat com> wrote:

The docs say "This function provides the current index of the parser relative to the start of the current 
entity.", when it says "current index of the parser" what exactly does this point to? The start of the 
element? The character following the end of the element? Something else?

 That depends on when you ask!

I would be asking in the middle of a SAX event. In other words, I would be saying "this event you just called 
me for, where does the data start in the original raw stream, and where does it end".

 You seem to have a very perverse definition of what an element is:

 <body  id="foo"> .... </body>

By definition, an element ends with the end tag (ETag) if it is not empty:
 http://www.w3.org/TR/REC-xml/#NT-element
What you are referencing is actually the start and the end of the
start tag (STag):
 http://www.w3.org/TR/REC-xml/#NT-STag

Please avoid inventing terms. The spec is out there; it defines the
terminology precisely.

I am following the terminology used by the API, which calls it an element: 
http://www.xmlsoft.org/html/libxml-parser.html#startElementSAXFunc, sorry for the confusion.

Assuming you call the function in a start element SAX callback, you will
get xmlByteConsumed() pointing just after the '>' at the end of the start
tag. You should be able to find the corresponding '<' in
ctxt->input->base by progressing backward from ctxt->input->cur,
which is the current index of the parser. Then you can get the length
of the start tag in UTF-8 encoding, and from there find the length
of the start tag in the original document encoding, and then you can
subtract it from xmlByteConsumed() to get the second value you want.
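
A rough sketch of that backward scan, written against plain pointers so it
does not depend on libxml2 itself. The names start_tag_length() and
start_tag_offset() are hypothetical helpers, not libxml2 API; base and cur
stand in for ctxt->input->base and ctxt->input->cur, and consumed stands in
for the value returned by xmlByteConsumed(). It assumes the '<' is still
inside the buffer, that no literal '<' occurs inside the tag (true for
well-formed XML, not guaranteed for HTML), and that the original document
is UTF-8, so the re-encoding step can be skipped.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of libxml2): scan backward from cur,
 * which is expected to sit just past the closing '>' of a start tag,
 * until the nearest '<' is found. Returns the tag length in bytes of the
 * buffer encoding (UTF-8 inside libxml2), or -1 if no '<' lies between
 * base and cur.
 */
ptrdiff_t start_tag_length(const unsigned char *base,
                           const unsigned char *cur)
{
    const unsigned char *p;

    assert(base != NULL && cur != NULL && base <= cur);

    for (p = cur; p > base; ) {
        --p;
        if (*p == '<')
            return cur - p;
    }
    return -1;
}

/*
 * Hypothetical helper: turn "bytes consumed up to just past '>'" into
 * the offset of the '<'. Assumes the tag occupies the same number of
 * bytes in the original document as in the UTF-8 buffer. Returns -1 if
 * the length is unknown or larger than the consumed count.
 */
long start_tag_offset(long consumed, ptrdiff_t tag_len)
{
    if (tag_len < 0 || (long)tag_len > consumed)
        return -1;
    return consumed - (long)tag_len;
}
```

Inside a start element callback this would be called roughly as
start_tag_length(ctxt->input->base, ctxt->input->cur), and the result
passed to start_tag_offset() together with xmlByteConsumed(ctxt). For a
document in another encoding, the tag bytes would first have to be
converted back to that encoding to get the right length.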

Am I right in understanding that ctxt->input->base points at the buffer having been previously passed to 
htmlParseChunk(), or can I expect this buffer to have been generated internally to libxml somewhere using 
malloc or something else?

Regards,
Graham