Re: [xml] A few question about parsing content



On Mon, Mar 31, 2008 at 09:45:49AM +0100, Julien Chaffraix wrote:
Hi everyone,

I have an application that has to parse a "content" ( ::= (element |
CharData | Reference | CDSect | PI | Comment)* as specified in the
libxml documentation).
Currently we are using xmlParseBalancedChunkMemory to parse it but it
has induced code duplication (mainly due to the fact that we cannot
tune the behavior with a xmlParserCtxt).
I am trying to find a replacement for that API that should match the
behaviour of xmlParseBalancedChunkMemory (we do not provide xmlDocPtr
and xmlNodePtr as we build the representation ourselves using SAX2
callbacks).
Looking at the documentation, I found 3 candidates:
- xmlParseBalancedChunkMemoryRecover
- xmlParseInNodeContext
- xmlParseContext

First, have I found all the candidates? (I am quite new to libxml so
it is likely that I have missed some)

  That looks right to me.

Then, is there a way to choose between them so that I have a behavior
as close to xmlParseBalancedChunkMemory's as possible by providing a
well-crafted xmlParserCtxt (a pointer about which type to use / how to
initialize it would also be appreciated)?

  The problem is that what you are trying to do is not defined by the
spec as normal XML parsing: all the spec defines is how to parse a
document, not a subset. Since the spec basically exists for
interoperability, there is a good reason to enforce this, and I consider
it normal except maybe for applications like editors. The fact that you
use SAX makes your request look a bit suspicious, actually: your
application seems to be trying to do something which is not
interoperable, and not surprisingly it's harder to do with existing APIs...
  The only other thing I could think of would be for you to set up
a complete parser context and call xmlParseContent(), then do the parser
cleanup at the end. It's really low level and requires more knowledge of
the parser internals, but I guess it's the price to pay for an a priori
non-conformant behaviour.
  There are many things which are contextual when parsing an XML fragment
and you will have to recreate that context or you won't parse things
properly (e.g. namespace).

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/


