Re: [xml] XPath queries on document fragments without a document



You bet!  I'm actually out to compete with AxKit here in a month
or two and think I can beat their performance #'s.  :~) I'm a
perl->ruby convert and want to offer something superior to Perl.
Other than an

  Hehe, I think Matt won't be annoyed by a bit of a challenge :-)

Heh, hope not.  :)

I've wrapped most of the data structures and have just finished
wrapping the parser contexts.  I'm about to go through and add
parser contexts to everything so that you can make multiple calls
to XPath w/o segv'ing the parser.  Speaking of which, what's the best
way to hook into libxml's memory management, given that Ruby has its
own garbage collector that runs at unpredictable times?  I was having
some problems cleaning up the XML parser in a persistent
environment.  I settled on keeping a counter of how many XML
documents were in existence, but that seems really broken.
Any tips/insight there?  -sc

  Well, look at http://xmlsoft.org/xmlmem.html . Basically you can
completely override the set of memory routines used by libxml2 and
libxslt using xmlMemSetup(), but I'm not sure it will really be
useful. IIRC Matt had to build a reference-counting layer for each
libxml2 object manipulated from Perl.
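I don't know how the Perl binding actually implements this; the following is only a minimal sketch of the general idea, assuming every wrapper of a node holds a reference on its owning document, so the document cannot be freed while any node wrapper is still alive. The names fake_doc, doc_new, doc_retain, doc_release, node_wrap and node_unwrap are hypothetical, and a real binding would call xmlFreeDoc() where this sketch calls free().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a libxml2 xmlDoc. */
typedef struct fake_doc {
    int refcount;  /* wrappers (doc or node) still pointing at this document */
} fake_doc;

/* Hypothetical wrapper for a node; it pins the document that owns it. */
typedef struct node_wrapper {
    fake_doc *owner;
} node_wrapper;

/* Counts native frees so the behaviour can be checked. */
static int docs_freed = 0;

static fake_doc *doc_new(void) {
    fake_doc *d = calloc(1, sizeof *d);
    if (d != NULL)
        d->refcount = 1;  /* the document's own wrapper holds the first ref */
    return d;
}

static void doc_retain(fake_doc *d) {
    d->refcount++;
}

static void doc_release(fake_doc *d) {
    assert(d->refcount > 0);  /* more releases than retains is a bug */
    if (--d->refcount == 0) {
        free(d);              /* a real binding would call xmlFreeDoc() here */
        docs_freed++;
    }
}

/* Wrapping a node takes a reference on its document. */
static node_wrapper node_wrap(fake_doc *d) {
    node_wrapper w = { d };
    doc_retain(d);
    return w;
}

/* Called when the GC collects a node wrapper: drop the document reference. */
static void node_unwrap(node_wrapper *w) {
    if (w->owner != NULL) {
        doc_release(w->owner);
        w->owner = NULL;
    }
}
```

With this scheme the order in which the GC collects the wrappers stops mattering: if the document wrapper is collected first, the native document survives until the last node wrapper goes away.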

I'm using xmlMemSetup() and that made a big difference in terms of
preventing libxml from stepping on Ruby's toes.  I'm actually worried
that there's some place in libxml where it's still not using the
memory routines specified by xmlMemSetup(), but I couldn't find any
violators.
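One way to look for such violators is to register allocation hooks that count live blocks. The sketch below is an assumption, not libxml2's own debugging facility: the counting_* functions are hypothetical, but their signatures match libxml2's xmlFreeFunc, xmlMallocFunc, xmlReallocFunc and xmlStrdupFunc, so they could be passed to xmlMemSetup(counting_free, counting_malloc, counting_realloc, counting_strdup) before libxml2 allocates anything. A Ruby extension might call ruby_xmalloc()/ruby_xrealloc()/ruby_xfree() instead of the plain C allocator used here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Number of blocks handed out through these hooks and not yet freed. */
static long live_blocks = 0;

/* Matches xmlFreeFunc: void (*)(void *mem). */
static void counting_free(void *mem) {
    if (mem != NULL) {
        assert(live_blocks > 0);  /* more frees than allocations: a mismatch */
        live_blocks--;
        free(mem);
    }
}

/* Matches xmlMallocFunc: void *(*)(size_t size). */
static void *counting_malloc(size_t size) {
    void *p = malloc(size);
    if (p != NULL)
        live_blocks++;
    return p;
}

/* Matches xmlReallocFunc: void *(*)(void *mem, size_t size). */
static void *counting_realloc(void *mem, size_t size) {
    if (mem != NULL && size == 0) {  /* treat as a free to keep the count exact */
        counting_free(mem);
        return NULL;
    }
    void *p = realloc(mem, size);
    if (mem == NULL && p != NULL)    /* realloc(NULL, n) creates a new block */
        live_blocks++;
    return p;
}

/* Matches xmlStrdupFunc: char *(*)(const char *str). */
static char *counting_strdup(const char *str) {
    size_t n = strlen(str) + 1;
    char *p = counting_malloc(n);
    if (p != NULL)
        memcpy(p, str, n);
    return p;
}
```

A non-zero live_blocks after every document has been freed hints at a leak, or at memory that was allocated through the hooks but released some other way; it cannot prove that every code path goes through the hooks.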

Do you know what Matt did in terms of reference counting?  I don't think
I'll have to do that with Ruby's GC structure.  The extent of what I've
done is to set a flag on each object that hints to the GC whether the
node, doc, etc. should be unlinked and free()'ed, or whether only the
Ruby wrapper object itself should be free()'ed (but not the contents
of the xmlNodePtr).  This seems to be working well as a strategy, but
I'm getting an interesting problem:
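The per-object flag described above can be sketched roughly as follows, assuming the wrapper struct carries the hint and the GC finalizer consults it. The names obj_wrapper, owns_native, wrap_node and finalize_wrapper are hypothetical, and fake_node stands in for an xmlNode; in a real binding the commented xmlUnlinkNode()/xmlFreeNode() calls would replace the plain free().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a libxml2 xmlNode. */
typedef struct fake_node {
    int id;
} fake_node;

/* Hypothetical per-object wrapper carrying the ownership hint. */
typedef struct obj_wrapper {
    fake_node *node;
    int owns_native;  /* 1: unlink and free the native node on collection;
                         0: the node lives inside a document, free only the wrapper */
} obj_wrapper;

/* Counts native frees so the behaviour can be checked. */
static int native_nodes_freed = 0;

static obj_wrapper *wrap_node(fake_node *node, int owns_native) {
    obj_wrapper *w = malloc(sizeof *w);
    if (w != NULL) {
        w->node = node;
        w->owns_native = owns_native;
    }
    return w;
}

/* GC finalizer for a wrapper. */
static void finalize_wrapper(obj_wrapper *w) {
    assert(w != NULL);
    if (w->owns_native) {
        /* real code: xmlUnlinkNode(w->node); xmlFreeNode(w->node); */
        free(w->node);
        native_nodes_freed++;
    }
    free(w);  /* the wrapper itself is always released */
}
```

The flag only stays correct if it is updated whenever ownership moves (for example, cleared when a detached node is inserted into a document), and a wrapper with the flag cleared can still dangle if its document is freed first, which is the case the reference-counting approach guards against.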

If I parse a well-formed document ('<a>b</a>') with xmlParseDocument()
and take the result out of the parser context, it seems that freeing
the parser context also frees ctxt->myDoc.  If I make a copy of myDoc
I seem to be good to go, but that's a rather large memory/performance
hit for large documents.  Am I correct in assuming that I should be
able to simply return ctxt->myDoc as the xmlDocPtr, or should I be
making a copy of the doc?  I could set doc = ctxt->myDoc and then
ctxt->myDoc = NULL, but I don't know what that buys me, since I didn't
see any place in xmlFreeParserCtxt() where it free()'ed the document.
:-/ -sc
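Taking the pointer and then clearing ctxt->myDoc can be sketched without libxml2 using hypothetical stand-in structs. It hands the existing document to the caller instead of copying it, and checks the well-formedness flag before trusting the result. In libxml2 the corresponding fields are ctxt->myDoc and ctxt->wellFormed, and the corresponding calls are xmlParseDocument(), xmlFreeParserCtxt() and xmlFreeDoc(); fake_doc, fake_ctxt, ctxt_free and take_doc_and_free_ctxt are assumptions for illustration only.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for xmlDoc and xmlParserCtxt. */
typedef struct fake_doc {
    int nodes;
} fake_doc;

typedef struct fake_ctxt {
    fake_doc *my_doc;  /* mirrors ctxt->myDoc */
    int well_formed;   /* mirrors ctxt->wellFormed */
} fake_ctxt;

/* Defensive destructor: frees whatever document the context still holds.
   Whether xmlFreeParserCtxt() does this is the question in the post;
   detaching the document first makes the answer irrelevant. */
static void ctxt_free(fake_ctxt *ctxt) {
    free(ctxt->my_doc);
    free(ctxt);
}

/* Take ownership of the parsed document without copying it, then destroy
   the context. Returns NULL if the document was not well formed. */
static fake_doc *take_doc_and_free_ctxt(fake_ctxt *ctxt) {
    fake_doc *doc = ctxt->my_doc;
    int ok = ctxt->well_formed;
    ctxt->my_doc = NULL;  /* detach: the context no longer refers to the doc */
    ctxt_free(ctxt);
    if (!ok) {
        free(doc);        /* real code: xmlFreeDoc(doc); */
        return NULL;
    }
    return doc;
}
```

Ownership moves in O(1) no matter how large the document is, which avoids the copy; the caller then frees the document exactly once.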


-- 
Sean Chittenden


