[xml] avoiding unnecessary copying with xmlreader



Hi all,

I have noticed that all the functions in the xmlreader interface make a
copy of strings before they are returned to the user.  I realise this
has many benefits for typical use of the API, but I would like to find
a way to avoid the copying.

I am building my own C++ based tree structure from the document, and at
any point during parsing I will take character data (e.g. element name,
text content) from the current node and copy it straight into a new
string object which is added to the tree.  These new strings are not
able to adopt the bytes of the libxml string, so I have to copy.  In
effect, every string gets copied twice: once by libxml and once by me.
The large number of extra mallocs/frees this involves seems to have a
significant effect on the performance of my app.

I would like to be able to avoid the copy and get at the string pointer
directly.  I realise this pointer would have a very limited lifetime
before it is invalidated by the next call to libxml, but that is not a
problem for me as I will copy it immediately.  Also there may be some
cases where a copy is unavoidable, and that is OK.

Does anyone know of a way to achieve this cleanly with the current
release?  If not, might I suggest that some kind of API be provided for
this?  I would envisage a set of 'internal' functions which actually do
the work of the current functions, and then wrapper functions which
simply xmlStrdup the result of the internal function and return it to
the user.  My code could then call the internal functions directly, but
the main API would remain the same.  I suppose another option would be
to make the xmlStrdup call used in these functions pluggable, in a
similar way to the alloc/free functions.  I could then simply replace
it with a no-op.
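
To make the internal/wrapper idea concrete, here is a rough sketch of
the split I have in mind.  None of these names exist in libxml; Node,
node_name_internal and node_name are hypothetical stand-ins, and plain
malloc/free stands in for xmlStrdup/xmlFree so the sketch compiles on
its own.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical stand-in for the reader's current node (not a libxml type).
struct Node {
    const char *name;  // owned by the parser; valid only until the next read
};

// Hypothetical "internal" accessor: returns the parser-owned pointer
// without copying.  The caller must not free it or hold on to it.
const char *node_name_internal(const Node *node) {
    assert(node != nullptr);
    return node->name;
}

// Hypothetical public wrapper: keeps the copying contract of the main
// API by duplicating the internal result.  The caller must free() it.
char *node_name(const Node *node) {
    const char *s = node_name_internal(node);
    if (s == nullptr) return nullptr;
    std::size_t n = std::strlen(s) + 1;
    char *copy = static_cast<char *>(std::malloc(n));
    if (copy != nullptr) std::memcpy(copy, s, n);
    return copy;
}

// What my code does today: one copy in the library, one into my string.
std::string copy_name_twice(const Node *node) {
    char *dup = node_name(node);
    std::string out = dup ? std::string(dup) : std::string();
    std::free(dup);
    return out;
}

// What I would like to do: a single copy, straight into my string.
std::string copy_name_once(const Node *node) {
    const char *s = node_name_internal(node);
    return s ? std::string(s) : std::string();
}
```

Both helpers produce the same std::string; the second simply skips the
intermediate malloc/free pair, which is where I believe the time goes.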

Any thoughts appreciated,

Graham.

-- 
Graham Bennett


