Re: memory allocations / libxml



On Fri, 2002-03-01 at 21:37, Havoc Pennington wrote:
> > 	xmlStrndup
> > 	xmlParseAttValue
> 
> UIHandler, no surprise there either.

	I don't understand what that's meant to mean. There are various fixes
possible here; writing a custom on-stack XML parser for simple fragments
would clearly be one (not so trivial) possibility. But the number of
discrete XML fragment parses should be extremely small: most people
should be using the 'set_prop' API, which does no parsing in the common
case.

	Of course, it seems possible (to me), though perhaps scads of coding,
to dup the buffer being parsed once and then scribble '\0's into it
whenever we hit significant lexical tokens [if we're doing a SAX parse],
to avoid doing the following 700x:

	tmp = xmlStrndup (part_of_buffer, 7);
	ctxt->uiCharacters (ctxt, tmp);
	xmlFree (tmp);

Instead we'd do 1 big dup [or a chunked dup, but then the chunk reading
perhaps becomes far more painful; I haven't looked at the code].
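	For illustration only, here is a rough C sketch of the single-dup idea. It is a toy scanner (no attributes, comments, CDATA or entities), not libxml's parser, and scan_text_runs, text_cb and add_length are hypothetical names. The buffer is copied once, each text run is terminated in place by overwriting the '<' that follows it with '\0', and the callback gets a pointer straight into the copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical SAX-style text callback. The string is only valid
 * until the callback returns. */
typedef void (*text_cb) (const char *text, void *user_data);

/* Calls cb once per run of character data found between tags in buf.
 * Instead of a strndup()/free() pair per token, the whole buffer is
 * duplicated once and each run is NUL-terminated in place.
 * Returns the number of runs reported, or -1 if malloc() fails. */
static int
scan_text_runs (const char *buf, size_t len, text_cb cb, void *user_data)
{
    char *copy, *p;
    int   runs = 0;

    assert (cb != NULL);
    copy = malloc (len + 1);            /* the single "big dup" */
    if (!copy)
        return -1;
    memcpy (copy, buf, len);
    copy[len] = '\0';

    p = copy;
    while (*p) {
        char *gt;

        if (*p != '<') {
            char *start = p;
            char *lt = strchr (p, '<');

            if (!lt) {                  /* trailing text, already terminated */
                cb (start, user_data);
                runs++;
                break;
            }
            *lt = '\0';                 /* scribble a NUL over the '<' */
            cb (start, user_data);
            runs++;
            p = lt + 1;                 /* we know a tag started at lt */
        }
        gt = strchr (p, '>');           /* skip over the tag itself */
        if (!gt)
            break;
        p = gt + 1;
    }
    free (copy);                        /* ...and its single free */
    return runs;
}

/* Example callback: adds each run's length to a size_t counter. */
static void
add_length (const char *text, void *user_data)
{
    *(size_t *) user_data += strlen (text);
}
```

	So scan_text_runs (xml, strlen (xml), add_length, &total) costs one malloc/free pair per parse rather than one per token. The catch is that a callback must copy the string if it wants to keep it after returning.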

	Is that an easy thing to do, Daniel? Clearly I'd rather use libxml
than write my own XML parser :-) We already use the fast SAX interface.

	Also, I imagine that libxml could (probably) use alloca for short
strings in this sort of copy/call/free sequence. Is that a feasible
suggestion? That would kill the locking and malloc overhead and speed up
the parser nicely.
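	A minimal sketch of that copy/call/free pattern, assuming a short-string threshold of 64 bytes (an arbitrary pick). alloca() isn't ISO C, so this swaps in a fixed-size local array, which has the same effect: short strings never touch malloc()/free(). call_with_copy, text_cb and store_length are hypothetical names, not libxml API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical SAX-style callback taking a NUL-terminated string. */
typedef void (*text_cb) (const char *text, void *user_data);

/* Strings shorter than this are copied on the stack (arbitrary threshold). */
#define SHORT_COPY_MAX 64

/* Copies len bytes of src into a NUL-terminated temporary, hands it to
 * cb, then drops it. Short strings stay on the stack; longer ones fall
 * back to the heap. Returns 0 if the stack was used, 1 if the heap was
 * used, or -1 if malloc() fails. */
static int
call_with_copy (const char *src, size_t len, text_cb cb, void *user_data)
{
    char  stack_buf[SHORT_COPY_MAX];
    char *tmp = stack_buf;

    assert (cb != NULL);
    if (len >= sizeof stack_buf) {      /* too long for the stack buffer */
        tmp = malloc (len + 1);
        if (!tmp)
            return -1;
    }
    memcpy (tmp, src, len);
    tmp[len] = '\0';
    cb (tmp, user_data);
    if (tmp != stack_buf) {
        free (tmp);
        return 1;
    }
    return 0;
}

/* Example callback: records the length of the string it was given. */
static void
store_length (const char *text, void *user_data)
{
    *(size_t *) user_data = strlen (text);
}
```

	The threshold trades a little stack per call for skipping the allocator (and its lock) on the common short-token path.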

	The thing that worries me more about libxml, looking at an strace of
its operation in gconf, bonobo-activation, nautilus etc., is this:

[snip]
       I'm hoping this one is easy to fix; here is an strace -ttt trace
of me calling xmlParseFile on a really quite small file :-) As you would
expect, I wouldn't imagine that we need the umpteen redundant read
syscalls, all returning 0 :-)

        Any chance of a fix? It gets worse with bigger files. I have:

[pid  4424] 1014818823.507812 read(10, "", 4096) = 0
...
[pid  4424] 1014818823.524004 read(10, "", 4096) = 0
[pid  4424] 1014818823.524106 close(10) = 0

        Only 20ms, I know, but over 62 servers that's ~1 second of
redundant syscallage ;-) I see the same phenomenon hurting gconf,
bonobo-activation and nautilus performance to a degree.

        Is it possible that this also happens in libxml1? If so, a
backport of the fix would be really wonderful :-)

        Thanks,

                Michael.
[snip]

	Of course, the strace timings are somewhat unfair, but doing dozens of
bogus reads doesn't help file parsing performance.
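	For comparison, a sketch of a read loop that treats the first read() returning 0 as end-of-file and stops there, so a small file costs one data read plus a single zero-length read. slurp_fd is a hypothetical helper assuming a POSIX system; it is not libxml's actual input code, only the syscall pattern one would expect.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Reads everything from fd into a malloc()ed buffer (*out, *out_len).
 * read() returning 0 means end-of-file, so the loop stops there:
 * exactly one zero-length read per file, not dozens.
 * Returns the number of read() calls made, or -1 on error. */
static int
slurp_fd (int fd, char **out, size_t *out_len)
{
    size_t cap = 4096, len = 0;
    char  *buf;
    int    calls = 0;

    assert (out != NULL && out_len != NULL);
    buf = malloc (cap);
    if (!buf)
        return -1;
    for (;;) {
        ssize_t n;

        if (cap - len < 4096) {         /* keep room for a full 4096-byte read */
            char *bigger = realloc (buf, cap * 2);
            if (!bigger) {
                free (buf);
                return -1;
            }
            buf = bigger;
            cap *= 2;
        }
        n = read (fd, buf + len, 4096);
        calls++;
        if (n == 0)
            break;                      /* EOF: don't ask again */
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted by a signal: retry */
            free (buf);
            return -1;
        }
        len += (size_t) n;
    }
    *out = buf;
    *out_len = len;
    return calls;
}
```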

	HTH,

		Michael.

	

-- 
 mmeeks gnu org  <><, Pseudo Engineer, itinerant idiot



