Re: performance, gscanner.c and XML reads



On Fri, Jan 28, 2005 at 09:12:42AM -0500, ANDREW MARLOW, BLOOMBERG/ LONDON OF wrote:
> I have a large XML file and it takes quite a long time
> for gscanner to read it. Using quantify shows that
> several calls to read are made. I wonder if things can
> be sped up by allowing the caller to specify the
> buffer size used for read? Currently this is set
> by a macro in gscanner.c to 4000 bytes.
> I would like to use a larger value when I am
> using a larger XML file. Any thoughts?

I've never used GScanner before.  (All this is pretty new to me.)  I
was reading about it last night, though.  (There's a lot of really cool
stuff in GLib, isn't there?  GTK+ too.  I'm amazed.)  So take this with
a grain of salt:

I looked at the source just a little, and I agree with you that one
possible bottleneck is the file handling...  though it also looks like
it's been written more or less as efficiently as possible.  Increasing
the buffer size could help, but I think a better solution would be to
stop using gscanner's reading code altogether.  That would make the
read buffer size a non-issue.  Instead, use mmap() to map the entire
file into memory, then hand the whole buffer to the scanner in one go
with g_scanner_input_text() rather than giving it a file descriptor.
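
Here is a rough sketch of the mapping step, assuming a POSIX system.
map_whole_file() is a name made up for illustration; it is not part of
GLib.  The GLib call in the trailing comment is how I'd expect the
buffer to be passed on, but I haven't measured whether this is actually
faster for a large XML file.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only.
 * Returns the mapping and stores its size in *len, or returns NULL on
 * failure (including an empty file, which mmap() cannot map). */
static const char *map_whole_file(const char *path, size_t *len)
{
    struct stat st;
    void *p;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, (size_t) st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);   /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;
    *len = (size_t) st.st_size;
    return p;
}

/* With GLib, the mapped text would then be handed over in one call:
 *     g_scanner_input_text(scanner, text, (guint) len);
 * followed by munmap((void *) text, len) once scanning is finished. */
```

Note that the mapped bytes are not NUL-terminated, so the length has to
be passed along with the pointer.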

For those who've never used mmap() before: make sure you call munmap()
once for each call to mmap().  I once ran into a bug where one program
kept a file mmap()'d in memory while another program periodically
updated it.  Whenever the mmap()ing program determined that the file had
been updated, it reopened the file and mmap()ed it again at the same
memory address, without calling munmap() on the old mapping first.  It
looked like valid code, but caused what looked like a file descriptor
leak.  Last I checked, this isn't mentioned in the docs.
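
To make the pairing concrete, here is a minimal sketch of a reload
routine that always releases the old mapping before creating a new one.
The struct and function names are hypothetical, and this is not the code
from the program I was debugging, just one way to keep mmap() and
munmap() calls balanced.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one mapped file. */
struct mapped_file {
    void  *addr;   /* NULL when nothing is mapped */
    size_t len;
};

/* Drop the current mapping, if any. Safe to call more than once. */
static void mapped_file_release(struct mapped_file *m)
{
    if (m->addr != NULL) {
        munmap(m->addr, m->len);
        m->addr = NULL;
        m->len = 0;
    }
}

/* Map (or re-map) path. The old mapping is always released first, so
 * every successful mmap() is matched by exactly one munmap().
 * Returns 0 on success, -1 on failure (leaving nothing mapped). */
static int mapped_file_reload(struct mapped_file *m, const char *path)
{
    struct stat st;
    void *p;
    int fd;

    mapped_file_release(m);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    p = mmap(NULL, (size_t) st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);   /* close the descriptor on every path */
    if (p == MAP_FAILED)
        return -1;
    m->addr = p;
    m->len = (size_t) st.st_size;
    return 0;
}
```

Start with a zeroed struct, call mapped_file_reload() each time the file
changes, and call mapped_file_release() once at shutdown.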

- Ben



