Re: Request: Test suite for EFS.



On Wed, Feb 16, 2000 at 06:01:06AM -0600, Miguel de Icaza wrote:
> 
> [Michael: comments on OLE2 at the end]
> 
> > Disabling searching to avoid parsing the whole file sounds lame to
> > me. XML is definitely structured. Storing images in it should not be a
> > problem, see the RFC 2397 for one way to do it. Storing embedded
> > objects should be no problem either as long as they serialize to
> > XML. XML is perfectly happy to let you use multiple DTDs in one file.
> 
> People from the Windows world are used to multi-megabyte files.  Some
> of the Gnumeric test cases for Excel loading are pretty large.
> 
> If we use XML exclusively, I wonder who is the brave soul who will be
> scanning a directory for information with an XML file.  Consider a few
> hundred files on a server, and you are looking for documents that have
> been edited by "Maciej" at some point in life.  
> 
> I can picture the disk IO action going up, the memory usage going up
> and the time going up.
> 
> Can you picture a way in which this could be solved with XML?

In your structured document DTD, specify that <summary> must be the first
element in a <structureddocument>, so you have:
<structureddocument>
  <summary>
    <authors>
      <person>Tobermory J. Womble</person>
    </authors>
    <editors>
      <person>John Q. Public</person>
      <person>Maciej X. Ample</person>
    </editors>
  </summary>
  ...

You use an event-driven XML API like SAX (which I think gnome-xml supports) and
feed data into the parser 1k at a time from the start of the file until you
get </summary>. With XML you _don't_ need the whole document in memory.
You know that if there is a summary then it will immediately follow the
root <structureddocument> element - if it doesn't then you can ignore the
document. I can't see this being any more memory or disk intensive than an
EFS or OLE2 file - probably slightly more CPU intensive, but not by much -
and you avoid all the platform independence issues. We could even write a
minimal XML parser to do this. I wrote an XML parser the other day because
gnome-xml is too complex for me, and it was about 180 lines of C (using
glib) - a simple SAX parser would be much less.
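
To make the idea above concrete, here is a rough sketch of the chunked,
stop-at-</summary> scan. It is in Python with the standard library's expat
binding rather than gnome-xml or C, purely to keep it short. The function name
read_summary_people and the _Stop exception are made up for illustration, and
the element names are just the ones from the sample document above.

```python
import io
import xml.parsers.expat


class _Stop(Exception):
    """Raised from a handler to abandon parsing early (hypothetical helper)."""


def read_summary_people(stream, chunk_size=1024):
    """Return e.g. {"authors": [...], "editors": [...]} from the leading
    <summary>, or None if the root's first child is not <summary>."""
    parser = xml.parsers.expat.ParserCreate()
    stack = []                 # names of the currently open elements
    people = {}                # parent element name -> list of <person> texts
    text = []                  # character data of the <person> being read
    state = {"found": False}

    def start(name, attrs):
        # One open element means we are looking at a child of the root.
        if len(stack) == 1 and name != "summary":
            raise _Stop()      # no leading summary: ignore this document
        stack.append(name)
        if name == "person":
            text.clear()

    def end(name):
        stack.pop()
        if name == "person" and stack:
            people.setdefault(stack[-1], []).append("".join(text).strip())
        elif name == "summary" and len(stack) == 1:
            state["found"] = True
            raise _Stop()      # we have everything we need

    def chars(data):
        # Text may arrive in several pieces, so accumulate it.
        if stack and stack[-1] == "person":
            text.append(data)

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars

    try:
        while True:
            chunk = stream.read(chunk_size)   # 1k at a time by default
            if not chunk:
                break
            parser.Parse(chunk, False)
    except _Stop:
        pass
    return people if state["found"] else None


if __name__ == "__main__":
    sample = (b"<structureddocument><summary><editors>"
              b"<person>Maciej X. Ample</person>"
              b"</editors></summary><body/></structureddocument>")
    print(read_summary_people(io.BytesIO(sample)))
```

Only the chunks up to and including the one containing </summary> are ever
read, so the cost per file depends on the size of the summary, not on the size
of the document.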

ALSO: as we only need to read, mmap()ing rather than read()ing is feasible.
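
A sketch of the mmap() variant, again in Python with expat only for brevity.
It answers the "edited by Maciej" style of query for one file. The names
summary_mentions and _Stop are hypothetical, and an empty file would need
separate handling since it cannot be mapped.

```python
import mmap
import xml.parsers.expat


class _Stop(Exception):
    """Raised from a handler to stop parsing; carries whether </summary> was hit."""

    def __init__(self, found):
        super().__init__()
        self.found = found


def summary_mentions(path, needle, chunk_size=1024):
    """True if the text inside the leading <summary> contains `needle`."""
    parser = xml.parsers.expat.ParserCreate()
    opened = []                # element names in the order they were opened
    pieces = []                # character data seen so far

    def start(name, attrs):
        opened.append(name)
        # The second element opened is always the root's first child.
        if len(opened) == 2 and name != "summary":
            raise _Stop(False)

    def end(name):
        if name == "summary":
            raise _Stop(True)

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = pieces.append

    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        try:
            # Only the pages we actually slice get touched.
            for offset in range(0, len(m), chunk_size):
                parser.Parse(m[offset:offset + chunk_size], False)
        except _Stop as stop:
            return stop.found and needle in "".join(pieces)
    return False
```

Scanning a directory is then a loop over the files calling
summary_mentions(path, "Maciej") on each.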

Ian

-- 
Ian  McKellar | Email: yakk(a)yakk.net     | Web: http://www.yakk.net/
Fax: +61 (8) 9265 0821 / +0 (775) 205 0307 | Home: +61 (8) 9389 9152
If God didn't want us to eat animals, he wouldn't have made them out of meat.


