Re: Request: Test suite for EFS.



Miguel de Icaza <miguel@gnu.org> writes:

> >   Is that the end of the XML format ? 
> > How are you gonna reuse your filesystem embedded in a file 10 years from
> > now when you will look at your precious data on a CDROM but without a running
> > copy of Gnumeric ?
> > 
> > Daniel, concerned ...
> 
> It is definitely an option.
> 
> First of all, some features are very hard to implement with ad-hoc XML
> features.  For example, it is not easy to write an application that
> scans documents and searches for "Daniel Veillard" as their author.
> 
> With the way Structured Storage works in OLE2, you know that the
> author is always going to be in "summary/author" inside the structured
> storage file: there is no need to load the entire file, you just open
> the directory, and do the lookups for summary/author inside the file
> system, no need to parse the entire thing into memory and find out
> whether the author is Daniel or not.
> 
> Inside this structured storage file, the current XML file will be
> contained as well as any other resources like images, links and
> embedded objects.
> 
> Unless someone convinces me that this is not a good idea, Structured
> Storage makes a lot of sense.
> 
> And of course, we could support a "pure" XML mode for those that do
> not want to use the search facilities.

Giving up searching in a "pure" XML mode just to avoid parsing the
whole file sounds lame to me. XML is definitely structured. Storing
images in it should not be a problem; see RFC 2397 (the "data" URL
scheme) for one way to do it. Storing embedded objects should be no
problem either, as long as they serialize to XML. XML is perfectly
happy to let you use multiple DTDs in one file.
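
A rough sketch of the RFC 2397 approach, to make it concrete. The
element names (workbook, image) are made up for illustration and are
not Gnumeric's actual schema, and the image bytes are a placeholder
rather than a real PNG:

```python
import base64
import xml.etree.ElementTree as ET


def image_to_data_url(data: bytes, mime_type: str = "image/png") -> str:
    """Encode raw bytes as a base64 RFC 2397 "data" URL."""
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{payload}"


def data_url_to_bytes(url: str) -> bytes:
    """Decode a base64 "data" URL back into raw bytes."""
    header, payload = url.split(",", 1)
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URL")
    return base64.b64decode(payload)


# Hypothetical element names; not the real Gnumeric file format.
fake_image = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8  # placeholder bytes only
workbook = ET.Element("workbook")
image = ET.SubElement(workbook, "image", {"name": "logo"})
image.text = image_to_data_url(fake_image)

# Round trip: serialize to XML text, parse it again, recover the bytes.
xml_text = ET.tostring(workbook, encoding="unicode")
restored = data_url_to_bytes(ET.fromstring(xml_text).find("image").text)
```

The image travels as ordinary element text, so any XML tool can still
read the file. The cost is the usual base64 overhead of about a third
more bytes.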

To do good searching you really need a background indexer in any case,
and that gives equally good performance either way, and people are
working on various parts of this problem already.
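
A minimal sketch of the indexing idea for the XML case, assuming
(hypothetically) that each document keeps its author in an <author>
element. The file names and document layout below are invented for
illustration. A real indexer would run in the background and persist
its results; this only shows the lookup table it would build:

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def extract_author(source):
    """Return the text of the first <author> element, or None.

    iterparse reads the input incrementally, so the search can stop
    as soon as the element has been seen.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "author":
            return (elem.text or "").strip()
    return None


def build_author_index(documents):
    """Map author name -> sorted list of document names.

    `documents` is an iterable of (name, binary file-like object) pairs.
    """
    index = defaultdict(list)
    for name, source in documents:
        author = extract_author(source)
        if author:
            index[author].append(name)
    return {author: sorted(names) for author, names in index.items()}


# Hypothetical documents; the layout is an assumption, not a real schema.
docs = [
    ("budget.xml", io.BytesIO(
        b"<workbook><summary><author>Daniel Veillard</author></summary>"
        b"<sheet/></workbook>")),
    ("notes.xml", io.BytesIO(
        b"<workbook><summary><author>Someone Else</author></summary>"
        b"</workbook>")),
]
index = build_author_index(docs)
```

Once the index exists, a search for "Daniel Veillard" is a dictionary
lookup and never touches the documents themselves, whatever format
they are stored in.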

It sounds to me a lot like this EFS thing is a tarball, but
optimized for read-write access. If there were a widely available
command-line tool to process it, it might not be so bad.

But it would still be extra work to process it with standard XML
tools, so there would have to be an actual compelling reason for
preferring an ad-hoc structured binary format to an existing
structured format that can be processed with many general-purpose
tools.

I don't think fast random access to specific fields is a compelling
enough reason. Everyone else is moving away from binary files and
towards XML for serialization despite this issue.

My 2c,

Maciej


