Re: Request: Test suite for EFS.



Miguel de Icaza <miguel@gnu.org> writes:

> [Michael: comments on OLE2 at the end]
> I can picture the disk IO action going up, the memory usage going up
> and the time going up.
> 
> Can you picture a way in which this could be solved with XML?

Indexing. :-)

> > To do good searching you really need a background indexer in any case,
> > and that gives equally good performance either way, and people are
> > working on various parts of this problem already.
> 
> That is one option, and might work if you set up things properly.  But
> let's think as a regular, plain user.  A small office of people who do
> not even have a sysadmin.
> 
> They choose to put their docs on "/company docs/", and they accumulate
> a few hundred of those.  Who will set up the background indexing for
> them?  What if they add another directory?  Is the setting global?
> to the network?  Per application?  Is it even standard across
> applications?

Indexing of everything would always happen all the time. There is no
setup (if the OS distributor gets it right).

I don't see why you keep harping on the special case of the author
field anyway. I usually want to search based on the actual data
contents of the file, and structured storage doesn't help with
that. Good system-wide indexing is the only thing that will solve it
(we Eazel folks are planning to work on an indexing server and search
API, BTW).

> The entire scenario described above is avoided completely in current
> Microsoft environments, because they can just scan the documents for
> the "summary/author" inside each file.  Does not take a lot of memory,
> and does considerably less IO.

Sure, but how do they make searching on contents efficient? They have
the "Fast Find" feature which is essentially a background indexer
(except that it is really badly implemented so it slows your system to
a crawl for possibly minutes at a time).

You need indexing to do good, fast, general-purpose search. Using a
binary file format to special-case some fields is a large price to pay
for solving maybe 5-10% of the problem.
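For contrast, here is a minimal sketch of the inverted index a background indexer would maintain (purely illustrative; this is not the planned Eazel API, and real indexers also handle stemming, updates, and ranking):

```python
from collections import defaultdict

def build_index(docs):
    """Map each word to the set of document ids that contain it.

    docs is a {doc_id: text} mapping; the scan cost is paid once,
    in the background, instead of on every query.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, *words):
    """Return the doc ids containing all of the given words."""
    postings = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*postings) if postings else set()
```

Query cost is then proportional to the matching posting lists, not to the size of the corpus, and it works for any word in the contents, not just a few fields the file format chose to special-case.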

> > It sounds a lot to me like this efs thing is like a tarball, but
> > optimized for read-write access. If there were a widely available
> > command-line tool to process it, it might not be so bad. 
> 
> Yes, it is.  We can write the command line options, and even a vfs
> module (so people can browse the internals with Nautilus or any other
> gnome-vfs applications).
> 

That would be nice. I'm not sure the VFS module would be all _that_
useful, because I'm concerned a UI for opening a structured storage
file as a filesystem rather than with its app would be confusing to
the user.

> OLE2SS format is pretty standard in today's universe.  Might make
> sense to just use OLE and deal with working around its brokenness.
> This way, even Microsoft tools could search and index our documents,
> and our applications would be ready to scan and search theirs.

I think the next generation of smart XML tools is much more important
to support than legacy Microsoft tools. Let's design data formats for
tomorrow's universe, not today's.
 
> > I don't think fast random access to specific fields is a compelling
> > enough reason. Everyone else is moving away from binary files and
> > towards XML for serialization despite this issue.
> 
> Not Microsoft.  They do support exporting to XML, but their default
> file format is still binary.

I heard even Microsoft had plans afoot to make XML the native format
someday. But even if they don't, they're about the only company left
that prefers legacy proprietary file formats to open standards.

 - Maciej



