Re: Request: Test suite for EFS.
- From: bob thestuff net
- To: Miguel de Icaza <miguel gnu org>
- cc: mjs eazel com, michael nuclecu unam mx, Daniel Veillard w3 org, gnome-components-list gnome org, gnome-list gnome org, recipient list not shown: ;
- Subject: Re: Request: Test suite for EFS.
- Date: Wed, 16 Feb 2000 14:32:27 -0600 (CST)
One way to avoid scanning all of a huge XML file, I think, would be to do
something like this:
<data size="32434">
...some junk...
</data>
Then for quick searching, the search app, when it finds a data tag, will
read the size and skip ahead that far. That should speed things up a lot,
I think. Also, put the searchable tags, if possible, at the top of the
document, as in HTML.
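
Here is a minimal sketch of the skip-ahead idea in Python. It assumes the
size attribute counts the payload bytes between the end of the opening
<data ...> tag and the start of </data>, and that the file is seekable.
The function name skip_data_payloads and the chunk size are made up for
illustration; this is not a real parser, just a scanner that never reads
most of the payload.

```python
import io
import re

# Assumed convention: size = number of payload bytes that follow the
# opening tag, up to (not including) the closing </data>.
DATA_OPEN = re.compile(rb'<data\s+size="(\d+)"\s*>')


def skip_data_payloads(stream, chunk_size=4096):
    """Return the document bytes with every <data> payload skipped.

    `stream` must be a seekable binary file object. Payloads are
    skipped with seek() instead of being read and scanned.
    """
    kept = bytearray()
    buf = b""
    while True:
        chunk = stream.read(chunk_size)
        buf += chunk
        match = DATA_OPEN.search(buf)
        if match:
            # Keep everything up to and including the opening tag.
            kept += buf[:match.end()]
            size = int(match.group(1))
            # Part of the payload (or more) may already be in buf, so
            # seek relative to the current position; the offset can be
            # negative if we read past the end of the payload.
            already_read = len(buf) - match.end()
            stream.seek(size - already_read, io.SEEK_CUR)
            buf = b""
            continue
        if not chunk:
            # End of file: flush whatever is left.
            kept += buf
            break
        # Hold back an unfinished tag that straddles the chunk boundary.
        last_open = buf.rfind(b"<")
        if last_open != -1 and b">" not in buf[last_open:]:
            kept += buf[:last_open]
            buf = buf[last_open:]
        else:
            kept += buf
            buf = b""
    return bytes(kept)
```

A search tool could then look for the author in the returned bytes, e.g.
b"Maciej" in skip_data_payloads(open(path, "rb")). Note that this only
works if the writer keeps the size attribute accurate; a stale size would
make the scanner land in the middle of the payload.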
On Wed, 16 Feb 2000, Miguel de Icaza wrote:
>
> [Michael: comments on OLE2 at the end]
>
> > Disabling searching to avoid parsing the whole file sounds lame to
> > me. XML is definitely structured. Storing images in it should not be a
> > problem, see the RFC 2397 for one way to do it. Storing embedded
> > objects should be no problem either as long as they serialize to
> > XML. XML is perfectly happy to let you use multiple DTDs in one file.
>
> People from the Windows world are used to multi-megabyte files. Some
> of the Gnumeric test cases for Excel loading are pretty large.
>
> If we use XML exclusively, I wonder who is the brave soul who will be
> scanning a directory for information with an XML file. Consider a few
> hundred files on a server, and you are looking for documents that have
> been edited by "Maciej" at some point in life.
>
> I can picture the disk IO action going up, the memory usage going up
> and the time going up.
>
> Can you picture a way in which this could be solved with XML?
>
> > To do good searching you really need a background indexer in any case,
> > and that gives equally good performance either way, and people are
> > working on various parts of this problem already.
>
> That is one option, and might work if you set things up properly. But
> let's think as a regular, plain user: a small office of people who do
> not even have a sysadmin.
>
> They choose to put their docs on "/company docs/", and they accumulate
> a few hundred of those. Who will set up the background indexing for
> them? What if they add another directory? Is the setting global?
> To the network? Per application? Is it even standard across
> applications?
>
> The entire scenario described above is avoided completely in current
> Microsoft environments, because they can just scan the documents for
> the "summary/author" inside each file. Does not take a lot of memory,
> and does considerably less IO.
>
> > It sounds a lot to me like this efs thing is like a tarball, but
> > optimized for read-write access. If there were a widely available
> > command-line tool to process it, it might not be so bad.
>
> Yes, it is. We can write the command line options, and even a vfs
> module (so people can browse the internals with Nautilus or any other
> gnome-vfs applications).
>
> > But it would still be extra work to process it with standard XML
> > tools, so there would have to be an actual compelling reason for
> > preferring an ad-hoc structured binary format to an existing
> > structured format that can be processed with many general-purpose
> > tools.
>
> Yes, this is my concern as well. I wanted to use Microsoft's
> Structured Storage file format, until Michael told me about the
> shortcomings they had (small file names), although even this could
> probably be worked around.
>
> OLE2SS format is pretty standard in today's universe. Might make
> sense to just use OLE and deal with working around its brokenness.
> This way, even Microsoft tools could search and index our documents,
> and our applications would be ready to scan and search theirs.
>
> > I don't think fast random access to specific fields is a compelling
> > enough reason. Everyone else is moving away from binary files and
> > towards XML for serialization despite this issue.
>
> Not Microsoft. They do support exporting to XML, but their default
> file format is still binary.
>
> Miguel.