Re: [GNOME VFS] Re: Daemons [Was: gob inside gnome-vfs ...]



On 28Jun2002 04:51AM (-0700), Seth Nickell wrote:
> On Thu, 2002-06-27 at 20:24, Maciej Stachowiak wrote:
> > 
> > It's actually stored as a per-directory thing now, although you could
> > change that choice.
> 
> Ian and I are planning to change that for the GnomeVFS metadata
> implementation, for a variety of reasons.

Note that the reason it was done that way originally was performance -
avoiding the need to read the metadata file for every file in a
directory when viewing it in Nautilus. Reading twice as many files is
likely to have a negative impact on directory opening performance, so
I advise you to measure the performance impact before making this
choice.
 
> > 
> > The real actual non-theoretical bug we ran into in Nautilus happened
> > when the main Nautilus binary and one of the sidebar panels (notes)
> > both tried to change the metadata file at similar times. The delayed
> > writeback of metadata meant that one set of changes often got lost,
> > leading to loss of real user data (the note about the file).
> > 
> > This was what convinced us that we needed a metadata server. What
> > would your solution to this problem be?
> 
> Honestly I don't see how this is substantially different from the
> theoretical case where, say AbiWord, had an out-of-process component
> that made small modifications to the file you were currently working on.

But AbiWord shouldn't and in fact doesn't do that. Multiple processes
changing file *data* is not a real need, but multiple processes
changing different pieces of *meta*data is.

Let me put it another way that will perhaps make it more
clear. Suppose you change a file's permissions in one program, and
its modification date in another program. Would it be acceptable for
one of these changes to be lost, at random, 5 seconds later?
To me the answer is clearly "no".
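The permissions/modification-date race above can be made concrete. Below is a minimal sketch (the key names and the dict-in-a-file store are invented for illustration, not the real metadata format) of how writing back a whole stale record silently drops the other program's change:

```python
# Hypothetical lost-update sketch: two programs each read the whole
# metadata record, change a *different* key, and write the record back.
# The second write clobbers the first program's change.
import json, os, tempfile

path = os.path.join(tempfile.mkdtemp(), "metadata.json")
with open(path, "w") as f:
    json.dump({"permissions": "rw-r--r--", "mtime": "2002-06-27"}, f)

def read_meta():
    with open(path) as f:
        return json.load(f)

# Both programs snapshot the metadata before either writes.
snapshot_a = read_meta()
snapshot_b = read_meta()

# Program A changes the permissions and writes everything back.
snapshot_a["permissions"] = "rwxr-xr-x"
with open(path, "w") as f:
    json.dump(snapshot_a, f)

# Program B changes the mtime and writes its (now stale) snapshot back.
snapshot_b["mtime"] = "2002-06-28"
with open(path, "w") as f:
    json.dump(snapshot_b, f)

print(read_meta())  # A's permissions change is silently gone
```

Delayed writeback only widens the window; the bug is that the unit being written (the whole record) is larger than the unit being changed (one key).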


> This general problem is not of pressing importance because, in general,
> multiple programs tend not to end up accessing the same file at the same
> time. If they do, we have traditionally foisted responsibility for the
> ensuing problems on the user (i.e. if they open a file in two programs
> at the same time and overwrite changes from one with the other, well we
> let that happen, though we're supposed to warn the file has changed on
> disk). I'm not really defending this view, because personally I think
> it's broken, but I don't see how metadata is substantially different.

It's not really the same. Saving a file from two programs always has
the effect of both operations completing fully in some order - you end
up with one program's complete copy of the file or the other's.
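For contrast, here is a sketch of the whole-file save pattern, assuming the usual write-temp-then-rename idiom (illustrative, not any particular editor's code): whichever program saves last wins, but the surviving file is always one writer's version in full, never a mix.

```python
# Sketch of "replace the whole file" saving: write a complete temp
# copy, then atomically rename it into place. Readers and later savers
# always see a whole copy from exactly one writer.
import os, tempfile

def save(path, data):
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "w") as f:
        f.write(data)
    os.replace(tmp, path)  # atomic rename on POSIX: old or new, never a mix

d = tempfile.mkdtemp()
path = os.path.join(d, "file.txt")
save(path, "version from program A\n")
save(path, "version from program B\n")
print(open(path).read())  # B's complete copy
```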

But suppose you have this scenario (totally made up in terms of what
the programs actually do, solely to illustrate a hypothetical):

- User drags a file around inside a Nautilus folder.
- Nautilus sets file position metadata.
- Right after, the user drags the file to Evolution to add it as an
attachment to a mail message.
- Evolution sets the "Message Attachment" metadata key on the file so
you can later search for files you've sent as attachments.

It's very likely that one or the other of these changes will be lost,
particularly if you have delayed writeback of metadata.

However, that outcome is not equivalent to both operations completing
in either order.

In conclusion, metadata is substantially different because the
expected atomic operation is changing one particular metadata key,
whereas for a file the typical mode of operation is to replace the
whole file. If you always replace the whole file you need no extra
locking; but for metadata you do, because in general it requires a
read-modify-write cycle to change a single key (which, again, is the
expected level of atomicity).
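As a sketch of one way to get that per-key atomicity, the read-modify-write cycle can be serialized by holding an exclusive lock for its whole duration; a metadata server achieves the same serialization by funneling all writes through one process. The flock()-based helper below is an illustration under that assumption, not the gnome-vfs design, and the key names are invented:

```python
# Make the read-modify-write of a single key atomic by holding an
# exclusive lock across the whole cycle.
import fcntl, json, os, tempfile

def set_key(path, key, value):
    # O_CREAT so the first writer can create the store.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)   # serialize all writers
        raw = os.read(fd, 1 << 20)       # read current record
        meta = json.loads(raw) if raw else {}
        meta[key] = value                # modify just this one key
        os.lseek(fd, 0, os.SEEK_SET)
        os.ftruncate(fd, 0)
        os.write(fd, json.dumps(meta).encode())
    finally:
        os.close(fd)                     # closing releases the lock

path = os.path.join(tempfile.mkdtemp(), "metadata.json")
set_key(path, "icon-position", "40,80")        # e.g. Nautilus
set_key(path, "message-attachment", "true")    # e.g. Evolution
print(json.load(open(path)))  # both keys survive
```

Note that with delayed writeback the lock has to cover the in-memory cache as well, which is precisely what a single server process gives you for free.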

> I would suggest Nautilus use its current implementation, namely a CORBA
> metadata server, but that it use GnomeVFS APIs to actually write the
> metadata out to disk. That will give the desired locking between
> Nautilus processes.
 
It seems silly to me to design a gnome-vfs API for metadata that's
known not to be good enough for Nautilus and requires a fancy extra
abstraction to make it work right, since Nautilus is going to be the
primary client of the metadata API.

> > 
> > However, when you save file.txt you typically expect to get those
> > exact contents in the file, you don't expect changes to be merged. But
> > it is reasonable to expect different metadata keys for the same file
> > to be changeable independently without losing either change.
> > 
> > The metadata for a file is more like a directory than like a file. If
> > two different programs add an entry to a directory at the same time,
> > it would clearly be wrong if only one of the files actually appeared
> > in the directory, right?
> > 
> > I see setting metadata keys the same way.
> 
> This is part of the reason I've briefly considered implementing metadata
> as a hidden directory per file, with a file per key inside that
> subdirectory. It's sort of nifty, except that in the practical world it
> consumes a lot of space on standard filesystems (which, other than
> ReiserFS, don't store small files efficiently). The finer the granularity
> we drop to wrt the number of keys or objects of metadata per actual file
> stored, the more we obviously reduce the risk of contention for that resource.

That's an interesting implementation idea, but it's totally separate
from the desired semantics. Why should it even be visible to
applications whether the metadata is stored as a file per file, a file
per directory, a big global file, or a file per key? Regardless of how
it's stored physically, it should provide virtual directory semantics.
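Here is a sketch of what "semantics independent of storage" might look like as an interface. Every name below is invented for illustration; the point is only that callers see per-file, per-key operations, and the in-memory backend could just as well be a per-directory file, a global file, or a server without any caller changing:

```python
# Callers program against per-file, per-key operations; the storage
# backend is hidden behind the interface.
from abc import ABC, abstractmethod

class MetadataStore(ABC):
    @abstractmethod
    def list_keys(self, uri): ...
    @abstractmethod
    def get(self, uri, key): ...
    @abstractmethod
    def set(self, uri, key, value): ...

class InMemoryStore(MetadataStore):
    """Trivial backend; a per-directory file, one file per key, or a
    server would implement the same interface."""
    def __init__(self):
        self._data = {}
    def list_keys(self, uri):
        return sorted(self._data.get(uri, {}))
    def get(self, uri, key):
        return self._data.get(uri, {}).get(key)
    def set(self, uri, key, value):
        self._data.setdefault(uri, {})[key] = value

store = InMemoryStore()
store.set("file:///home/user/a.txt", "icon-position", "40,80")
store.set("file:///home/user/a.txt", "annotation", "a note")
print(store.list_keys("file:///home/user/a.txt"))
```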

Otherwise, as experience has shown, user data will be lost.


Here is yet another way to think of metadata - it's more like
preferences than like the filesystem. For all the reasons that GConf
is a good approach to preferences, using a server is a good approach
to metadata. And all the arguments against using a server for metadata
apply equally as arguments for using gnome-config instead of GConf.


I'm curious what you think the big disadvantage of a metadata server
is, as compared to the obvious advantages of avoiding loss of user
data, providing a shared cache with writeback so the disk is hit less
often, providing coherent change notification, etc.


Regards,

Maciej



