Re: [GNOME VFS] Re: Daemons [Was: gob inside gnome-vfs ...]



On Fri, 2002-06-28 at 21:56, Maciej Stachowiak wrote:
> On 28Jun2002 04:51AM (-0700), Seth Nickell wrote:
> > On Thu, 2002-06-27 at 20:24, Maciej Stachowiak wrote:
> > > 
> > > It's actually stored as a per-directory thing now, although you could
> > > change that choice.
> > 
> > Ian and I are planning to change that for the GnomeVFS metadata
> > implementation, for a variety of reasons.
> 
> Note that the reason it was done that way originally was performance -
> avoiding the need to read the metadata file for every file in a
> directory when viewing it in Nautilus. Reading twice as many files is
> likely to have a negative impact on directory opening performance, so
> I advise you to measure the performance impact before making this
> choice.

Yes, I'm planning to measure this. The API will expose the idea of
opening a handle for the directory to get metadata in any case, simply
because some existing systems store metadata per directory. This will
make it easy to transparently change the storage back to per-directory
if per-file storage proves to be a performance problem.
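
Roughly the shape of API I have in mind, to make this concrete (every
name below is a placeholder, not a proposal):

    /* Hypothetical sketch only -- none of these names exist yet.
     * The handle hides whether the backing store is per-file or
     * per-directory, so we can change the storage strategy later
     * without touching callers. */
    GnomeVFSResult gnome_vfs_metadata_open_directory
                       (GnomeVFSMetadataHandle **handle,
                        const gchar             *directory_uri);

    GnomeVFSResult gnome_vfs_metadata_get
                       (GnomeVFSMetadataHandle  *handle,
                        const gchar             *file_name,
                        const gchar             *key,
                        gchar                  **value);

    GnomeVFSResult gnome_vfs_metadata_set
                       (GnomeVFSMetadataHandle  *handle,
                        const gchar             *file_name,
                        const gchar             *key,
                        const gchar             *value);

    void gnome_vfs_metadata_close (GnomeVFSMetadataHandle *handle);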

> > > 
> > > The real actual non-theoretical bug we ran into in Nautilus happened
> > > when the main Nautilus binary and one of the sidebar panels (notes)
> > > both tried to change the metadata file at similar times. The delayed
> > > writeback of metadata meant that one set of changes often got lost,
> > > leading to loss of real user data (the note about the file).
> > > 
> > > This was what convinced us that we needed a metadata server. What
> > > would your solution to this problem be?
> > 
> > Honestly I don't see how this is substantially different from the
> > theoretical case where, say, AbiWord had an out-of-process component
> > that made small modifications to the file you were currently working on.
> 
> But AbiWord shouldn't and in fact doesn't do that. Multiple processes
> changing file *data* is not a real need, but multiple processes
> changing different pieces of *meta*data is.
> 
> Let me put it another way that will perhaps make it more
> clear. Suppose you change a file's permissions in one program, and
> its modification date in another program. Would it be acceptable for
> one of these changes to randomly get lost arbitrarily 5 seconds later?
> To me the answer is clearly "no".

If you change the second paragraph of an AbiWord document and then run
an XML tool on the document that indents the file, would it be
acceptable for one of these changes to be lost 5 seconds later? To me,
the answer is unclear, but probably "no".

I'm claiming that this problem of having multiple processes accessing
the file at once is primarily the result of an engineering choice made
in Nautilus, namely to structure Nautilus as a multitude of
out-of-process components that do not synchronize with each other for
access to resources. We recognized this problem in Nautilus, and
because metadata was implemented entirely within Nautilus at the time,
we solved it by forcing Nautilus processes to synchronize with each
other. This is still possible (as I stated) if GnomeVFS does not
perform the synchronization inside a user daemon: Nautilus can still
have a CORBA server that it uses to synchronize metadata access by its
components. No big deal.

I still fail to see how this is a general problem that should affect
the design of GnomeVFS metadata. Why is it reasonable to have multiple
processes accessing metadata at the same time, but not to have multiple
processes accessing normal data? I still claim that this is just a
result of a particular design choice made within Nautilus (one that can
easily be solved *inside Nautilus*; otherwise we would of course make
accommodations for it, as the file manager is a very important API
user). If this is a quirk of how Nautilus is structured (and it is
easily solvable from within Nautilus), and there's no inherent reason
why metadata is more likely than normal data to be changed from
separate programs (note: programs, not processes), then it seems silly
to solve the problem for metadata if we are not also solving it for
normal file access.

Want a concrete example using normal data that I think is 100% parallel
to Nautilus' metadata situation? Let's talk about an IDE such as Anjuta
that is built from multiple components, much as Nautilus uses multiple
components to handle views and sidebars. Now let's say that this IDE
embeds the components out of process for the same reasons Nautilus does
(largely that if a component crashes, Nautilus does not, I believe,
though async may have originally been part of the reason too). So we
have two components: one is a class browser and the other is a text
editor. The user types some text into the code editor, creates a new
function, whatever. Now the class browser also has a feature that
allows you to rename the entire class (and it automatically changes all
the references within that file to the new name). The user renames the
class.

Uh-oh. These were separate processes that both have a separate set of
possibly incompatible changes. If the IDE did not implement some sort
of locking or cross-component negotiation itself, data is going to be
overwritten. Normally I think we would say that the filesystem (or
virtual filesystem) is not responsible for this problem; the locking or
necessary communication needs to happen within the IDE itself. Maybe
the IDE needs a mechanism for informing the class browser that there
are unsaved changes and that it should not allow modifications. Even
better would be for the class browser to get up-to-date information
about what's in the editor, independent of what's actually written to
disk (and this is of course how IDEs are expected to work). If we
expect the library user to perform the necessary negotiation between
its processes accessing the same filesystem resource in this case, I
think it's reasonable to require Nautilus to perform that
negotiation/locking.

I think we may be miscommunicating here. I definitely prefer to have
locking of metadata done in GnomeVFS. It seems safer and saner. What I
don't understand is how you can want locking for metadata and not find
it reasonable for files.
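
To make the symmetry concrete: the primitive that serializes a metadata
update serializes a file update just as well. A minimal sketch using
nothing but POSIX advisory locks (nothing gnome-vfs specific, and the
function name is made up):

    #include <fcntl.h>
    #include <unistd.h>

    /* Sketch: hold an exclusive advisory lock across the whole
     * read-modify-write, so a concurrent writer cannot interleave
     * and silently drop our change.  Works identically whether
     * `path' is a metadata store or an ordinary document. */
    static int update_locked (const char *path)
    {
        int fd = open (path, O_RDWR | O_CREAT, 0600);
        if (fd < 0)
            return -1;

        struct flock fl = { 0 };          /* l_start/l_len 0 = whole file */
        fl.l_type = F_WRLCK;              /* exclusive lock */
        fl.l_whence = SEEK_SET;
        if (fcntl (fd, F_SETLKW, &fl) < 0) {  /* block until we own it */
            close (fd);
            return -1;
        }

        /* ... read current contents, apply the change, write the
         * result back; details elided ... */

        fl.l_type = F_UNLCK;              /* release the lock */
        fcntl (fd, F_SETLK, &fl);
        close (fd);
        return 0;
    }

The exact same function serializes metadata updates and document
updates; there is nothing metadata-specific about the problem.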

> > This general problem is not of pressing importance because, in general,
> > multiple programs tend not to end up accessing the same file at the same
> > time. If they do, we have traditionally foisted responsibility for the
> > ensuing problems on the user (i.e. if they open a file in two programs
> > at the same time and overwrite changes from one with the other, well we
> > let that happen, though we're supposed to warn the file has changed on
> > disk). I'm not really defending this view, because I think it's broken
> > personally, but I don't see how metadata is substantially different.
> 
> It's not really the same. Saving a file twice in two programs will
> always have the effect of both of the operations fully completed in
> some order - you'll get one copy of the file or the other.
> 
> But suppose you have this scenario (totally made up in terms of what
> the programs actually do, solely to illustrate a hypothetical):
> 
> - User drags a file around inside a Nautilus folder.
> - Nautilus sets file position metadata.
> - Right after the user drags the file to Evolution to add it as an
> attachment to a mail message.
> - Evolution sets the "Message Attachment" metadata key on file A so
> you could later search for files you've sent as attachments.
> 
> It's very likely that one or the other of these changes will be lost,
> particularly if you have delayed writeback of metadata.

Why do you have delayed writeback of metadata? I'm having trouble
imagining the user performing these operations quickly enough that each
program hasn't already written its data to disk.

> In conclusion, metadata is substantially different because the
> expected atomic operation is changing one particular metadata key,
> whereas for a file the typical mode of operation is to replace the
> whole file. If you always replace the whole file you need no extra
> locking; but for metadata you do, because in general it requires a
> read-modify-write cycle to change a single key (which, again, is the
> expected level of atomicity).

And that is what I think is bullshit. I think the idea of the
whole-file write being the atomic operation is something we accept
because it's traditionally a place where we've foisted responsibility
onto the user, not because it's actually a good idea.
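
For the record, the only reason whole-file replacement "needs no extra
locking" is the usual write-temp-then-rename idiom (a sketch):

    #include <stdio.h>

    /* Sketch: write the new contents to a temporary file, then
     * rename() it over the original.  POSIX makes the rename atomic,
     * so readers see the old file or the new one, never a mixture. */
    static int replace_whole_file (const char *path, const char *contents)
    {
        char tmp[1024];
        FILE *f;

        snprintf (tmp, sizeof tmp, "%s.tmp", path);

        f = fopen (tmp, "w");
        if (f == NULL)
            return -1;
        fputs (contents, f);
        fclose (f);

        return rename (tmp, path);   /* the atomic step */
    }

That rename() happens to be atomic is an accident of the API we
inherited, not evidence that "the whole file" is the right unit of
atomicity.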

> > I would suggest Nautilus use its current implementation, namely a CORBA
> > metadata server, but that it use GnomeVFS APIs to actually write the
> > metadata out to disk. That will give the desired locking between
> > Nautilus processes.
>  
> It seems silly to me to design a gnome-vfs API for metadata that's
> known not to be good enough for Nautilus and requires a fancy extra
> abstraction to make it work right, since Nautilus is going to be the
> primary client of the metadata API.

The whole reason for moving this API to GnomeVFS is that other clients
want access to metadata. I fail to see how Nautilus is the "primary
client" in any particular way. Any application that presents files to
the user is presumably going to access some metadata, if only for
displaying the icon (for example).

> Here is yet another way to think of metadata - it's more like
> preferences than like the filesystem. For all the reasons that GConf
> is a good approach to preferences, using a server is a good approach
> to metadata. And all the arguments against using a server for metadata
> apply equally as arguments for using gnome-config instead of GConf.

I understand that another major reason (perhaps the major reason?) for
using a user daemon for GConf was so that callbacks could be triggered
when data of interest was modified. In the case of metadata we are just
going to use GnomeVFS' standard monitoring for this.
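
Concretely, something along these lines would do (a sketch; the monitor
calls are the existing gnome-vfs ones, but the URI and helper names are
stand-ins for wherever the metadata implementation keeps its store):

    #include <libgnomevfs/gnome-vfs.h>

    /* Sketch: reuse the existing gnome-vfs monitor API for change
     * notification instead of inventing daemon callbacks. */
    static void
    metadata_changed (GnomeVFSMonitorHandle    *handle,
                      const gchar              *monitor_uri,
                      const gchar              *info_uri,
                      GnomeVFSMonitorEventType  event_type,
                      gpointer                  user_data)
    {
        if (event_type == GNOME_VFS_MONITOR_EVENT_CHANGED)
            g_print ("metadata under %s changed\n", info_uri);
    }

    static GnomeVFSMonitorHandle *
    watch_metadata (const gchar *metadata_uri)
    {
        GnomeVFSMonitorHandle *handle = NULL;

        gnome_vfs_monitor_add (&handle, metadata_uri,
                               GNOME_VFS_MONITOR_FILE,
                               metadata_changed, NULL);
        return handle;
    }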

> I'm curious what you think the big disadvantage of a metadata server
> is, as compared to the obvious advantages of avoiding loss of user
> data, providing a shared cache with writeback so the disk is hit less
> often, providing coherent change notification, etc.

If anything, I'm arguing that we need to provide this service for
regular files as well. I can understand why performing this sort of
locking is valuable. What I can't understand is why you'd find
synchronizing access to metadata valuable but not synchronizing access
to normal data.

-Seth



