Updated metadata proposal



Tom Tromey writes:
 > I've incorporated Miguel's suggestions into my metadata document, and
 > clarified a couple points.
 > 
 > 	http://www.cygnus.com/~tromey/gnoem/metadata.html
 > 
 > Please tell me what you think.  Implementation will begin shortly.

The API looks pretty good to me. Using string VFS filenames to access 
string data should cover anything and lends itself to a gdbm solution.
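To make the string-keyed model concrete, here is a minimal sketch of what such a store might look like. The class and method names are invented for illustration (they are not from the proposal), and a plain dict stands in for the gdbm file:

```python
class MetadataStore:
    """Maps (VFS filename, attribute name) pairs to string values.

    Illustrative sketch only: a real implementation would open a gdbm
    database instead of using an in-memory dict.
    """

    SEP = "\x00"  # NUL cannot appear in a filename, so it is a safe separator

    def __init__(self):
        self._db = {}  # stand-in for the gdbm file

    def set(self, vfs_name: str, attr: str, value: str) -> None:
        self._db[vfs_name + self.SEP + attr] = value

    def get(self, vfs_name: str, attr: str, default=None):
        return self._db.get(vfs_name + self.SEP + attr, default)

    def remove_file(self, vfs_name: str) -> None:
        """Clear out all metadata for a file when it is removed."""
        prefix = vfs_name + self.SEP
        for key in [k for k in self._db if k.startswith(prefix)]:
            del self._db[key]
```

Because keys are plain strings, remote references work the same way as local paths, e.g. `store.set("ftp://host/pub/foo.tar.gz", "mime-type", "application/x-tar")`.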

The idea of using garbage collection is attractive for its ease of use,
but I wonder whether it is a sensible approach to take. Garbage-collecting
metadata on a multiuser machine with large file systems and extensive VFS
references could take a long time, and it is particularly unattractive
for files held at remote ftp sites.

We are already relying on applications to record file creation and 
movement. At the very least, the API should include a command to clear 
out all the metadata for a file when it is removed; it may even be
appropriate to leave all handling of file removal up to applications.
If garbage collection is used, then a command to invoke it on a 
directory tree might be worth adding to the API, so that applications
can suggest when a sweep would be worthwhile.
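A directory-restricted sweep might look something like the sketch below. It is an assumption of mine that remote URLs would simply be skipped (checking an ftp site per-file is exactly the cost we want to avoid); the flat `{path: metadata}` dict stands in for the real database:

```python
import os

def gc_metadata(db: dict, root: str) -> int:
    """Drop metadata for local files under `root` that no longer exist.

    Returns the number of entries removed. Remote VFS references
    (anything containing '://') are left alone, since checking them
    would be slow and unreliable.
    """
    removed = 0
    prefix = root.rstrip("/") + "/"
    for path in list(db):
        if "://" in path:               # remote reference: skip
            continue
        if not path.startswith(prefix):  # outside the requested tree
            continue
        if not os.path.exists(path):     # file is gone: reclaim metadata
            del db[path]
            removed += 1
    return removed
```

An application that knows it has just deleted a large subtree could call this on that subtree alone, rather than forcing a sweep of the whole filesystem.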

I am not sure about distributing the metadata db files over the user's 
filespace. It may speed up lookups (?) and would reduce the key sizes
in the database. However, it may also make the metadata more vulnerable
to destruction or dislocation and, as you note, cannot cover files in
directories that are not exclusively writable by the user.

In theory, letting users control where the metadata is kept should
allow them to tune their setup to account for these trade-offs.
However, very few users will be able to make sensible decisions
on these issues (I don't think I would be one of them :). Changing 
from one arrangement to another might also prove very tricky.

Therefore, I think there is a case for deciding now where the metadata 
database(s) should go and leaving them wherever we decide. Even if we
allow flexibility, we still need to be clear about why we have adopted
the default location, since most users will stick with it. 

I think I incline towards sticking it all in one database in ~/.gnome.
This will make backups easier to implement, seems safer (?), and will
make little difference to users with small personal filespaces. 

Multiuser systems could benefit from having system "metadatabases" 
(in /etc and /usr/share ?) which would provide metadata defaults for
shared files. These would be written by packagers (rpm, ...) and would 
provide most users with most of the information they want about most
files; implementing it all at the user level could be very inefficient.
This need have no effect on the API: user-assigned metadata would
simply override the system values, and utilities run as root would
manipulate the system databases instead of a user database.
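The override semantics amount to a layered lookup: consult the user's database first, then fall back through the system databases in order. A minimal sketch, assuming each database maps `(vfs_name, attr)` pairs to values (the function name and key shape are mine, not the proposal's):

```python
def lookup(attr, vfs_name, user_db, system_dbs):
    """Return the user-assigned value if present, otherwise the first
    matching system default, otherwise None.

    user_db and each entry of system_dbs are mappings from
    (vfs_name, attr) tuples to string values.
    """
    for db in [user_db] + list(system_dbs):
        value = db.get((vfs_name, attr))
        if value is not None:
            return value
    return None
```

The API the application sees is unchanged; only the resolution order inside the library knows about the extra layers.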

I suppose you could use UNIX groups to organise metadata as well but,
like most uses of groups, that would probably be pointless.

One final comment: when archiving files we want to ensure that the 
metadata is stored as well. This really requires metadata-aware 
archivers rather than databases spread over user directories. Once
these archives have been produced, they may become accessible through
the VFS as tar files. 

I think the right approach in this case is to retrieve the metadata 
from the tar files rather than storing it in the database used for 
non-archived files. This avoids keeping two copies of the data which
could get out of sync, at the cost of preventing alterations to the
metadata for archived files (which would be pretty pointless anyway).
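To illustrate reading metadata back out of the archive itself: suppose a metadata-aware archiver stored a special `.metadata` member with one `path<TAB>attr<TAB>value` record per line. Both the member name and the line format are conventions I am inventing here for the sketch; the real archiver would define its own.

```python
import tarfile

def read_archived_metadata(tar_path: str) -> dict:
    """Parse metadata stored inside the archive, returning a mapping
    from (member path, attribute) to value.

    Assumes the (hypothetical) convention of a ".metadata" member
    containing tab-separated path/attr/value lines.
    """
    meta = {}
    with tarfile.open(tar_path) as tf:
        try:
            member = tf.extractfile(".metadata")
        except KeyError:
            return meta  # archiver was not metadata-aware
        for line in member.read().decode().splitlines():
            path, attr, value = line.split("\t", 2)
            meta[(path, attr)] = value
    return meta
```

Since the archive is the single copy, the returned mapping is naturally read-only, which matches the point above: there is no second database to fall out of sync.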

Felix 


