Re: A Question about Metadata storage



The only thing we absolutely HAVE to do to implement meta data is obtain
a reference for the file (or whatever object) that can be used to look up
the meta data somewhere else (in a database, a directory tree, whatever).

There is a case for using a URL as the reference since we might want
to attach meta data to objects located elsewhere on the net. URLs also
have the merit that they can refer to abstract spaces like the objects
obtained from a server (http:, etc).

However, URLs are long strings that can only be simplified to a limited
extent (because you do not know how a server will interpret the request:
cgi?). I think we had better settle for implementing meta data on file
systems for the moment.

Using the filename has the problem that mv can break the link. One
alternative is to use the inode number of the file and hold separate
meta data databases for each file system.
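As a rough illustration of what such a reference could look like, here
is a minimal Python sketch. The function name file_reference is made up
for this example and is not part of any proposal. It pairs the device
number with the inode number, because inode numbers are only unique
within one file system: the device number picks the per-file-system
database and the inode number is the key inside it.

```python
import os

def file_reference(path):
    """Return a (device, inode) pair for the file behind `path`.

    Hypothetical helper: st_dev would select the per-file-system
    meta data database, st_ino is the key within that database.
    """
    st = os.stat(path)
    return (st.st_dev, st.st_ino)
```

A rename within the same file system leaves this pair unchanged, which
is exactly the property the filename lacks.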

Hard links share a single inode, so every name for the file gets the
same set of meta data. Symbolic links could be implemented to have
different meta data from the file they refer to, or you could follow the
link through and do the final lookup based on the inode of the "real"
file. I suspect that following the link would be more useful, but this
might not always be the case.
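The two possible policies for symbolic links can be sketched in Python
as follows. The function name symlink_reference is an assumption made
for illustration; the follow_symlinks argument is the standard one taken
by os.stat.

```python
import os

def symlink_reference(path, follow=True):
    """Return a (device, inode) reference for `path`.

    follow=True  -> use the inode of the "real" file the link points to.
    follow=False -> use the inode of the symbolic link itself, so the
                    link can carry its own meta data.
    Hypothetical helper, shown only to contrast the two policies.
    """
    st = os.stat(path, follow_symlinks=follow)
    return (st.st_dev, st.st_ino)
```

With follow=True a link and its target yield the same reference; with
follow=False they yield different ones.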

Since inode numbers are integers, a very efficient hash lookup could be
implemented using this system. This approach should be portable across
all *NIX file systems. It also works on DOS file systems mounted under
Linux. It would break on a port of GNOME to DOS (ha ha ha).
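A minimal in-memory sketch of such a lookup, in Python, might look like
this. The class name InodeMetaStore and its methods are invented for the
example; a real implementation would need persistent storage. It keeps
one hash table per file system, keyed by inode number.

```python
import os

class InodeMetaStore:
    """Hypothetical store: one hash table per file system."""

    def __init__(self):
        # Maps st_dev -> {st_ino -> meta data}
        self._tables = {}

    def set(self, path, meta):
        """Attach `meta` to the inode behind `path`."""
        st = os.stat(path)
        self._tables.setdefault(st.st_dev, {})[st.st_ino] = meta

    def get(self, path, default=None):
        """Look up meta data by inode; `default` if none is stored."""
        st = os.stat(path)
        return self._tables.get(st.st_dev, {}).get(st.st_ino, default)
```

Because the key is the inode and not the name, a lookup after a rename
within the same file system still finds the meta data.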

Networked file systems may prove a different matter. My experience here
is limited to NFS.

The inode numbers for an NFS mounted file system are different from the
inode numbers in that file system when mounted locally. However, the two
sets of numbers are closely related and you can easily translate between
them for the vanilla Linux NFS server daemon. I do not know whether the
translation process is different for different NFS servers. However, in
the worst case you could just hold two reference numbers for each piece
of meta data (one when NFS mounted and one when locally mounted).

However, the real problem is that a (GNU) mv across file systems does
not preserve the inode number. Obviously, cp creates a file with a new
inode number. (Incidentally, GNU mv seems to be unable to move files
between file systems which have been NFS mounted together.)

These cases are genuinely hard to solve. Several solutions have been
suggested: 

1) incorporate the reference into the file - this is nasty, programs
   that interpret the files will probably break :(

2) incorporate the reference into a spare field in the inode - this is
   non portable to DOS file systems and, I suspect, networked file
   systems.

3) LD_PRELOAD a library that updates the references whenever the
   routines that change the file system are called - this would not
   work for set{u,g}id processes and could prove very tedious to port.

4) Only support reference updating from within GNOME based file move
   and copy utilities - this restricts use of non-GNOME utilities.

Options 3) and 4) could both work with inode numbers as references. I
incline towards 4). We could provide gnome-mv and gnome-cp and let users
alias mv and cp to them in their login scripts. With the inode
referencing system the extra code would only be called into action when
moves were made across file systems.
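To make the gnome-mv idea concrete, here is a rough Python sketch. The
function gnome_mv and the plain dict used as the meta data store are
assumptions for illustration only; no such utility exists yet. It moves
the file, then rewrites the reference only if the (device, inode) pair
actually changed, which is what happens when the move crosses file
systems.

```python
import os
import shutil

def gnome_mv(src, dst, store):
    """Move `src` to `dst` and keep the meta data reference valid.

    `store` is a hypothetical dict mapping (st_dev, st_ino) -> meta data.
    Returns the reference of the file at its new location.
    Sketch only: no locking, no error recovery.
    """
    old = os.stat(src)
    old_key = (old.st_dev, old.st_ino)

    # shutil.move renames when it can and falls back to copy + delete
    # when the destination is on another file system.
    final_path = shutil.move(src, dst)

    new = os.stat(final_path)
    new_key = (new.st_dev, new.st_ino)

    # Within one file system the key is unchanged and nothing is done.
    if new_key != old_key and old_key in store:
        store[new_key] = store.pop(old_key)
    return new_key
```

A gnome-cp along the same lines would copy the entry to the new key
instead of moving it.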

I do not like using LD_PRELOAD since the executables that you end up
altering no longer do what they were intended to do. This sounds like a
recipe for bugs, and bugs in the file system are not nice.

---

Of course this only works for file specific information. You need a
separate mechanism to cope with file types (MIME, whatever). Trying to
do both these things together might be a mistake. Preferences for file
types (what kind of editor do I use for images, etc.) tend to be
personal while meta data (what is this file?) is sharable.

Using inode references would make it impossible to use wildcards in your
meta data database. However, you probably only want to do that in order
to implement a file type mechanism in a kludgy way. It would also
prevent you using a general text editor to update your database but in
the interests of efficiency that may be worth sacrificing.

Felix


