GNOME Metadata [was Suggestion for file type detection approach]



This thread has moved beyond file-type detection.  The discussion has
focused on how to get the metadata for a single file faster, but there
is no issue with getting the metadata for a few dozen files.  The real
problems are:

1. The directory is an inefficient mechanism for organizing a large
number of files.

2. We cannot query a simple file system and get back a set of files.

I'm all for using EAs when they are available, but they are not
available on every file system that GNOME may run on, which limits them
as a solution.  Moreover, EAs have hidden costs.  They take up more
disk space (1-5% disk loss), and apps must know to use them; since most
do not, the metadata is also stored in the file header or footer.  EAs
can only be accessed in the context of a single file.  Because the data
is not organized into anything like a database designed in the past 30
years, we cannot query it.  Nor is the file system normalized to
prevent duplication or orphaning of thumbnails and mime-types.
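
To make the availability problem concrete, here is a minimal sketch of
the quoted proposal (check for an EA, otherwise determine the type and
save it) with a fallback for file systems or platforms that lack EAs.
It is only an illustration, not how gnome-vfs does it: the attribute
name and the function name are assumptions, and mimetypes.guess_type
stands in for real sniffing.

```python
import mimetypes
import os

# Assumed attribute name, chosen for illustration only.
EA_NAME = "user.mime_type"


def mime_type(path):
    """Return the MIME type cached in an EA, or guess it and try to cache it."""
    # os.getxattr/os.setxattr only exist on platforms with xattr support.
    getx = getattr(os, "getxattr", None)
    setx = getattr(os, "setxattr", None)

    if getx is not None:
        try:
            return getx(path, EA_NAME).decode("utf-8")
        except OSError:
            pass  # attribute missing, or this file system has no EA support

    # Stand-in for content sniffing: guess from the file extension.
    guessed = mimetypes.guess_type(path)[0] or "application/octet-stream"

    if setx is not None:
        try:
            setx(path, EA_NAME, guessed.encode("utf-8"))
        except OSError:
            pass  # no write permission or no EA support: nothing is cached

    return guessed
```

Note that even when the EA is written, the result is still reachable
only one file at a time; nothing here lets you ask for "all PNG files".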

As a point of fact, Medusa does store a table of file metadata, and I
can query my file system to get a set of data like file-name,
mime-type.  I can query by directory, or mime-type, or keyword
(emblem/topic/category), and more.  This mechanism returns several
thousand file matches in less than a second.  Some of this is outlined
at http://members.cox.net/sinzui/medusa/index.html
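
As a rough illustration of what "query the file system to get a set"
means, the sketch below keeps file metadata in a table and queries it
by directory, mime-type, or keyword.  The schema, table name, and
function names are assumptions made up for this example; this is not
Medusa's actual storage format or API.

```python
import sqlite3


def build_index(rows):
    """Load (directory, name, mime_type, keyword) tuples into an in-memory table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE file_metadata ("
        " directory TEXT, name TEXT, mime_type TEXT, keyword TEXT)"
    )
    db.executemany("INSERT INTO file_metadata VALUES (?, ?, ?, ?)", rows)
    # Indexes let set queries avoid scanning every row.
    db.execute("CREATE INDEX by_mime ON file_metadata (mime_type)")
    db.execute("CREATE INDEX by_dir ON file_metadata (directory)")
    return db


def query(db, **criteria):
    """Return sorted (name, mime_type) pairs matching every column=value given."""
    allowed = {"directory", "mime_type", "keyword"}
    unknown = set(criteria) - allowed
    if unknown:
        raise ValueError("unsupported criteria: %s" % sorted(unknown))
    # Column names come from the allow-list above; values are bound parameters.
    where = " AND ".join("%s = ?" % col for col in criteria) or "1"
    sql = "SELECT name, mime_type FROM file_metadata WHERE %s ORDER BY name" % where
    return db.execute(sql, tuple(criteria.values())).fetchall()
```

For example, query(db, mime_type="image/png") returns every PNG in the
index regardless of which directory holds it, which is exactly what a
directory listing cannot do.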

The Storage project, by its nature, addresses the metadata problem.
Though it focuses on being a smart file/data system, it can be used to
manage the metadata of files outside of it.  Because it is a portable
file system, there are no EA issues with the OS's file system.  It is
designed to return an arbitrary set of data matching a query like a
directory, or category.  Some details about Storage can be found at
http://www.gnome.org/~seth/storage/
and a proposal for handling metadata in it is located at
http://members.cox.net/sinzui/blog/sutra.html

This said, I don't think Medusa or Storage is appropriate.  Medusa's
focus is searching, and its underlying code isn't suited to managing
metadata well.  Storage is what its name suggests, and it doesn't help
users or applications that must use the native file system.

We need, as has been proposed, a metadata system to manage the data.
Both Medusa and Storage provide VFS access, but the real need is to
co-opt the existing VFS methods to read and write to the metadata system
when doing IO to the underlying file system.  An incremental indexer is
needed to collect the metadata for files not written through VFS.  FAM
could be used, but it will not scale; a smart indexer is needed that can
watch the locations that change most.  Many applications, like Web
browsers, music managers, and file managers, need direct access to read
and write metadata without writing to the file system.
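
The two halves of that design, a VFS write path that updates the
metadata system and an incremental indexer for files written behind its
back, can be sketched roughly as follows.  Everything here is
hypothetical: MetadataStore, vfs_write, and incremental_index are
illustrative names, not gnome-vfs API, and the "smart" choice of which
locations to watch is left out.

```python
import mimetypes
import os


class MetadataStore:
    """Toy metadata system: maps a path to its MIME type and last-seen mtime."""

    def __init__(self):
        self.records = {}

    def record(self, path):
        mime, _ = mimetypes.guess_type(path)
        self.records[path] = {
            "mime_type": mime or "application/octet-stream",
            "mtime": os.stat(path).st_mtime_ns,
        }


def vfs_write(store, path, data):
    """Hypothetical VFS write hook: write the file, then update its metadata."""
    with open(path, "wb") as f:
        f.write(data)
    store.record(path)


def incremental_index(store, directory):
    """Re-index only files that are new or changed; return the paths touched."""
    updated = []
    for entry in os.scandir(directory):
        if not entry.is_file():
            continue
        known = store.records.get(entry.path)
        if known is None or known["mtime"] != entry.stat().st_mtime_ns:
            store.record(entry.path)
            updated.append(entry.path)
    return sorted(updated)
```

Files written through vfs_write never need indexing, so the indexer
only pays for files created by applications that bypass the VFS.  This
sketch does not handle deleted files or recurse into subdirectories.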

One final thought.  Metadata isn't a GNOME issue; all desktops have the
same issues.  Freedesktop might be the right place for it.  If apps
from other desktops, like KDE, were writing to the metadata DB, there
would be less need for indexers.

On Sun, 2004-01-04 at 08:13, Adam Williams wrote:
> > When trying to find the mime type for the file, gnome-vfs will first
> > check if this EA exists. If so, use that for the mime type. If not,
> > determine the mime type as usual and save it.
> 
> Yes, please!  I mentioned EA awhile ago to no response; sniff, look at
> extensions, whatever - and keep your results.  Even better if
> applications that created files put the EA tags in themselves.  This is
> so obviously the right solution - Mac OS has been doing this forever,
> and Winblows is starting to make real use of EA as well.  EA follow the
> file when copied, moved, etc... You could even store a thumbnail in EA
> and not generate one every time or cache it somewhere annoying like
> ".nautilus".
> 
> > It works only for local files. 
> 
> Not sure this is true.  The EA on NTFS can be accessed by CIFS calls,
> and NFSv4 at least supports extended attributes.  But maybe neither is
> true on Linux at this point.
> 
> > If you don't have permission to change
> > the file, the EA will not be saved. And you need a file system which
> > support EA, of course. I tested it with ext3. Might work with others
> > too.
> 
> Works with ext3, XFS, and JFS (at least).

-- 
__C U R T I S  C.  H O V E Y____________________
sinzui cox net
Guilty of stealing everything I am.



