Re: About metadata (long!)



On Wed, 19 Aug 1998, Kevin Littlejohn wrote:

> (I'm snipping a whole heap of stuff because this is getting too long.
> I'm trying to restrict down to these points only at this stage...)
> 
> > 
> > > I have two questions I'd like answered re: extended attributes:
> > > 
> > > 1) How do you deal with many people wanting different values for the same
> > >    attribute on the same file?  (eg. different viewers for the same particular
> > >    pictures).
> > 
> > This is all stored in the users' preferences database.
> 
> So if a file has an attribute 'viewer', and every user on my system wants
> to use a different 'viewer' for a particular file, we _won't_ store the
> 'viewer' metadata in the extended attribute?

Not really true.  We will store a default (set of) viewer(s) in the file.

> What _are_ we storing in the extended attributes?

Primarily non-mutable data.  This is stuff such as Copyright, which the
owner wants to assign to a file, and give a "children of this file must
inherit" flag (think like storing the GPL here).  Also, Author, presuming
a file has only one Author - there would be a flag perhaps to reference
this attribute in all children (file copies).  And thumbnails would be
stored here, because this is specific to this data and we don't want each
user to have to create their own thumbnail in their personal library. 
Basically, anything that is not user preference (that is system default) 
would be stored here, plus "metadata attributes", for what attributes can
be assigned, inherited, (not)created, overridden, forced-on-copy (GPL),
lost-on-edit (thumbnail), etc. 
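
As a rough illustration of these "metadata attributes" (a sketch only; the
names AttrFlag, Attribute, attributes_for_copy and attributes_after_edit
are hypothetical and not from any existing GNOME code), the flags could be
modelled like this:

```python
from dataclasses import dataclass
from enum import Flag


class AttrFlag(Flag):
    """Hypothetical per-attribute flags, named after the list above."""
    INHERIT = 1          # copies may carry the attribute along
    FORCED_ON_COPY = 2   # copies must carry it (e.g. a GPL notice)
    LOST_ON_EDIT = 4     # dropped when the data changes (e.g. a thumbnail)
    OVERRIDABLE = 8      # users may shadow it in their own database


@dataclass(frozen=True)
class Attribute:
    name: str
    value: str
    flags: AttrFlag = AttrFlag(0)


def attributes_for_copy(attrs):
    """Keep only what a copy of the file is allowed or required to carry."""
    keep = AttrFlag.INHERIT | AttrFlag.FORCED_ON_COPY
    return [a for a in attrs if a.flags & keep]


def attributes_after_edit(attrs):
    """Drop attributes that describe the old data, such as a thumbnail."""
    return [a for a in attrs if not (a.flags & AttrFlag.LOST_ON_EDIT)]
```

With this, a Copyright marked FORCED_ON_COPY survives both a copy and an
edit, while a thumbnail marked LOST_ON_EDIT survives neither.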

> > > 2) How do you deal with trying to assign attributes to files you only have
> > >    read access over?
> > 
> > It depends on the attribute.  If you have the ability to override the
> > attribute, then it is stored in your local database.  If not, then you
> > can't change it, just like you can't change the text in the file.
> > Example: if the file has a copyright attribute, you can't change it.
> 
> So, what I'm trying to figure out is, exactly what are you supposed to be
> gaining from changing the data on disk?  It seems like this has narrowed

The benefit here is ownership.  Imagine we have this huge global database
for storing thumbnails, copyright info, etc, etc.  This will exist for
every file on the system, which means that it either needs to be world
writable (so people who own files can change their metadata), or every
GNOME program has to be SUID/SGID so it can modify this database. 
Alternately, we would need a GNOME-metadata-daemon who is SUID/SGID for
updating this database.  If we can put this data into the file itself, the
kernel will handle all that protection and we don't have to worry about
yet more SUID/SGID binaries on the system.  This daemon would still have
to be written for both non-supported fs's and remote fs's (unless we can
hack NFS, in which case remote supported fs's won't need the daemon).  If
we _can't_ use the filesystem, we literally have to duplicate all the
kernel protection schemes in the daemon.  Additionally, look at my 'chmod'
example: if the metadata is in the fs, non-GNOME chmod works fine, even
without LD_PRELOAD.  If it's not in the fs, it is immediate database
corruption because the database has to reflect the file permissions on
every file, or, at a minimum, it would have to have a reference to the
file (ala symlink), that it would then have to stat() to find the
permissions.  (That is, instead of just doing a db lookup and reading the
perms, it has to do a db lookup, a field lookup, and a stat() to read
them.)  Why?  Because, imagine again, that I want to store a thumbnail
against an image that is read-only by me.  I should not be able to do that
on a system-wide basis, and GNOME has to know that I do not have
permission to do so.  And if you don't buy that, then think again about
Copyright.  Can I assign Copyright to a file I don't own?
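
To make the external-database case concrete, here is a minimal sketch of
the check such a database layer would have to repeat on every write.  The
function names and the "owner or root only" rule are assumptions made for
illustration; with the data in the fs, the kernel would make this decision
instead.

```python
import os


def may_set_file_wide_attribute(path, uid):
    """True if `uid` may set a file-wide attribute (e.g. Copyright) on `path`.

    Assumption for this sketch: only the owner or root qualifies.
    """
    st = os.stat(path)  # the extra stat() an external database needs
    return uid == 0 or st.st_uid == uid


def attribute_destination(path, uid):
    """Pick where a new attribute value would be recorded."""
    return "file" if may_set_file_wide_attribute(path, uid) else "user-db"
```

A thumbnail written by someone who can only read the image would land in
"user-db", never in the system-wide store.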

> down to 'if we can tinker with this structure on disk, then we will, and
> we'll store something in it - but not if someone else has already stored
> something in there, or if we don't have write access to the file, in those
> cases we'll fall back to the main system'.

Only user data (eg, preferences) that do not apply to the file as a whole
will be stored in the user database.  Think of these as "owner" databases,
and "other" databases.  Owners can do certain things with files (and their
attributes) that others can not and should not be able to do.  Can the
owner of a file restrict the viewers? (yes)  Can someone else? (no).  Look
again at my sample db entry.  Setting the "locked" flag as the last entry
of a particular method would prevent others from adding new entries to this
method.  However, they could copy the file and if that attribute (lock) was
not inheritable, they could then assign a new viewer to the copy.  They
could do this because they would own the copy and the owner of the original
did not specify the lock to be forced-inherited (like perhaps the original
author may have force-inherited a copyright notice).  The next problem is
dealing with data that the owner cannot change (eg, the force-inherit on
copy flag should not be able to be removed from copies even by the owner
of the copy). 
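
The lock-and-copy behaviour can be sketched as follows.  This is an
illustration under assumed names (Method, add_entry, copy_method); it does
not try to solve the last problem above, i.e. how to stop the owner of a
copy from removing a forced flag.

```python
from dataclasses import dataclass, field


@dataclass
class Method:
    """A method (e.g. 'view') attached to one file, with its entries."""
    name: str
    owner_uid: int
    entries: list = field(default_factory=list)
    locked: bool = False          # others may not add entries
    lock_inherited: bool = False  # does the lock follow copies?


def add_entry(method, uid, entry):
    """Add an entry unless the method is locked against this user."""
    if method.locked and uid != method.owner_uid:
        raise PermissionError(f"method {method.name!r} is locked")
    method.entries.append(entry)


def copy_method(method, new_owner_uid):
    """Copy a method for a file copy; the lock survives only if inherited."""
    keep_lock = method.locked and method.lock_inherited
    return Method(method.name, new_owner_uid, list(method.entries),
                  locked=keep_lock, lock_inherited=keep_lock)
```

Here a non-owner gets PermissionError on the original, but can add a
viewer to their own copy when the lock was not marked as inherited.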

> If I'm getting this right, you're suggesting extensions to the linux
> ext2 filesystem, which I respectfully suggest is _not_ a GNOME project,
> but a linux/ext2 project.

Well, this is a matter of semantics.  Changing the filesystem (even though
there is no change - we're just using structures not currently used) is,
at its lowest level, an ext2 project (has nothing to do with Linux). 
However, it is a change in ext2 for the "better" operation of GNOME.  That
is, it is a modification to how ext2 is used (not ext2 itself)
specifically for object data as a direct result of the GNOME project. 

> > We need:  A system-wide "hints" or "class" database.  mime.types, eg.
> >           A system-wide "attributes" database (or store it in the fs)
> 
> This is what annoys me - "_or_ store it in the fs".  That's plain _wrong_ -
> if you wanna store stuff in the fs, be my guest, but _don't_ suggest that
> storing stuff in the fs should in any way, shape, or form enter into the
> design of the database.

A filesystem _is_ a database.  Storing it in the fs is simply using the
database that already exists, instead of creating yet another layer of
database on top of it.  My point is simply that *if* you use the
filesystem, you don't have to recreate this wheel.

>                          Storing stuff in the fs is a _complete_ extra,

Actually, storing the stuff in a "database" is a hack.

> something that can be considered way down the track, probably as a standard
> method for files that match certain prereqs - if file->attr has 'extended',
> and query is for 'view', use 'file->extattr->view'.  That sort of thing
> should be supportable by a solid, _generic_ database - without putting
> "or's" into the design suggestions.

Uhm, no.  There is no difference in *data* between what is stored in the
fs and what is stored in the database.  It's just a matter of where this
data is being stored.  Okay: The GNOME attribute daemon will start up.  It
will read its own inode.  If it has EA data in the fs, it realizes that
the user does not mind / has the capability to store data IN the fs.  This
becomes the default for all local files on that filesystem (the daemon
will only work for local files - remote files will be accessed through a
similar daemon that has the same startup procedure).  If it cannot
store IN the fs, it will use the system-wide, non-fs, database.  This can
be a runtime option settable from a config file, or a compile time option. 
If it's a compile time option, the daemon will not need to load UID
protection schema for in-the-fs attributes, thus conserving memory, etc,
etc.  (Actually I'm not too sure about this because we still have to deal
with remote authentication, but this is "in principle" stuff - this
specific daemon may only work without remote collaboration support (which
I honestly think will be the primary role) or hacked-NFS only environments
(in which case the NFS daemon would handle EAs on the users' behalf)).
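
The startup decision described above could look roughly like this.  It is a
sketch, not a design: choose_backend and has_extended_attributes are
hypothetical names, and os.listxattr (present in Python on Linux only) is
used here as a stand-in for "read its own inode and look for EA data".

```python
import os


def has_extended_attributes(path):
    """Best-effort probe: does `path` carry any extended attributes?"""
    listxattr = getattr(os, "listxattr", None)  # only available on Linux
    if listxattr is None:
        return False
    try:
        return len(listxattr(path)) > 0
    except OSError:
        # Missing file, unsupported filesystem, no permission, ...
        return False


def choose_backend(daemon_path, config_override=None,
                   probe=has_extended_attributes):
    """Return 'fs' to store attributes in the filesystem, else 'db'.

    `config_override` models the runtime option from a config file.
    """
    if config_override in ("fs", "db"):
        return config_override
    return "fs" if probe(daemon_path) else "db"
```

The daemon would call choose_backend() once on its own binary at startup
and use the answer as the default for local files on that filesystem.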

Actually, I've been thinking about this *a little* and what about making
this GNOME-attribute daemon THE "GNOME process"?  To be honest I'm quite
happy with (dare I say it?) KDE and/or WindowMaker.  But I would like a
robust object model.  This, for me, has absolutely nothing whatsoever to
do with a GUI.  Anyway, I was thinking to combine the GNOME attribute
daemon and init (so that it functions like init for GNOME-able processes,
not to actually combine the two).  Give it GNOME-levels, where GNOME-level
0 is to shutdown, GNOME-level 5 (for instance) is X, etc.  Then in your
.xinitrc file, do a telgnome 5 and it will start all the needed GNOME
processes from ~/.gad or ~/.gamd or /etc/gamd or whatever.  If any GNOME
program dies it can respawn, and gamd (gad) itself would go into the
inittab.  Not too sure how devastating it would be if gad (gamd) died, as I
said, I haven't thought this out too far.

[...]
> That stuff is pretty much what we should be looking for, to my mind - I
> think the PRELOAD stuff and the ext2 stuff is smokescreen over the actual

It's not really a smokescreen or anything else.  I just offered it as an
option and then this whole thing just blew up.  I think it's good though
because we've hit a lot of issues I don't think anybody had really thought
about.  *Where* the data is stored is really a non-issue in my mind, which
is why I guess I just don't understand everyone's reaction.  I say store
it in the fs if it's handy because it is easier, and safer.  If not, well,
you gotta do what you gotta do.  They will have absolutely identical
results as used from within GNOME (though I would think that the fs method
would be faster).  As used outside of GNOME things change, and this is
where using the fs directly and LD_PRELOAD become really useful. 

> design the database carefully, these things should be easy adds later on...

I've said this as well.

> gfdb_query(filename,'view') checks through a db file for anything that
> matches filename (either by name/regexp or by hash, or whatever), and

I don't think that really works well.  (Tromblin?)'s site at Cygnus has a
pretty good API layout.  This is far too limited.  The database libs would
use just the filename alone to associate class operations.  It would do
this by looking at a set of mime.types and regexps and assigning their
values.  It would then look up that file specifically in the database (or
in the fs...) for additional attributes assigned by the owner of the file. 
Then, it would look up that file in the user database for attributes
assigned by the current user.  These would be collated, categorized, and
excluded, and the set of methods would be returned.  I suppose you could
limit it to a single method ('view') and only get a list of possible
'view' values, but this is a nit.
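
A minimal sketch of that three-step lookup, assuming plain dictionaries in
place of the real databases.  The name lookup_methods and the data layout
are made up for illustration, and only the collating step is shown;
categorizing and exclusion rules are left out.

```python
import re


def lookup_methods(filename, class_rules, file_attrs, user_attrs, method=None):
    """Collate methods for `filename` from three layers, in lookup order.

    class_rules: list of (regex, {method: [values]}) pairs (mime.types-like)
    file_attrs:  {filename: {method: [values]}} set by the file's owner
    user_attrs:  {filename: {method: [values]}} set by the current user
    """
    # 1. Class operations, derived from the filename alone.
    layers = [methods for pattern, methods in class_rules
              if re.search(pattern, filename)]
    # 2. Attributes the owner assigned to this specific file.
    layers.append(file_attrs.get(filename, {}))
    # 3. Attributes the current user assigned.
    layers.append(user_attrs.get(filename, {}))

    merged = {}
    for layer in layers:
        for name, values in layer.items():
            bucket = merged.setdefault(name, [])
            for value in values:
                if value not in bucket:  # collate without duplicates
                    bucket.append(value)

    if method is not None:
        return merged.get(method, [])
    return merged
```

Passing method='view' gives the single-method variant; leaving it out
returns the whole set of methods.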

> returns the method 'view' associated.  I think if you can manage to
> create some sort of session management, you can add caching of the responses,

You cannot cache anything as long as non-GNOME apps exist, to be proper.

> I'm wondering if maybe we're aiming toward subtly different things - from what
> I'm understanding, you're aiming toward the files being objects, keeping
> as much information about themselves as possible - so if you move a file
> from your system to mine, it'll keep its info.  This seems really good
> for things like author, copyright, creation time, version, source, etc.,
> but really bad for things like editor, viewer, copy_program, etc.

None of that information gets stored with the file (in general.  It
*could* be done, but isn't really worthwhile).  The default actions are in
the mime.types, and the user preferences are in the users' database.  The
owner of the file *could* put this data there (and they probably will) but
even moving this to your system, your system defaults will still be
available (again, in general, if the owner of the file doesn't turn them
off (if they can)).

> may be larger than I'm aware of...  I _don't_ like your idea for the
> second, as I'm suggesting a way of me personalising my desktop in a way
> that is consistent, and portable for me - so I can determine that I
> use this viewer for that type of file, and modify the behaviour of
> my interface to the computer as I see fit - and then carry that environment
> either via networked database engines (ala ACAP, as pointed out by someone
> else), or via floppy-copy of the relevant .db files, or whatever.  I don't
> want other people's preferences re: viewers and editors and so forth to
> extend to my desktop, thanks :)

I think you're wrong.  You don't think that.  =)

Okay, the user will also need two databases - mime.types for their own
classes, and a separate one for overrides.  The system defaults and file
defaults will always be available (I think they should) though there may
be an option (attribute attribute) such as "ignore defaults" for, for
example, "view", that you could set in your personal database and which
would turn these off.  However, if you bring your attributes with you and
the system doesn't have your preference and you have ignore system
defaults turned on you won't have any options.  I see little reason to
ignore the system defaults, but it could easily be added as an attribute
attribute.  Then, to bring your desktop with you, you would just bring the
mime.types (regex database) plus your personal references database.
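
The effect of such an "ignore defaults" attribute attribute can be
sketched like this (the name resolve, the argument layout, and putting the
user's own choices first are all assumptions for illustration):

```python
def resolve(method, system_defaults, file_defaults, user_prefs,
            ignore_defaults=()):
    """Return the candidate values for `method`, user's own choices first.

    `ignore_defaults` lists the methods for which the user has switched
    off both the system defaults and the file defaults.
    """
    own = list(user_prefs.get(method, []))
    if method in ignore_defaults:
        return own
    inherited = (list(file_defaults.get(method, []))
                 + list(system_defaults.get(method, [])))
    return own + [value for value in inherited if value not in own]
```

With ignore_defaults containing 'view' and no personal 'view' preference,
the result is an empty list, which is the "you won't have any options"
case described above.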

I know nothing about ACAP and haven't looked at it, but I am going out on
a limb here and going to say that it's either insufficient or too bloated.

This is _truly_ stuff that belongs in the filesystem with perms, etc, and
I am wary of using something that wasn't designed specifically for this.

Could be wrong, maybe am, just sayin'...

> I'm hoping that the suggestions I've made can be carried out to a system
> that will cover your requirements too - gnome-copy programs that export
> the relevant meta-data to whatever media you're pushing the file to, so
> you _can_ shuffle things around _with metadata intact_, so long as you
> use the gnome-tools.  I think that's reasonable - I hope so, anyway :)

Anything that is GNOMEd should work fine.  The biggest issue, especially
in the early days, is the non-GNOME system, and more precariously, the
semi-GNOME system.  Migrating to an Object system will make the a.out to
ELF migration seem trivial in comparison. 

--
Christopher Curtis               - http://www.ee.fit.edu/users/ccurtis
                                 - System Administrator, Programmer
Melbourne, Florida  USA          - http://www.lp.org/


