Re: About metadata (long!)



I hope nobody minds me combining these messages ... I wanted (in part) to
isolate this thread from the other thread with the same name.

On 17 Aug 1998, Tom Tromey wrote:

> Anyway, I've written up my thoughts on what the metadata API should
> 	http://www.cygnus.com/~tromey/gnome/metadata.html
> 
> Please send any comments or suggestions back to the list.  I'm
> determined that this thread live forever.

A couple quick comments (I think the shorter I keep my comments the safer
I am...):  Firstly, the line "...changing attributes by hand will be
annoying enough that..." - I think you mean this in reference to extremely
fine grained control over a multitude of files.  Yes, this is unlikely to
happen extensively, but I think that everyone will have "just this one"
file that they want to be different.  Mass customization is unlikely, but
highly specialized files, I think, are.

The second is that "GNOME ... [is a] GUI for Unix".  Perhaps I'm being
naive or *way* out of spec, but I see it as the GNU Network Object Model
Environment.  I would like to use an Object Model on my data and I don't
really care about the GUI.

And finally, on the API: you appear to be missing protection schemes.
I've made reference in the past to an "Author" tag (which people take FAR
too literally: "What if there are ten authors?") - I'll use Daniel's
"Copyright" tag instead.  I realize that this is hardly the most secure
method of protecting data (it can truly only be done if it is in the file
protected by the kernel) BUT each piece of metadata needs some permissions
and levels of scope: Is it inheritable?  Is it visible?  Can it be
changed?

Some examples: An image, with thumbnail metadata - is the thumbnail
inheritable (copy to a new file)?  Sure.  Is it preserved (write to the
file)?  No.  Visible?  Yes.  Malleable?  Probably not. 

A copyright: Inheritable: Forced.  Visible: Forced.  Malleable: No.

A comment: Inheritable: Yes.  Visible: Maybe not (only by the owner).

and on and on.
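
To make those per-attribute questions concrete, here is a minimal sketch
in Python.  Everything in it is an assumption made for illustration: the
names Rule, Policy and POLICIES are hypothetical and not part of Tom's
proposal, the "preserved" question is left out, and the comment is
assumed to be malleable since the text above does not say.

```python
from dataclasses import dataclass
from enum import Enum


class Rule(Enum):
    NO = "no"
    YES = "yes"
    FORCED = "forced"       # always applies and cannot be switched off
    OWNER_ONLY = "owner"    # applies for the file's owner only


@dataclass(frozen=True)
class Policy:
    inheritable: Rule   # does the tag follow the data to a new file?
    visible: Rule       # who may see the tag?
    malleable: Rule     # may the tag be changed?


# The three examples from the text, encoded as policies.
# Assumption: the comment is malleable (the text does not say).
POLICIES = {
    "thumbnail": Policy(Rule.YES, Rule.YES, Rule.NO),
    "copyright": Policy(Rule.FORCED, Rule.FORCED, Rule.NO),
    "comment": Policy(Rule.YES, Rule.OWNER_ONLY, Rule.YES),
}


def tags_for_copy(tags):
    """Return only the tags that should accompany a copy of the file."""
    return {
        name: value
        for name, value in tags.items()
        if POLICIES[name].inheritable in (Rule.YES, Rule.FORCED)
    }


def can_see(name, is_owner):
    """Decide whether a tag is shown to the current user."""
    rule = POLICIES[name].visible
    if rule is Rule.OWNER_ONLY:
        return is_owner
    return rule in (Rule.YES, Rule.FORCED)


def can_change(name):
    """Decide whether a tag may be edited at all."""
    return POLICIES[name].malleable is Rule.YES
```

The point of the table is only that each tag carries its own answers to
the three questions, instead of one rule for all metadata.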

Otherwise, an excellent summary and proposal.

Oh - one more - the infamous LD_PRELOAD.  My *nix experience is pretty
limited - Linux/{Intel,Sparc,Alpha}, SunOS, Solaris, and a little DUX.
Never quite made it to the Irix machines, but all these systems support
LD_PRELOAD, so I don't buy your "All the world is not Linux" bit.

HOWEVER, this is by no means a requirement - simply a nicety to help
handle the problem.  Does it solve everything?  No.  Is it required?  No.
Does it help?  I think so.

Again, we'll use your example of 'mv'.  The case where you say it works
is correct.  However, the case where you say it doesn't work is slightly
misleading.  It's true that the metadata would not transfer intact -
however, it needn't be completely lost, either.  If the LD_PRELOAD library
is watching the open() and write() calls, it can see that data going out
_could_have_ been taken in part from a file opened for reading.  If the
file opened for reading has a copyright tag whose attributes include
forced copy, the library can stick the copyright onto the file being
written, and give a reference to how that data got there.  This is hardly
ideal, but I think it is important that we try to preserve this data any
way we can.
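
The bookkeeping such a library would do can be modelled in a few lines.
This is a sketch in Python, not an actual LD_PRELOAD shim (which would
be a C shared object wrapping open() and friends).  The class name
PreloadTracker, the dictionary layout and the "derived-from" tag are all
hypothetical.

```python
class PreloadTracker:
    """Models what a preloaded wrapper could remember for one process."""

    def __init__(self, metadata_store):
        # metadata_store maps path -> {tag name: (value, force_on_copy)}
        self.store = metadata_store
        self.read_paths = []

    def on_open(self, path, mode):
        """Called for every open(); mode is 'r' or 'w' in this model."""
        if mode.startswith("r"):
            self.read_paths.append(path)
            return
        # Opened for writing: the data *could* have come from any file
        # this process has read so far, so carry the forced tags across.
        dest = self.store.setdefault(path, {})
        for src in self.read_paths:
            if src == path:
                continue
            for name, (value, forced) in list(self.store.get(src, {}).items()):
                if forced:
                    dest[name] = (value, True)
                    # Record how the tag got here.
                    dest["derived-from"] = (src, False)
```

With a store holding a forced copyright and an ordinary comment on
a.txt, calling on_open("a.txt", "r") and then on_open("b.txt", "w")
leaves b.txt with the copyright and a "derived-from" note, but without
the comment.  It over-approximates on purpose: it cannot know whether
the bytes written really came from the file that was read.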

But yes, in the end, a GNOME-aware mv is the only real solution.  I've
also made mention of this in my previous posts (/opt/GNOME/usr/bin)

And if I left anything out, that doesn't mean I don't disagree.  =)


On Tue, 18 Aug 1998, Daniel Veillard wrote:

> here, but I also tend to think that they are far too file oriented. My
> reflexion comes from Web metadata (the one I'm most used to), and especially
> the work done at W3C on metadata: http://www.w3.org/Metadata/
> I may have a too "network oriented" vision, but I don't think that a file
> oriented one will really fit the purpose of a "Network Object Model" :-)
> I beg of seamless access of remote data from my future Gnome desktop,
> metadata handling included !

My entire premise here is simply the storage.  Whether it is
network-enabled or not, this data still has to be physically stored
somewhere, unless you want it to be stored only in memory and mirrored
around the world so that one machine going down does not lose the data.
This is not what you propose, I assume.  I realize you don't want to
modify physical file structures, and I realize that it is not necessary
(as long as you work *only* within GNOME), but the only 'safe' place is
with the (file) data itself.

I've read your metadata stuff at w3.org ... I also know that you wrote
rpm2html - you can as easily write a ea2html that will extract metadata
and put it on a web database.  I realize this is a very simple scenario
compared to what you're doing, but it is a workable solution.

As an aside -- a metadata-aware 'find' could as easily dump this data --
find / -type f -metadata \* -print-ea {} \; | sed 's/$/<br>/' > ea.html

And again, this has nothing to do with *where* the metadata is stored.

> So could people assume in this thread that:
>    - not only files need metadata, but rather the full scope of object
>      manipulated on a desktop (files, URI, persistent objects, programs, ...).

All these things are stored, somehow, as or in files.  Only if you actually
mean to assign metadata to processes that persist only as long as the
session is there no need to worry about storage.  Anything that is
persistent must have some sort of record on disk.

> I agree that this makes things far less trivial, I believe that that makes
> them far more useful too. I don't even give the slightest idea about how
> to actually implement this, but I guess that the debate need first to be
> focused on what do we need, before jumping into implementation tricks.

I honestly don't think you've said anything new - I think each point has
been brought up during the course of the thread.


On Tue, 18 Aug 1998, Kevin Littlejohn wrote:

> (Having now finished this email, I realise most of the rant about preload
> stuff isn't highly relevant - I'll keep it here anyway, to indicate my
> serious dislike for anything that includes preloading libraries :)

(sigh)

> This is a sticking point for me.  Having to preload random libraries into
                                    ^^^^^^^^^^^^^^^^^
Nobody is forcing you to do this now or ever.  LD_PRELOAD is an *OPTION*
that will *HELP* preserve *SOME* of the metadata, under *MANY*
circumstances.  The only problem with using LD_PRELOAD is that it may give
the ignorant user a false sense of security.

It is truly a trivial library that has to be written anyway for GNOME to
handle metadata at all.  GNOME programs will link against it, non GNOME
programs will destroy everything GNOME has done.  LD_PRELOAD will prevent
*SOME* of these non-GNOME programs from orphaning *ALL* of the data *MOST*
of the time.

> the daemons on the boxes around me _is_ _not_ an option.  That's the case

So don't use it, and orphan your data.  It would be orphaned anyway.

> for most non-desktop machines (and probably a few desktop ones).  That
> means there _are_ binaries out there that are shuffling files around
> without the gnome libraries.

There always will be.  I T  I S  A N  O P T I O N.

> > > many files (incoming ftp server, perhaps *shrug*).  I want to be able to
> > > assign a particular icon to new, incoming files off that ftp server - but
> > 
> > The simplest way to do this is have those files have a 'null' (or default)
> > icon, and have processed files have a 'processed' icon.
> 
> This stuff doesn't gain in any way from having preload libraries, that's

You're exactly right.  This problem has its own solution that does not
benefit from LD_PRELOAD.  LD_PRELOAD is not meant to be the end-all,
be-all to the orphaning problem, and if you put it in a situation that
doesn't even deal with orphaning, of course it's not going to help.

> what I'm trying to point out.  In fact, given there's always cases where
> the libraries won't be in the loop, I still contend that preloading libraries
> doesn't _gain_ you anything, and leads you toward making design decisions
> that will bite you later...

No design decisions are going to be based on LD_PRELOAD ability.  No
design decisions are going to be based on where the data is stored, UNLESS
you are writing the implementation for storing or retrieving that data.

> But what's the difference?  I _don't_ want to (and in some cases, can't)
> preload libraries into my daemons environments.  And in fact, if I _don't_

Then if they do something that will orphan the data, they will orphan the
data.  There is not a damned thing you can do about it.  LD_PRELOAD *MAY*
help in *SOME* situations - that's it.  That's all.  Nothing more.

> preload it into root, under your scheme, I'm going to be orphaning metadata
> every time I make changes to my filesystem as root...

You're going to orphan it anyway as long as you use non-GNOME apps.  This
is really very simple.  There are two types of apps - GNOME, and
non-GNOME.  GNOME apps work fine.  non-GNOME apps trash GNOME.  However,
if we use LD_PRELOAD in front of the non-GNOME apps, some of them will
preserve enough data to not trash all the GNOME data.  They will not be
perfect.  They will not always work.  They will not solve world hunger. 
They will help to preserve database integrity when using non-GNOME tools
under some (many) instances. 

> (Note, I understand that using extended attributes dodges that...)

Only to an extent.  EAs (which I presume we mean to be data stored as a
branch in the inode) will still get lost when copying across a filesystem.
LD_PRELOAD would still be needed for (in the example above) force-on-copy
attributes/metadata.

> > It is a lot of tinkering.  I think it's worthwhile though...  About
> > stuffing up non-Gnome aware programs, I just don't know what you mean.
> > Putting the metadata with the raw data won't affect those programs at all.
> 
> Correction - just to keep things clear - putting a link to the metadata in
> the inode of an ext2 system won't affect programs.  Again, this only works
> for ext2, unless you care to take the time to make your desktop manager's
> libraries aware of all the different filesystems out there - assuming they all
> have space for added attributes.  

Okay.  We have a few different issues here.  Firstly, it has only been
*explored* for ext2.  There is spare data space in the FAT filesystem as
well, though that is hardly robust (it can be fudged, but running CHKDSK
would result in a lot of "lost clusters" - but again, they could be
recovered if the EA Data itself is done properly, just like they could
from /lost+found below).  ext2 is the only "safe" filesystem that can be
used to embed data because we control the source, though we could probably
apply it to ext as well if that has space.

Next, you're talking about a desktop manager, which I don't understand.
GNOME will have a library for handling EA data.  This is what would become
the LD_PRELOAD library.

As to making it filesystem aware, yes, for each filesystem that we want to
tinker with directly it will need to know how to tinker.  Not all
filesystems will be supported for direct embedding.  Very few would.

> And I know you're arguing to only do this with ext2, but what does it gain
> you?  I still believe the non-filesystem specific ways of doing this can be
> made as robust as the filesystem-specific ways - or if not, then damned
> close.

As long as you're in GNOME and GNOME only, they are identical.  Putting it
in the filesystem simply helps to preserve the database integrity when
manipulated by non-GNOME apps.  LD_PRELOAD is the next step.

Why put it in the inode?  There *has* to be a central database for
immutable data associated with a file.  One example could be thumbnail
data.  Another could be copyright data.  I like the thumbnail because that
is a lot of data.

How do you store this thumbnail?  You can store it as a BLOB in a large
centralized database.  Or you can store it like 'xv' does.  Or, you can
store it with the file itself.

Now, if you store it as a BLOB, all protection schemes have to be controlled
by the GNOME libs (who owns the file, what permissions are on the file,
etc, and it has to know when these permissions change and change the data
accordingly).  The same is true if you store it like xv does.  Running
'chmod' from the commandline immediately corrupts your database.  This is
where LD_PRELOAD will save you.  However, if you can store it *in* the
file, all the protection is handled by the kernel, and you don't even need
to worry about LD_PRELOAD.

Now all this is going to have to be written anyway, yes, absolutely.  BUT
if you _can_ put it in the inode, running chmod is no longer a disastrous
action.

> > I suspect this is already being used (especially by the hurd) but there's
> > an OS dependent 2 structure in there as well.  Now, imagine that
> > inode.osd1.linux1 contains a pointer to another block, which is in reality
> > another inode just like the real data inode, except it contains only
> > metadata information, exactly as it would appear if it were an entry in a
> > non-integrated metadata database.  Standard read()/write() calls will only
> > see the real data - the GNOME libs will have to look at this structure
> > specifically, and address the metadata inode directly (or convince the
> > ext2fs driver to do it).  I'm not familiar with that low-level
> > programming.  It *is* very low level, but should not require rewriting the
> > driver (because, after all, we have the source to ext2 and can do anything
> > it can do).
> 
> 'doing anything with the source to ext2' _is_ rewriting it, sorry - and

Wherever did I say I was going to change the source to ext2 ?!?!?!?

What I said is that we have the source to ext2.  That means, in a
worst-case scenario, we can copy the code into the GNOME libraries and
they can manipulate the filesystem directly.  NOTHING in ext2 changes.
GNOME simply uses some currently unused structures that ALREADY EXIST.

> you'll have to make sure that nothing else _is_ using that information,

Absolutely.

> and that tools like ext2ed and e2fsck understand this new metadata, and

Not really.  It would be good if e2fsck understood it, but if it didn't,
the worst-case scenario is that it strips out all the metadata as lost
chains into /lost+found.  Each file will be a complete chain of full
metadata information.  Record in the metadata which file it refers to, and
GNOME can reattach this data.
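
A rough sketch of that reattachment step, in Python.  It assumes each
recovered chain is text beginning with a "Signature: GNOME-Metadata"
line and carrying a "Reference.file:" back-reference, as in the example
format further down.  The function names and the idea of passing in a
set of existing paths are hypothetical.

```python
def backreference(chain_text):
    """Return the Reference.file value of a recovered chain, or None."""
    lines = chain_text.splitlines()
    if not lines or lines[0].strip() != "Signature: GNOME-Metadata":
        return None             # not one of ours; leave it alone
    for line in lines[1:]:
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Reference.file":
            return value.strip()
    return None


def reattach(recovered_chains, existing_paths):
    """Pair each recovered chain with the file it names, if that file exists.

    Returns (attached, unclaimed): a dict of path -> chain text, and the
    list of chains that could not be matched to any existing file.
    """
    attached, unclaimed = {}, []
    for chain in recovered_chains:
        target = backreference(chain)
        if target in existing_paths:
            attached[target] = chain
        else:
            unclaimed.append(chain)
    return attached, unclaimed
```

Anything without the signature, or whose target no longer exists, ends
up in the unclaimed list instead of being guessed at.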

> that gnome people have root access to the box they're installing on and can
> recompile the kernel for their desktop - Personally, I think this is going

Absolutely incorrect.  No root is needed except to reattach lost data, and
that can be done from rc.local

> too far for a desktop manager.  As part of a project to extend the

I've never been talking about a desktop.

> capabilities of linux or ext2, yes, that's a brilliant thing to be doing
> - but once again, GNOME is _NOT_ linux specific, or ext2 specific -
> you're about to embark on a lot of work for something integral to the

It  I S  N O T  integral to anything.  It is  A N O T H E R  O P T I O N.

> > Each user will have to have their own 'preferences' database, even if my
> > integrated filesystem approach is used.  So it's clear (I hope), my
> > suggestion to integrate the data into the fs does not alleviate any of the
> > other strains GNOME metadata storage people will face - it is simply an
> > alternative to a global, instance-specific database.  We will still need a
> > global mime.types for classes of information.  We will still need a
> > userlevel preference database.  What we won't need is a global "registry" 
> > like Windows/OS2.  What we will gain is the ability for the owner of a
> > file to embed any data (s)he wishes without having to worry about other
> > people mucking it up.  Example: An "Author" tag.  Sure, you can figure out
[...]
>
> Sorry, your linux kernel will handle it.  On a non-linux system, we can't
> use any of that, because in most cases we can't recompile our kernel.

Any kernel that understands UIDs and ownership will handle it.

> <snipe>
> Have you started talking to the BSD guys about extending their filesystem
> yet?  'cause I bet they'd have comments on it as well - it's something you'll
> have a _real_ hard time selling...
> </snipe>

I don't even know what fs they use.  I would be willing to though.

> Ok, so Author, we presume we can only have one of.  What happens when a
> system of 200 people, all running the desktop off one server (or off one
> fileserver, maybe), want to place the same attribute with a different
> value on a particular file?

Why would 200 people all claim that they authored the same file?  I think
I already responded to this above.

> You're not talking about something 'new' for the kernel to handle if you have
> a personal registry - all you're talking about is the ability for each
> person to keep their own list of meta-attributes for any given files.  It's
> something that's _completely_ userlevel - which is what it should be.

Yup.

> > But as I said, yes, if you do embed the data and then lose it, it will
> > show up as a "lost chain" in the filesystem.  This will be bad because it
> > wastes disk space.  However, if this does happen and you fsck the drive,
> > these chains will reappear in /lost+found no?  Then, imagine this: when
> > GNOME boots, give it a "-recover-directory:" parameter where it will scan
> > these lost chains and reattach them to their parent objects.  This can be
[...]
> No, no no :(  Anything that's orphaned in that way on an ext2 filesystem
> is prone to be written over.  What you're suggesting is 1) not reliable,

How do you figure that?  The blocks are still marked as being in use.
They will not have a dtime set.  They are lost chains, and will be
recovered.  Besides, ext2 seems to overwrite the least-recently deleted
blocks anyway so you have plenty of time even if this were true.

> is prone to be written over.  What you're suggesting is 1) not reliable,
> 2) completely linux/ext2 specific - if you really want to program OS/fs
> combination awareness into GNOME for the number of fs'es that are out there,
> then you're biting off _far_ too much work.

I'm not.  I'm suggesting it for ext2 and any OS that uses ext2 and that
does not use the inode struct I want to use.

> > Sure - the two things have nothing to do with one another, except that a
> > preload will help to ensure the consistency of the database from non-GNOME
> > aware apps.
> 
> _no_, it will not.  preload is _not_ reliable.

It will HELP.  I never said it was reliable.  It's more reliable than
nothing though, because nothing is guaranteed to always be unreliable.

> > This should be short as long as you don't balk at me wanting to use spare
> > data structs inside ext2.  :)
> 
> I _don't_ use ext2 in _most_ of my work.  Period.  The same applies for

Then you don't get the fancy benefits.

> You're suggesting a whole swathe of work that not only isn't going to
> benefit a lot of people, but is going to duplicate the functionality of
> your 'fall-back' system, and is going to require a _much_ higher ability

There is no fall-back system.  GNOME needs a MAIN system that is
completely database driven.  This is the DEFAULT or PRIMARY system.
People using Linux/ext2 get the CHOICE of using the SUPER-SPIFFY system
that's much more reliable with non-GNOME applications, but only works on
Linux/ext2.

> to tinker with the system/knowledge of the system from the end user (ie.
> now we have to recompile our kernel to run gnome).

Nobody has to recompile anything.  The space is there.  It is unused.  It
is waiting for someone to derive its purpose.

> > Getting the data to other system _is_ a problem.  My initial solution was
> > to modify NFS to send the metadata stream to a client that requests it but
> > nobody seemed to like that.  The only other solution that I see offhand is
> > a GNOME attribute daemon to run alongside the NFS daemon (ala xfs).  (This
> > daemon could be anything - from a custom app to a SQL server).  That's
> > what I talked about when I spoke of a NFS server with hidden data.
> 
> Neither of these are nice.  Also, how do you handle moving files from
> one filesystem to another - like dumping files to a MS-DOS disk?  The
> transferral of stuff from person to person is going to be another sticking
> point, and it's not something I want to think about right now :).

I've already discussed this -- OS/2 embeds EAs in HPFS.  When you switch
to FAT, those EAs get dumped to a secondary file named "EA_DATA. SF" or
something.  When you copy off the floppy back to HPFS, those EAs get read
out of that file and reinserted back into the filesystem, WHERE THEY
BELONG, DAMMIT!  =) 
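
The same dump-and-restore idea can be sketched for GNOME in Python.
This only illustrates the round trip.  The sidecar name "ea_data.json",
the JSON encoding and the function names are assumptions of this sketch
and have nothing to do with the actual OS/2 file format.

```python
import json
import os
import tempfile

SIDECAR = "ea_data.json"    # hypothetical sidecar file name


def export_with_sidecar(names_to_attrs, dest_dir):
    """Dump the attributes of the copied files into one sidecar file."""
    with open(os.path.join(dest_dir, SIDECAR), "w") as fh:
        json.dump(names_to_attrs, fh)


def import_from_sidecar(src_dir):
    """Read the sidecar back; an absent sidecar means no attributes."""
    path = os.path.join(src_dir, SIDECAR)
    if not os.path.exists(path):
        return {}
    with open(path) as fh:
        return json.load(fh)


def round_trip(names_to_attrs):
    """Copy out to a scratch directory (the 'floppy') and read back."""
    with tempfile.TemporaryDirectory() as floppy:
        export_with_sidecar(names_to_attrs, floppy)
        return import_from_sidecar(floppy)
```

Going out to the attribute-less filesystem and coming back should return
exactly what went in; a directory with no sidecar simply yields no
attributes.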

Now, as far as getting from one system to another ... how do files do
that?  There are a few primary ways:

rcp:  Non-GNOME.  Data will be orphaned.  Is there a solution?  Some data
may be preserved using LD_PRELOAD, but the remote service will have to
become GNOME-aware.

ftp:  Non-GNOME.  Same as 'rcp'.

nfs:  Non-GNOME.  Can be worked around by modifying the NFS clients and
servers to send and request 'metadata streams'.  LD_PRELOAD will not help.

http: Non-GNOME.  May be worked around using LD_PRELOAD, but the web
server will have to send metadata as X-*** headers, just like rcp.

Pretty dismal, no?  The only solution that I see is using again the same
code that would become LD_PRELOAD, and make it the GNOME-attribute daemon. 
This daemon would have to run on the rcp host, the ftp host, the nfs host,
and the http host, and would be queried independently of the other service,
via a LD_PRELOAD. 

See how that would work?  All the attribute functions are compiled into a
daemon that can be run standalone or in front of fopen() &c, and has the
ability to query itself for local or remote attribute access.  OF COURSE,
all the GNOME apps would already use these BECAUSE they are GNOME apps, NOT
because they are LD_PRELOADED.  LD_PRELOAD apps will not be able to handle
metadata, they will simply have a wrapper around them that tries its best
to make sure that they don't accidentally remove it.

> I have two questions I'd like answered re: extended attributes:
> 
> 1) How do you deal with many people wanting different values for the same
>    attribute on the same file?  (eg. different viewers for the same particular
>    pictures).

This is all stored in the users' preferences database.

> 2) How do you deal with trying to assign attributes to files you only have
>    read access over?

It depends on the attribute.  If you have the ability to override the
attribute, then it is stored in your local database.  If not, then you
can't change it, just like you can't change the text in the file.
Example: if the file has a copyright attribute, you can't change it.
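
A minimal sketch of that rule in Python.  The "locked" flag, the two
dictionaries and the function names are hypothetical; the sketch also
assumes that a locked attribute cannot be changed by anyone, owner
included.

```python
def set_attribute(name, value, file_attrs, user_db, path, is_owner):
    """Try to set an attribute; return where it landed, or None if refused."""
    current = file_attrs.get(name)
    if current is not None and current["locked"]:
        return None     # e.g. a copyright: it cannot be overridden
    if is_owner:
        file_attrs[name] = {"value": value, "locked": False}
        return "file"
    # Read-only access: keep the override in the user's own database.
    user_db.setdefault(path, {})[name] = value
    return "user"


def get_attribute(name, file_attrs, user_db, path):
    """A user's override wins unless the file's own attribute is locked."""
    current = file_attrs.get(name)
    if current is not None and current["locked"]:
        return current["value"]
    override = user_db.get(path, {}).get(name)
    if override is not None:
        return override
    return current["value"] if current is not None else None
```

A reader who sets a viewer on someone else's file gets a private
override; the file's own attribute is untouched, and an attempt to
replace the copyright is refused outright.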

> My proposal is still (for reference :) :
> 
> A system-wide, and a more specific person-wide database (both of the same

person-wide?

We need:  A system-wide "hints" or "class" database.  mime.types, eg.
          A system-wide "attributes" database (or store it in the fs)
          A user-level "attributes" database (to override defaults)

#2 is the hard one, because you need to essentially duplicate the
functionality of the kernel, plus attribute attributes.  It will also have
to be SUID/SGID so that any user can modify it for their own files.  A
better solution may be the 'xv' style database so that users can own their
own attribute db, but that doesn't work when you have mixed UIDs in a
single directory, so can't really work.
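
How the three databases might be consulted together, as a sketch in
Python.  The function name, the glob-style class patterns and the
dictionary layout are assumptions; locked or forced attributes are
ignored here to keep the lookup order visible.

```python
from collections import ChainMap
from fnmatch import fnmatchcase


def resolve(path, class_hints, system_attrs, user_attrs):
    """Build one view of a file's attributes from the three databases.

    Lookup order: the user's overrides, then the system-wide attributes
    for this file, then the class defaults (mime.types-like hints).
    """
    class_defaults = {}
    for pattern, attrs in class_hints.items():
        if fnmatchcase(path, pattern):
            class_defaults.update(attrs)
    return ChainMap(
        user_attrs.get(path, {}),
        system_attrs.get(path, {}),
        class_defaults,
    )
```

For example, with a class hint {"*.gif": {"View": "xv"}} and a user
entry {"View": "display"} for one particular picture, that picture
resolves to "display" while every other .gif still gets "xv".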

> format), that can return methods for queries on files.  The database itself
> should be, probably, a .db file, or integrated into Corba, or something

Hmm.  If it is a standalone file, then .db is probably okay, but for
integration directly into the fs, it should be text.  Something like:

Signature: GNOME-Metadata
Reference.file: /usr/local/share/example.file
Reference.size: 43218 bytes
Reference.time: (utime), (mtime), (ctime), &c
Reference.owner: (uid), (gid), (perms)
Comment.1: This is a comment that must follow this file, it cannot change
Comment.1: Additional comments may not be added.
Comment.1.perm: FmL  {Force-on-copy, !Malleable, Locked}
Method.1: View
View.1: xv
View.2: display
Method.2: Edit
Edit.1: gimp


I'm sure this is hardly ideal, but it should be a fair starting point.
Keeping the backreference at the beginning of the file should help in
reassociating the data if it were ever lost.  Hostname, IP, ethernet
address, something to indicate the machine it originated from would be
helpful as well...
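
To show that a format like this is easy to read back, here is a small
parser sketch in Python.  The sample is abridged from the example above.
Treating a repeated key as a continuation of the earlier value, and the
function names, are assumptions of this sketch.

```python
SAMPLE = """\
Signature: GNOME-Metadata
Reference.file: /usr/local/share/example.file
Reference.size: 43218 bytes
Comment.1: This is a comment that must follow this file, it cannot change
Comment.1: Additional comments may not be added.
Comment.1.perm: FmL
Method.1: View
View.1: xv
View.2: display
Method.2: Edit
Edit.1: gimp
"""


def parse_metadata(text):
    """Parse 'Key: value' lines; a repeated key continues the earlier value."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "Signature: GNOME-Metadata":
        raise ValueError("missing GNOME-Metadata signature")
    record = {}
    for line in lines[1:]:
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed line: {line!r}")
        key, value = key.strip(), value.strip()
        record[key] = f"{record[key]} {value}" if key in record else value
    return record


def handlers(record, method):
    """List the programs registered for a method, in numbered order."""
    found = []
    n = 1
    while f"{method}.{n}" in record:
        found.append(record[f"{method}.{n}"])
        n += 1
    return found
```

With the sample above, handlers(parse_metadata(SAMPLE), "View") gives
the xv and display entries in that order, and the two Comment.1 lines
are joined into one value.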

There's a lot of stuff this doesn't deal with (regexp based defaults -
mime.types) or probably a myriad of other things, but I did it quickly and
with a pounding headache now, thanks a lot guys.  :)

> any 'open' request for any '*.ini' file, use this SQL database.
>    (We are now able to store all our config files in a database, accessable
>     from anywhere that has gnome and our private database :)
> 
> any 'execute' request for any 'GNOME/BIN/*', execute from '/opt/gnome/bin/*'
>    (Sysadmin, or personal user, can now install new versions of whatever,
>     wherever, trial them and switch back easily, etc. etc.)
> 
> any 'view' request for any '/opt/web/http:*' file, use this browser
>    (We can now create 'virtual' portions of our filesystem - I can
>     browse through /opt/web/http:www.gnome.org with any gnome prog)
> 
> Imagine 'My Computer' done through this - no longer do we need any knowledge
>    of /proc as a user, we can select 'my computer', select 'cpu', and presto,
>    there's the info you're after.  This is doable with some simple remaps
>    for view requests on '/proc/*'
> 
> I'm sure there's other nifty things we could do with this - I'd rather not
> have to tie it down to having explicit files on the hard drive to make
> this stuff happen.  I reckon we could create an entire object-oriented
> view of the system, not necessarily tightly restricted to the filesystem,
> but all interfaced in the same manner - consistency...

Good ideas, but I don't want to think about them right now.

--
Christopher Curtis               - http://www.ee.fit.edu/users/ccurtis
                                 - System Administrator, Programmer
Melbourne, Florida  USA          - http://www.lp.org/


