Re: [Summary] Meta-data/filesystem-encapsulation

From: Kevin Littlejohn <darius connect com au>
To: gnome-list gnome org
Subject: Re: [Summary] Meta-data/filesystem-encapsulation
Date: Tue, 18 Aug 1998 17:38:49 +1000

(Having now finished this email, I realise most of the rant about preload
stuff isn't highly relevant - I'll keep it here anyway, to indicate my
serious dislike for anything that includes preloading libraries :)

> 
> On Tue, 18 Aug 1998, Kevin Littlejohn wrote:
> 
> > What about the case of daemons rotating logfiles?  Or daemons that create
> 
> Well, *if* the data were with the file (in its inode) a 'mv' wouldn't harm
> that data.  In most situations though, yes, this daemon would need the
> library preloaded.  The only other option is to have the data orphaned.

This is a sticking point for me.  Having to preload random libraries into
the daemons on the boxes around me _is_ _not_ an option.  That's the case
for most non-desktop machines (and probably a few desktop ones).  That
means there _are_ binaries out there that are shuffling files around
without the gnome libraries.

> 
> If you simply want the newest file to have a particular association and
> the older files to retain similar (to each other) but different (from the
> newest) settings, this is best done with a regexp type database.  More on
> this below...
> 
> > many files (incoming ftp server, perhaps *shrug*).  I want to be able to
> > assign a particular icon to new, incoming files off that ftp server - but
> 
> The simplest way to do this is to have those files have a 'null' (or default)
> icon, and have processed files have a 'processed' icon.

This stuff doesn't gain in any way from having preload libraries, that's
what I'm trying to point out.  In fact, given there are always cases where
the libraries won't be in the loop, I still contend that preloading libraries
doesn't _gain_ you anything, and leads you toward making design decisions
that will bite you later...

> 
> > from.  I also don't fancy, even on the boxes I _do_ have root on, having
> > to preload this library all the way through - if my desktop database breaks,
> > my OS might not boot up?  That doesn't seem too logical to me...
> 
> Well ... LD_PRELOAD does not work for SUID/SGID scripts - it may or may
> not work for the root user - I'm not too sure.  However, if you want a
> daemon to use the preload, set the LD_PRELOAD in that daemon's
> environment.  I would never suggest setting it in the root environ.

But what's the difference?  I _don't_ want to (and in some cases, can't)
preload libraries into my daemons' environments.  And in fact, if I _don't_
preload it into root, under your scheme, I'm going to be orphaning metadata
every time I make changes to my filesystem as root...
(Note, I understand that using extended attributes dodges that...)
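
To make the orphaning concrete, here is a rough sketch in Python.  It
assumes a metadata table kept outside the file and keyed by path; the
`metadata` dict, the file names and the "icon" attribute are made up for
the example and are not part of any GNOME library.  A program that knows
nothing about the table renames the file, and the entry is left pointing
at a path that no longer exists:

```python
import os
import tempfile

# Hypothetical external metadata table, keyed by path (assumption).
metadata = {}

workdir = tempfile.mkdtemp()
old_path = os.path.join(workdir, "daemon.log")
new_path = os.path.join(workdir, "daemon.log.0")

# Create a file and attach some metadata to it through the table.
with open(old_path, "w") as f:
    f.write("log line\n")
metadata[old_path] = {"icon": "logfile"}

# A non-GNOME-aware program (say, a log rotator) moves the file directly.
os.rename(old_path, new_path)

# Entries whose file has vanished are orphaned; the moved file has none.
orphaned = [path for path in metadata if not os.path.exists(path)]
```

After the rename, `orphaned` holds the old path and the rotated file has
no entry at all, which is the situation root or a daemon would create
every time it works without the library in the loop.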

> 
> > I'm also opposed to embedding information into the files, for a number of
> > reasons - it's difficult to clean out if I decide I don't like gnome
> 
> That's not true at all -- I sure like this find command:
> 
> find / -metadata \* -exec ResetGnomeData {} \;
> 
> > anymore, it runs the risk of stuffing other non-gnome-aware programs up
> > if gnome guesses the filetype wrongly, and it's just plain too much tinkering.
> 
> It is a lot of tinkering.  I think it's worthwhile though...  About
> stuffing up non-Gnome aware programs, I just don't know what you mean.
> Putting the metadata with the raw data won't affect those programs at all.

Correction - just to keep things clear - putting a link to the metadata in
the inode of an ext2 system won't affect programs.  Again, this only works
for ext2, unless you care to take the time to make your desktop manager's
libraries aware of all the different filesystems out there - assuming they all
have space for added attributes.  

And I know you're arguing to only do this with ext2, but what does it gain
you?  I still believe the non-filesystem specific ways of doing this can be
made as robust as the filesystem-specific ways - or if not, then damned
close.
 
> I suspect this is already being used (especially by the hurd) but there's
> an OS dependent 2 structure in there as well.  Now, imagine that
> inode.osd1.linux1 contains a pointer to another block, which is in reality
> another inode just like the real data inode, except it contains only
> metadata information, exactly as it would appear if it were an entry in a
> non-integrated metadata database.  Standard read()/write() calls will only
> see the real data - the GNOME libs will have to look at this structure
> specifically, and address the metadata inode directly (or convince the
> ext2fs driver to do it).  I'm not familiar with that low-level
> programming.  It *is* very low level, but should not require rewriting the
> driver (because, after all, we have the source to ext2 and can do anything
> it can do).

'doing anything with the source to ext2' _is_ rewriting it, sorry - and
you'll have to make sure that nothing else _is_ using that information,
and that tools like ext2ed and e2fsck understand this new metadata, and
that gnome people have root access to the box they're installing on and can
recompile the kernel for their desktop - Personally, I think this is going
too far for a desktop manager.  As part of a project to extend the capabilities
of linux or ext2, yes, that's a brilliant thing to be doing - but once
again, GNOME is _NOT_ linux specific, or ext2 specific - you're about to
embark on a lot of work for something integral to the GNOME system, that
is completely irrelevant to a reasonable portion of the intended GNOME
userbase.  It's outside the scope of GNOME.

> 
> > It also shares a problem with the next option - how do I assign metadata
> > to files over which I have read-only permissions?  'cause I'm gonna want
> 
> Each user will have to have their own 'preferences' database, even if my
> integrated filesystem approach is used.  So it's clear (I hope), my
> suggestion to integrate the data into the fs does not alleviate any of the
> other strains GNOME metadata storage people will face - it is simply an
> alternative to a global, instance-specific database.  We will still need a
> global mime.types for classes of information.  We will still need a
> userlevel preference database.  What we won't need is a global "registry" 
> like Windows/OS2.  What we will gain is the ability for the owner of a
> file to embed any data (s)he wishes without having to worry about other
> people mucking it up.  Example: An "Author" tag.  Sure, you can figure out
> the author by the UID - until /etc/passwd gets nuked (it happens). 
> However, you don't want just anybody to be able to override this tag and
> claim it as their own.  So not only do you need metadata, but you need
> metadata permissions for each tag.  If you don't store this in the file
> itself, it's a whole other level that you have to impose on top of a
> global registry - a whole new kernel protection scheme if you will, just
> for metadata.  In the file, the kernel will handle it.

Sorry, your linux kernel will handle it.  On a non-linux system, we can't
use any of that, because in most cases we can't recompile our kernel.

<snipe>
Have you started talking to the BSD guys about extending their filesystem
yet?  'cause I bet they'd have comments on it as well - it's something you'll
have a _real_ hard time selling...
</snipe>

Ok, so for Author, we presume a file can only have one.  What happens when a
system of 200 people, all running the desktop off one server (or off one
fileserver, maybe), want to place the same attribute with a different
value on a particular file?

You're not talking about something 'new' for the kernel to handle if you have
a personal registry - all you're talking about is the ability for each
person to keep their own list of meta-attributes for any given files.  It's
something that's _completely_ userlevel - which is what it should be.
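
As a rough illustration of the personal-registry idea, here is a sketch in
Python.  `PersonalRegistry`, the "viewer" attribute, the user names and the
sample path are all hypothetical, not an existing GNOME API.  Each user's
values live in that user's own table, so nothing is written to the file
and nothing is asked of the kernel:

```python
class PersonalRegistry:
    """Hypothetical per-user metadata store, kept entirely in user space."""

    def __init__(self):
        # user name -> {(path, attribute): value}
        self._tables = {}

    def set(self, user, path, attribute, value):
        # Only the user's own table is touched; the file itself is never
        # opened, so read-only files are not a problem.
        self._tables.setdefault(user, {})[(path, attribute)] = value

    def get(self, user, path, attribute, default=None):
        return self._tables.get(user, {}).get((path, attribute), default)


registry = PersonalRegistry()
shared_file = "/srv/pictures/logo.gif"  # assumed example path

# Two users attach different values of the same attribute to one file.
registry.set("alice", shared_file, "viewer", "xv")
registry.set("bob", shared_file, "viewer", "gimp")
```

Asking for alice's viewer returns "xv" and bob's returns "gimp"; a third
user who never set anything gets the default.  Scaling that to 200 users
is just 200 tables.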

> > > *I* never suggested that a preload alone would suffice in all cases.  I
> > > explicitly said (several times) that an alternate would always be needed.
> > > Preloads simply make it harder to unintentionally orphan data, which I
> > > think is a Good Thing (tm).
> > 
> > The problem is, once you accept preloads, you start bringing embedded
> > metadata storage into the picture (under the assumption that everything
> > will be able to cope, because everything will use the library).  If everything
> > is _not_ using the library, then embedding information is dangerous -
> > and if that's the case, the preload scenario _doesn't_ gain you anything
> > _in terms of design_.  Once we've got a core library, then you could easily
> > produce a preload-capable wrapper, and preload to your heart's content on
> > your own system - but if the design of the metainfo database assumes that
> > we can gain this sort of coverage, we've got problems.
> 
> I think you misunderstand.  LD_PRELOAD has nothing to do with fs
> integration.  It has only to do with data preservation.  I don't care if
> the data is in the fs or in a database - it can be as easily orphaned
> either way.
> 
> But as I said, yes, if you do embed the data and then lose it, it will
> show up as a "lost chain" in the filesystem.  This will be bad because it
> wastes disk space.  However, if this does happen and you fsck the drive,
> these chains will reappear in /lost+found no?  Then, imagine this: when
> GNOME boots, give it a "-recover-directory:" parameter where it will scan
> these lost chains and reattach them to their parent objects.  This can be
> done if a reference to the original object is kept with the EA data, and
> will be about as effective as Win95's "shortcut resolution".  That's about
> the best you can do once the data has been orphaned.  Another beautiful thing
> about this is that since this data file is intact, you can move it as a
> single entity, and GNOME should be able to read it as one, and then
> re-integrate it into whatever database system is used, be it flatfile, fs
> integrated, or remote daemon.  How slick would that be?

No, no, no :(  Anything that's orphaned in that way on an ext2 filesystem
is prone to be written over.  What you're suggesting is 1) not reliable,
2) completely linux/ext2 specific - if you really want to program OS/fs
combination awareness into GNOME for the number of fs'es that are out there,
then you're biting off _far_ too much work.

> 
> > *nod* We're in agreement here - the only thing that's keeping this running
> > is the reference to preload, which I think clouds the issue of database
> > design.  Consider preload as an added extra that might appear sometime
> > further down the track as a wrapper, and we're both happy :)
> 
> Sure - the two things have nothing to do with one another, except that a
> preload will help to ensure the consistency of the database from non-GNOME
> aware apps.

_no_, it will not.  preload is _not_ reliable.

> 
> > Almost - I think this is the next discussion - my preference is heavily
> 
> This should be short as long as you don't balk at me wanting to use spare
> data structs inside ext2.  :)

I _don't_ use ext2 in _most_ of my work.  Period.  The same applies for
quite a number of people.  Much as it would be nice for Linux to
take over the world, it hasn't happened yet - and again, I think the
BSD people will have something to say about it.

You're suggesting a whole swathe of work that not only isn't going to
benefit a lot of people, but is going to duplicate the functionality of
your 'fall-back' system, and is going to require a _much_ higher ability
to tinker with the system/knowledge of the system from the end user (ie.
now we have to recompile our kernel to run gnome).

> 
> > with a personal database + system database, rather than trying to get the
> 
> These two will always be needed.  In fact, more will be needed as there
> should be at least two system databases - mime.types, and then the
> file-specific database (which I want to be part of the fs, using the fs
> as the database, if you will).
> 
> > metadata 'near' the file - I _hate_ having .* files everywhere, and I
> > think where you put the data is actually near-irrelevant AFA how likely
> > it is to be orphaned - if you use the metainfo library, it won't be orphaned,
> 
> As do I hate .link-to files.  These make it even more unreliable in my
> opinion.  Easier to recover when things go wrong, no doubt, but very
> hokey.  If the db is abstracted, both can be used.  .files can be used
> when testing it (for easy recovery) then a flatfile can be used for
> speed/reliability once the core logic is debugged.
> 
> > I also think you dodge the issues of 'whose metadata is it', and 'can I
> > assign metadata to a file I can't write'.  The _big_ problem is, how do you
> > export that data to other systems?  Maybe we need a 'gnome-file' type, that
> > contains meta-info in a wrapper around the file itself... *shrug*
> 
> Getting the data to other systems _is_ a problem.  My initial solution was
> to modify NFS to send the metadata stream to a client that requests it but
> nobody seemed to like that.  The only other solution that I see offhand is
> a GNOME attribute daemon to run alongside the NFS daemon (ala xfs).  (This
> daemon could be anything - from a custom app to a SQL server).  That's
> what I talked about when I spoke of a NFS server with hidden data.

Neither of these is nice.  Also, how do you handle moving files from
one filesystem to another - like dumping files to a MS-DOS disk?  The
transferral of stuff from person to person is going to be another sticking
point, and it's not something I want to think about right now :).

I have two questions I'd like answered re: extended attributes:

1) How do you deal with many people wanting different values for the same
   attribute on the same file?  (eg. different viewers for the same particular
   pictures).
2) How do you deal with trying to assign attributes to files you only have
   read access over?

Those two in themselves are tricky.

My proposal is still (for reference :) :

A system-wide, and a more specific person-wide database (both of the same
format), that can return methods for queries on files.  The database itself
should be, probably, a .db file, or integrated into CORBA, or something
like that - it should be essentially a standalone database.  It should _not_
require anything from the filesystem itself, or from the OS, as much as
possible.
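
One possible shape for this, sketched in Python with the standard `dbm.dumb`
module standing in for "a .db file".  The key scheme
("<request>:<file pattern>"), the file names and the viewer names are
assumptions made for the sketch, not a proposed format.  Both databases
share one format, and the personal one is consulted first:

```python
import dbm.dumb
import os
import tempfile


def lookup(personal_db, system_db, key):
    """Return the personal entry for key if present, else the system one."""
    for db in (personal_db, system_db):
        if key in db:
            return db[key].decode()
    return None


# Assumed layout: two ordinary database files in the same format.
workdir = tempfile.mkdtemp()
system_db = dbm.dumb.open(os.path.join(workdir, "system"), "c")
personal_db = dbm.dumb.open(os.path.join(workdir, "personal"), "c")

# System-wide defaults, plus one personal override.
system_db[b"view:*.gif"] = b"xv"
system_db[b"view:*.txt"] = b"less"
personal_db[b"view:*.gif"] = b"gimp"

gif_viewer = lookup(personal_db, system_db, b"view:*.gif")
txt_viewer = lookup(personal_db, system_db, b"view:*.txt")
png_viewer = lookup(personal_db, system_db, b"view:*.png")

personal_db.close()
system_db.close()
```

Here the personal override wins for *.gif, the system default is used for
*.txt, and a key in neither database comes back as None.  Nothing in it
depends on the filesystem type or on the OS beyond being able to open a
file.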

As a nice side-note, I was thinking about this this afternoon.  Imagine
being able to specify, in your database, things like the following:

any 'open' request for any '*.ini' file, use this SQL database.
   (We are now able to store all our config files in a database, accessible
    from anywhere that has gnome and our private database :)

any 'execute' request for any 'GNOME/BIN/*', execute from '/opt/gnome/bin/*'
   (Sysadmin, or personal user, can now install new versions of whatever,
    wherever, trial them and switch back easily, etc. etc.)

any 'view' request for any '/opt/web/http:*' file, use this browser
   (We can now create 'virtual' portions of our filesystem - I can
    browse through /opt/web/http:www.gnome.org with any gnome prog)

Imagine 'My Computer' done through this - no longer do we need any knowledge
   of /proc as a user, we can select 'my computer', select 'cpu', and presto,
   there's the info you're after.  This is doable with some simple remaps
   for view requests on '/proc/*'
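
The rules above could be sketched roughly like this in Python, using
shell-style patterns from the standard `fnmatch` module.  The rule table,
the action strings and the "remap:" convention are invented for the
sketch; a real database would return whatever method it stores:

```python
from fnmatch import fnmatchcase

# (request, file pattern, action) rules, checked in order; first match wins.
# The action strings are placeholders for the method the database returns.
RULES = [
    ("open", "*.ini", "sql-database"),
    ("execute", "GNOME/BIN/*", "remap:/opt/gnome/bin/"),
    ("view", "/opt/web/http:*", "browser"),
    ("view", "/proc/*", "my-computer"),
]


def resolve(request, path):
    """Return the action for a request on path, or None if no rule applies."""
    for rule_request, pattern, action in RULES:
        if rule_request == request and fnmatchcase(path, pattern):
            if action.startswith("remap:"):
                # Swap the fixed prefix before '*' for the target prefix.
                prefix = pattern[:-1]
                return action[len("remap:"):] + path[len(prefix):]
            return action
    return None
```

With this table, resolve("execute", "GNOME/BIN/gimp") gives
"/opt/gnome/bin/gimp", and resolve("view", "/proc/cpuinfo") gives
"my-computer".  Switching to a trial install is a one-line change to the
remap target.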

I'm sure there's other nifty things we could do with this - I'd rather not
have to tie it down to having explicit files on the hard drive to make
this stuff happen.  I reckon we could create an entire object-oriented
view of the system, not necessarily tightly restricted to the filesystem,
but all interfaced in the same manner - consistency...

KevinL
Follow-Ups:
- Re: About metadata (long!)
  - From: Christopher Curtis
References:
- Re: [Summary] Meta-data/filesystem-encapsulation
  - From: Christopher Curtis