Metadata / Object Systems



On Sat, 15 Aug 1998, Kevin Littlejohn wrote:

This was unintentionally sent to me privately, so I'm replying to the list
without editing. 

> Date: Sat, 15 Aug 1998 15:48:39 +1000
> From: Kevin Littlejohn <darius@connect.com.au>
> To: Christopher Curtis <ccurtis@ee.fit.edu>
> Subject: Re: [Summary] Meta-data/filesystem-encapsulation 
> 
> > > So is gnome a linux-specific thing?  Or more correctly, an ext2-linux
> > > specific thing?  Can we put attributes on NFS-mounted files?  On MS drives?
> > > Am I allowed to use the metadata stuff, given I'm on Solaris (half the time,
> > > anyway :)?
> > 
> > You deleted the relevant portions of my post where I said that this was
> > the optimal solution, but we would still need a relatively reliable method
> > for those non-Linux/ext2fs users.
> 
> Yeah, I did, sorry about that :(
> I'm wary of 'this is a neat solution, so let's implement this, and use this
> as a fall-back' where the fallback is a fully-fledged solution in its
> own right - you're doubling the workload, or you're restricting people
> in non-linux, non-ext2, non-preload-across-everything/one, non-whatever
> is required for that particular solution to a sub-optimal 'fall-back' mode.
> I still think support for a wider range of OS/environment is more important
> than support for the last 5% of flexibility/power in a specific environment.
> I only say that because I think you _can_ build a solution that allows metadata
> for individual files (I'm counting on it, 'cause it's a feature I very much
> want :), and not hook into the filesystem - it just means a bit more attention
> to detail.

I'm not restricting anyone beyond what doesn't exist.  What I propose
_would_ double the workload, as you say, though, to give those in a
homogeneous environment the option of a slick solution, instead of what
will have to be a set of (horrible? less-optimal?) hacks.  The problem is
that you're trying to store object information, on disk, without support
from the disk.  *This* is a major effort, with a lot of work, workarounds,
and drawbacks, whereas modifying the file systems makes things nice.
Things can't always be nice, though, so the "fallback" system (the one
that does not employ the filesystem natively) must always exist.  Again,
looking at OS/2, HPFS partitions stored the data with the file.  FAT
partitions had to store it all in another file (EA DATA.SF, plus an
indexing file IIRC).  Needless to say, no OS/2 user who used EAs would
tolerate FAT as a primary storage method.  Not only was it much slower,
but also more error prone.
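A minimal Python sketch of "native when the filesystem supports it, fallback otherwise", under stated assumptions: the native path uses Linux user extended attributes (`os.setxattr`/`os.getxattr`, Linux-only, and added to Linux well after this thread), and the fallback is a hypothetical per-directory sidecar file, `.metadata.json`, standing in for OS/2's separate EA file on FAT. Callers see the same two functions either way.

```python
import json
import os
import tempfile
from typing import Optional

SIDECAR = ".metadata.json"  # hypothetical fallback file, one per directory


def _sidecar_path(path: str) -> str:
    return os.path.join(os.path.dirname(os.path.abspath(path)), SIDECAR)


def _load_sidecar(path: str) -> dict:
    try:
        with open(_sidecar_path(path)) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def set_attr(path: str, key: str, value: str) -> str:
    """Attach key=value to path; return which storage method was used."""
    try:
        # Native: kept by the filesystem as file metadata, not in the file's data.
        os.setxattr(path, "user." + key, value.encode())
        return "native"
    except (AttributeError, OSError):
        # Fallback: a separate file, like OS/2's EA file on FAT volumes.
        table = _load_sidecar(path)
        table.setdefault(os.path.basename(path), {})[key] = value
        with open(_sidecar_path(path), "w") as f:
            json.dump(table, f)
        return "fallback"


def get_attr(path: str, key: str) -> Optional[str]:
    try:
        return os.getxattr(path, "user." + key).decode()
    except (AttributeError, OSError):
        # Not available (or not set) natively: consult the sidecar instead.
        return _load_sidecar(path).get(os.path.basename(path), {}).get(key)


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    f = os.path.join(d, "notes.txt")
    open(f, "w").close()
    print(set_attr(f, "Author", "Chris Curtis"), get_attr(f, "Author"))
```

The sidecar is the fragile half: rename the file with a tool that does not know about it and the entry is left behind under the old name.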

> > As far as NFS support, it actually falls into two categories as well:
> > Linux and non-Linux.  Linux would need a special NFS driver because files
> > written with this metadata would have it embedded in the file - in front
> > of the real data if using the header method, or as a supplementary block
> > if using the more wasteful method.  Linux NFS drivers should be able to
> 
> Yuk!  I'm sorry, but I _don't_ want my (occasional) desktop manager embedding
> things in my files - I think you're drilling right down from a top-level
> desktop 'helper' system, to tinkering with the filesystem, and I don't think
> it's a reasonable thing to do.  And how do I clean all this up if I install
> gnome, then decide I don't particularly like it?  And what does tripwire
> say about all this tinkering?  And so on...

In the case of ext2 (if possible) it would not be part of the file *data*.
Think kinda like NTFS - each file has several "streams" - one is "::DATA",
another could be "::ATTRIBUTES" (I don't do NT but NTFS grew from HPFS
(without the 'HP' - high performance, of course) so there are
similarities).  It appears there is extra space in the ext2 code for an
additional data 'stream' independent of the 'DATA' stream.

Regarding 'cleaning it up' - you wouldn't, and you wouldn't need to.  Once
the object-specific data is set, one of two things will happen - it will
either be orphaned when you use a non-attribute aware program (mv, cp) or
the data will be retained when an object system that can read the data
comes back.  This need not be limited to GNOME.  Think of the 'find'
utility -- "find . -type f -metadata Author='Chris* Curtis' -print".  In
order for any system to exist, these low-level utilities will have to
either be replaced with GNOME-binaries, or the GNU binaries be made
"Object Data"-aware.  This problem exists no matter which system you
choose.
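The hypothetical `find -metadata` predicate could behave roughly like this Python sketch. `-metadata` is not a real `find` option; the sketch assumes metadata is readable through some lookup function (a plain dict stands in for whichever store is chosen) and uses shell-style globbing via `fnmatch` for values like `'Chris* Curtis'`.

```python
import fnmatch
import os
import tempfile
from typing import Callable, Iterator, Optional, Tuple

# A lookup takes (path, key) and returns the stored value, or None if unset.
Lookup = Callable[[str, str], Optional[str]]


def parse_predicate(expr: str) -> Tuple[str, str]:
    """Split "Author='Chris* Curtis'" into ("Author", "Chris* Curtis")."""
    key, _, pattern = expr.partition("=")
    return key, pattern.strip("'\"")


def find_by_metadata(root: str, expr: str, lookup: Lookup) -> Iterator[str]:
    """Walk root like `find ROOT -type f` and yield files whose metadata matches."""
    key, pattern = parse_predicate(expr)
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            value = lookup(path, key)
            # fnmatchcase: shell-style '*' globbing, case-sensitive
            if value is not None and fnmatch.fnmatchcase(value, pattern):
                yield path


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    for name in ("a.txt", "b.txt"):
        open(os.path.join(root, name), "w").close()
    # Stand-in store: only a.txt has an Author attribute.
    store = {os.path.join(root, "a.txt"): {"Author": "Christopher Curtis"}}

    def lookup(path: str, key: str) -> Optional[str]:
        return store.get(path, {}).get(key)

    for hit in find_by_metadata(root, "Author='Chris* Curtis'", lookup):
        print(hit)
```

Whatever the storage scheme, the utility only needs the lookup function; that is the part that would have to become "Object Data"-aware.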

tripwire, et al. may complain - I don't know how picky they are about
inode attributes (ctime, atime, mtime).  Ideally this data would get stuck
in there so it would be invisible, but that depends on the filesystem's
implementation of object data.  I mentioned putting the data at the end of
a FAT chain only because OS/2 created this "EA DATA.SF" file on every
floppy so that object data could be preserved, and I hope there's a better
way.  Again, all this has to be considered on a per-filesystem basis, and
if it's not possible, the fallback method needs to be used.

> > recognize a client requesting data with the "EA bit" set and send those
> > initial block(s) before the file as though it were ext2 itself.  If it's
> > missing this "EA bit", it would not get the metadata, and would
> > resort to the fallback (aka non-optimal) method or force the client
> > program to figure it out for itself.  I'm not sure if there is spare space
> > in the ext2 inode for an EA block/bitmap, but you would have to agree that
> > this would be the optimal method.
> 
> It's optimal only if you're presuming there's ext2 filesystems and/or linux
> in there somewhere - and I'm still telling you that's not the case.
> Gnome is an _X-Windows_ thing - X has been around long enough to be on many
> many systems.  Currently, gnome runs on most of those quite happily, and has
> pretty close to full functionality.  Keeping it that way is a worthy goal,
> I think.
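The "header method" and the "EA bit" can be made concrete with a small Python framing sketch. Everything in it is hypothetical: the `GEA1` marker, the length-prefixed JSON header placed in front of the real data, and a server-side `serve()` that strips the header unless the client set the `ea_bit` flag.

```python
import json
import struct
from typing import Dict, Tuple

MAGIC = b"GEA1"  # hypothetical marker meaning "this file carries an EA header"


def pack(meta: Dict[str, str], data: bytes) -> bytes:
    """Header method: MAGIC, 4-byte big-endian length, JSON metadata, then data."""
    blob = json.dumps(meta, sort_keys=True).encode()
    return MAGIC + struct.pack(">I", len(blob)) + blob + data


def unpack(stored: bytes) -> Tuple[Dict[str, str], bytes]:
    """Split stored bytes into (metadata, data); plain files have no header."""
    if not stored.startswith(MAGIC):
        return {}, stored
    (length,) = struct.unpack(">I", stored[4:8])
    meta = json.loads(stored[8:8 + length])
    return meta, stored[8 + length:]


def serve(stored: bytes, ea_bit: bool) -> bytes:
    """Server side: EA-aware clients get the header; others see only the data."""
    meta, data = unpack(stored)
    return pack(meta, data) if ea_bit and meta else data
```

The weakness Kevin objects to is visible here: any program reading the raw bytes without going through `unpack()` sees the header as file content, and a plain file that happens to begin with the marker bytes would be misread.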

I use ext2 only because it's easy to modify and we know the source.  Other
filesystems may have spare data structures where object data could be
stored as well.

And while, yes, GNOME is an X thingy, it's also an OBJECT thing, and as I
said, you're trying to impose object properties onto a storage media not
designed for objects.

AIX deals with objects and classes at some level - anyone know how?

> > Regardless, the problem remains even with this implemented.  (Though,
> > other filesystems can support it as well - like FAT could allocate an
> > extra block at the end of the file, etc, though this might get a little
> > funky for some data files depending on how they are set up.  Then again, if
> > GNOME knows about these files, it should be able to write the extra data
> > in a non-(or minimally)disturbing way.)
> 
> So now Gnome needs to have knowledge about different filesystems?  Anyone
> spot the break in the layering/OO philosophy here?

It's a matter of storing the data.  It can either use an error-prone
registry-style flatfile or other disassociative method, or take advantage
of the filesystem properly.  It *is* a lot of work, and maybe not
something for 1.0, but I think it should at least be looked into, or
planned for so that implementing it later doesn't become
a major chore. 

> > > user.  Make it rely on things that may not be available to some users (like
> > > ext2 filesystems, or the ability to preload libraries across everything),
> > 
> > Well, as I said, it *need* not be ext2 specific - it can do multiple
> > filesystems to an extent, and will need a fallback method for heterogeneous
> > environs anyway.  However, I don't know of many systems without PRELOAD
> > availability (not that that's really relevant without a proper fs that can
> > be safely directly manipulated).
> 
> Note, when you require preload ability, you really require it across anything
> that modifies the filesystem.  I'm sorry, but trying to convince my workplace
> that they should preload libraries for my graphical manager into our web servers
> and proxy servers and syslog daemons and so forth is just _not_ going to
> happen - yet, I'd like to be able to specify metadata for those daemons'
> log files, and not have it go away every time the daemon rotates its logs.

Well, you can't assign metadata to the log files without being root
anyway.  Unless the intent is for anyone to assign any metadata to any
file for their own personal use.  If that is the case (and it sounds
plausible) no fs modification will work fully anyway - it will always
require a user preferences "mime.types" style file/database.  Hmm.

> > > and you reduce the functionality of gnome for parts of the userbase.
> > 
> > I never said anything that would reduce functionality.
> 
> Yeah, you did - because you suggested that if we can't meet the requirements
> for the primary system, we should use the fallback system.  I'm suggesting
> the requirements for systems as you've described are too stringent, and we
> (being non-linux, non-trivial system users) are going to be faced with the
> fallback system, and reduced functionality - unless you put as much work into
> the fallback as you do the primary, in which case, why have the primary?

Why do you insist that the fallback method be less functional?  And why
have the primary?  It's much less error-prone.  Think of things like
thumbnails.  These would be better put with the file itself rather than in
either a huge centralized database or each user's personal database.  If
you can store it with the file itself, great!  If not, you'll have to
stick it in the central db.  Now how long is it going to take for things
to load if they have to wade through a 500MB database holding thumbnails
for images that were all "rm -rf"d several months ago?

> > > I still think the most important part of this system is not going to be
> > > where do we store the data, or how do we access the data - it's going to
> > > be how do we handle the data being desynchronised - because without
> > > redoing the filesystem (and losing vast swaths of the userbase), we _are_
> > > going to run into synch problems.  How well the system recovers from those
> > > problems will be, to a large degree, the measure by which users judge the
> > 
> > Absolutely.  I think you'll find that there is no perfectly acceptable
> > answer.  So, with that in mind, why not have at least one such method
> > even if it does apply only to a limited userbase?  I'm sure that as GNOME
> > spreads outside of Linuxens, others will find that they have similar ways to
> > modify Linux's fs drivers to implement these EAs on native systems without
> > totally trashing their local fs's.
> 
> I still think modifying the filesystem for something as glitzy as a desktop
> manager (and when push comes to shove, the desktop manager is really only
> polish, not something that should have hooks into the deepest levels of your
> OS) is fundamentally flawed - it's _not_ a modular way of doing things, it
> pays no attention to keeping clean separation of layers of functionality,
> and it's going to cause heartache at a later date.  Make something that
> functions more generically, and you'll win a lot more people.

I think you're thinking too narrowly.  This is more than a desktop without
a window manager.  This is user-specified file data.  mime.types is handy.
It is a usable "class" specification.  But it lacks granularity that can
only be done in the fs.

<speculation> I think you'll find Linux(/Hurd) breaking away from many of
the Unix'isms.  That's not to say that things are going to change
radically; Unix is a very solid foundation for sure, but in a world where
software is increasingly referenced via "objects" and "methods" and "agents"
and "components" and "distributed (all those things)" Unix presents a bit
of a hindrance.  Just like Linux outperforms many of the older Unixes,
technology changes, and Linux/Unix must adapt to take advantage of it or
be left behind.  Maybe the time of the distributed/object/agent/blah
hasn't yet come, but you can be sure its arrival is inevitable.  OS/2
failed, NeXT failed, all the other Object OSes failed, but MS is pushing
into this area like a glacier and it will come, sooner or later.  It will
be good to be prepared.
</speculation>

> KevinL
> > 
> > --
> > Christopher Curtis               - http://www.ee.fit.edu/users/ccurtis
> >                                  - System Administrator, Programmer
> > Melbourne, Florida  USA          - http://www.lp.org/
> 
> Hey, I didn't notice this before - I'm in Melbourne, Australia :)
> 
> I'm about to head out again, but when I come back, I'm going to try and put
> a proposal into words - it's a rehash of something someone else just posted,
> so I can't exactly claim to have any 'brilliant new ideas' on it at the moment.
> But I figure given I'm picking holes in other suggestions, I should at least
> field my own - and I _think_ I'm finally wrapping my head around an idea that
> might work reasonably well.
> 
> KevinL

(later, that same day...)

On Sat, 15 Aug 1998, Kevin Littlejohn wrote:

> Okay, I've been thinking about this stuff for most of the day, and the more
> I think about it, the nastier it gets :)

exactly.  We're out of spec here...

> The first is, rather than full-pelt jumping in and designing the metadata
> system, what about looking at the interface required for the system?  What
> I'm thinking is, assuming we're going to have a 'libgfdb.so' (gfdb =
> gnome file database), what about defining the operations that library is
> going to have to satisfy - that way, we can look at getting _something_
> in production, without getting bogged down on exactly how it all works.
> We make the assumption that, at least for gnome programs, we can link
> this library in and use things like 'gfdb_add(inode,method,action)',
> 'gfdb_query(inode,method)', etc.  That way, gnome applications can start
> to work with this, and we still leave the door open for pretty much any
> implementation we like at a later date (including preloads, if that's
> deemed necessary).

Hah!  Now you're coming around.  =)

That is exactly what I said way back when - GNOME will have to abstract
fread() and fwrite().  Putting this in a PRELOAD simply makes it work for
all programs whether they were written for GNOME or not.
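Kevin's `gfdb_add(inode,method,action)` / `gfdb_query(inode,method)` idea can be sketched in Python as an abstract backend plus thin wrapper functions. The names follow his hypothetical `libgfdb`; the explicit `db` handle argument, the in-memory backend, and keying on `(st_dev, st_ino)` are assumptions for illustration. A rename within one filesystem keeps the inode, so the key survives a plain `mv`, but `cp` creates a new inode and deleted inodes can be reused.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Dict, Optional, Tuple

FileKey = Tuple[int, int]  # (st_dev, st_ino): identifies a file on one host


def file_key(path: str) -> FileKey:
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


class GfdbBackend(ABC):
    """What a libgfdb would promise callers; storage details stay hidden."""

    @abstractmethod
    def add(self, key: FileKey, method: str, action: str) -> None: ...

    @abstractmethod
    def query(self, key: FileKey, method: str) -> Optional[str]: ...


class MemoryBackend(GfdbBackend):
    """Simplest possible backend; a real one might be a db file or fs attributes."""

    def __init__(self) -> None:
        self._table: Dict[Tuple[FileKey, str], str] = {}

    def add(self, key: FileKey, method: str, action: str) -> None:
        self._table[(key, method)] = action

    def query(self, key: FileKey, method: str) -> Optional[str]:
        return self._table.get((key, method))


def gfdb_add(db: GfdbBackend, path: str, method: str, action: str) -> None:
    db.add(file_key(path), method, action)


def gfdb_query(db: GfdbBackend, path: str, method: str) -> Optional[str]:
    return db.query(file_key(path), method)


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    src, dst = os.path.join(d, "huge.png"), os.path.join(d, "renamed.png")
    open(src, "w").close()
    db = MemoryBackend()
    gfdb_add(db, src, "open", "gimp")
    os.rename(src, dst)  # same filesystem: inode unchanged
    print(gfdb_query(db, dst, "open"))
```

Replacing `MemoryBackend` with a file-backed or filesystem-attribute backend would not change any caller of `gfdb_add`/`gfdb_query`, which is the benefit of settling the interface first.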

> Now, my suggestion for the underlying databasing:
> 
> What about a .db file (or similar) per user, stored in the user's homedir
> somewhere, that refers to files by _either_ hash or regexp of filename,
> choosable by the user when the association is made?  This way, if you want
> to associate a bunch of log files with an application (or even one log
> file), you can specify it by filename ('cause logfile _names_ don't generally
> change, but their content does), but if you want to specify that 'huge.png'
> gets opened with a different app to all other '.png' files, you can specify
> it by hash - and it'll hold no matter whether you rename it or move it
> or whatever.
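A Python sketch of resolving Kevin's per-user rules, keyed either by content hash or by a filename regexp. The `Rule` format, SHA-1 as the hash, and "a hash rule beats any regexp rule" as the answer to "which match is more exact" are all assumptions. The last lines of the usage show Chris's objection: once the content changes, the hash rule stops applying.

```python
import hashlib
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    kind: str     # "hash" or "regexp" (hypothetical rule format)
    pattern: str  # hex digest, or a regular expression over the file path
    app: str      # application to open matching files with


def content_hash(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def resolve(rules: List[Rule], path: str, data: bytes) -> Optional[str]:
    """Hash rules win (they name one exact file); then the first matching regexp."""
    digest = content_hash(data)
    for rule in rules:
        if rule.kind == "hash" and rule.pattern == digest:
            return rule.app
    for rule in rules:
        if rule.kind == "regexp" and re.search(rule.pattern, path):
            return rule.app
    return None


if __name__ == "__main__":
    huge = b"pretend these are PNG bytes"
    rules = [
        Rule("regexp", r"\.png$", "xv"),
        Rule("hash", content_hash(huge), "gimp"),
    ]
    print(resolve(rules, "/home/k/huge.png", huge))            # gimp
    print(resolve(rules, "/home/k/moved/new-name.png", huge))  # gimp: survives rename
    print(resolve(rules, "/home/k/huge.png", huge + b"edit"))  # xv: content changed
```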

This isn't really true.  Most logfiles are rotated weekly, if not daily or
even more often in the case of heavily-loaded web servers.  And again,
what about thumbnails?  Do you want every user to create their own
thumbnail?  And how about textfile metadata such as "Author"?

As to user settings on individual files (huge.png, in this example) there
will still need to be some sort of database fo

(many hours pass as I give up trying to reconnect)

... some sort of database for storing personal attributes.  Don't know
what I was going to say, but the standard Berkeley flat file database
should suffice as there are already tools to manipulate them from the
commandline.  Or it can be a text file.  Not sure what I was getting to
here.  Again, without an LD_PRELOAD, 'mv' & co will orphan this data.
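Python's standard `dbm` module is a convenient stand-in for a Berkeley-style personal database: it uses whichever dbm-style key/value backend the Python build provides. Keying entries by absolute path plus attribute name is an assumption for this sketch; the point it demonstrates is the orphaning, since a rename by a tool that ignores the database leaves the entry under the old name.

```python
import dbm
import os
import tempfile
from typing import Optional


def _key(path: str, attr: str) -> str:
    # Hypothetical key layout: absolute path, a tab, then the attribute name.
    return f"{os.path.abspath(path)}\t{attr}"


def set_personal(db_path: str, path: str, attr: str, value: str) -> None:
    with dbm.open(db_path, "c") as db:
        db[_key(path, attr)] = value  # str keys/values are stored as bytes


def get_personal(db_path: str, path: str, attr: str) -> Optional[str]:
    with dbm.open(db_path, "c") as db:
        raw = db.get(_key(path, attr))
        return raw.decode() if raw is not None else None


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    db_path = os.path.join(d, "personal")
    old, new = os.path.join(d, "huge.png"), os.path.join(d, "moved.png")
    open(old, "w").close()
    set_personal(db_path, old, "open-with", "gimp")
    os.rename(old, new)  # what a database-unaware `mv` does
    print(get_personal(db_path, new, "open-with"))  # None: lost for the file
    print(get_personal(db_path, old, "open-with"))  # gimp: orphaned under old name
```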

> Obviously, there are some issues to be addressed with that - how do you
> know which match is more exact, that sort of thing - but they _are_
> addressable, I think.  The synchronisation problems are still there
> (as they are with anything), but by allowing the user to pick the method,
> we can gain a bit, and by defining a gfdb library, we can then write some
> wrapper programs (gfdbrm, gfdbmv, gfdbcp - you get the idea) that simply
> call the appropriate library routines - those little programs are then
> easy to integrate into scripts, and easy to port from machine to machine
> and environment to environment - _and_, it gives users the chance to
> customise their other desktop environments and programs to use the
> same system - which makes it a lot easier to integrate this system into
> anything else you're doing, and thereby helps dodge the desynch problem.

As for writing scripts, yes, this would help for portability, but under
any scheme they would be required.
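A `gfdbmv`-style wrapper only has to perform the move and then re-key the metadata. This Python sketch assumes a JSON file mapping absolute paths to attribute dicts; the store format and function names are hypothetical, and a real tool would also need locking.

```python
import json
import os
import shutil
import tempfile


def load(store: str) -> dict:
    try:
        with open(store) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def save(store: str, table: dict) -> None:
    tmp = store + ".tmp"
    with open(tmp, "w") as f:
        json.dump(table, f)
    os.replace(tmp, store)  # same-directory rename: readers never see half a file


def gfdbmv(store: str, src: str, dst: str) -> None:
    """Move src to dst like mv, then carry its metadata over to the new path."""
    src_abs = os.path.abspath(src)
    if os.path.isdir(dst):
        dst = os.path.join(dst, os.path.basename(src))
    dst_abs = os.path.abspath(dst)
    shutil.move(src, dst)
    table = load(store)
    if src_abs in table:
        table[dst_abs] = table.pop(src_abs)
        save(store, table)


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    meta = os.path.join(d, "meta.json")
    f = os.path.join(d, "notes.txt")
    open(f, "w").close()
    save(meta, {f: {"Author": "Chris Curtis"}})
    gfdbmv(meta, f, os.path.join(d, "renamed.txt"))
    print(load(meta))
```

`gfdbrm` and `gfdbcp` would follow the same pattern: delete the entry, or copy it to the new key.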

> Resynchronisation of the database is a nasty question, unfortunately -
> if a file has been deleted, and the meta-info is still there, we need
> to be able to clean it out at some stage.  The only way I can see,
> unfortunately, is to run a cron job periodically that scans across the
> whole drive system for each entry in each users database - messy, ugly,

This shouldn't be *that* bad, but admittedly not ideal.  The ideal, of
course, would be to store the data with the file itself.  I may be
repetitive here, it's been several hours since I wrote the first response
to this message.
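The periodic cleanup pass could look like this Python sketch, assuming a store that maps absolute paths to their metadata (a plain dict here; scheduling via cron is outside the sketch). Entries whose file no longer exists are dropped, which is also how stale thumbnails for long-deleted images would be purged. For a path-keyed store this is one existence check per entry, so the cost grows with the number of entries rather than the size of the disk; hash-keyed entries, identified by content rather than location, would still need the full scan described above.

```python
import os
import tempfile
from typing import Dict, List, Tuple

Store = Dict[str, dict]  # absolute path -> metadata (assumed layout)


def prune(store: Store) -> Tuple[Store, List[str]]:
    """Return (store without entries for missing files, list of removed paths)."""
    kept: Store = {}
    removed: List[str] = []
    for path, meta in store.items():
        if os.path.exists(path):
            kept[path] = meta
        else:
            removed.append(path)
    return kept, removed


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    alive = os.path.join(d, "alive.png")
    open(alive, "w").close()
    store = {alive: {"thumb": "..."}, os.path.join(d, "gone.png"): {"thumb": "..."}}
    store, removed = prune(store)
    print(len(store), removed)
```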

> bad :(  Alternatively, we provide a really nice GUI for 'association management'
> - I think that's almost a requirement of any of the systems, if people
> want to tinker with their associations, we should give them a really
> neat tool to do that with.  Luckily, if we make the library and a bunch
> of wrapper programs, we can even tinker with it from command line,
> which I definitely like :)
> 
> So, does any of the above make sense?  I guess my big suggestion is
> that we separate the actual databasing from the interface for databasing
> a little bit, so that we can define the interface, and let gnome programs
> start to use this interface, while we thresh out the underlying
> question of 'which databasing system is superior'...

Precisely either what I already said or what I said above...

> Hell, then we could even swap and change our databasing system to suit
> ourselves, if it's done properly...

That's the beauty of abstraction, even if it's integrated into the fs.

rgds,
--
Christopher Curtis               - http://www.ee.fit.edu/users/ccurtis
                                 - System Administrator, Programmer
Melbourne, Florida  USA          - http://www.lp.org/


