Re: About metadata (long!)

On Wed, Aug 19, 1998 at 02:06:51PM -0400, Christopher Curtis wrote:
> > > Primarily non-mutable data.  This is stuff such as Copyright, which the
> > > owner wants to assign to a file, and give a "children of this file must
> > > inherit" flag (think like storing the GPL here).  Also, Author, presuming
> > > a file has only one Author - there would be a flag perhaps to reference
> > > this attribute in all children (file copies).  
> > 
> > Perhaps it's just me, but I don't see this as metadata, it's just
> > data. If you have an image, then the size of the image, or the author,
> > or whatever information needs to stay with that image file should be
> > part of the file itself. Imagine having an image format which didn't
> > have its size stored anywhere inside it (impossible). 
> Image size is already stored.  The point is you should be able to assign
> any attribute to any file.  Be it copyright, author, thumbnail,
> derivation, commentary, etc.  Imagine you have an image and you want to
> remind yourself how you created it (what font the text is in, what effects
> were used with what patterns, anyway...) - you'll either need to write it
> down on a piece of paper or find a file format that supports annotation?

Really? I don't think about it that way at all. The way I think about
it, we have all this data in files all over the place, 'trapped' inside
each file. Perhaps it's something as direct as the 'size' of the image,
perhaps it's something much more indirect like 'who the file is a
picture of'. In my mind, the job of metadata is to provide a
standardized way to publish this data to the world. In fact, in some
cases it might make sense to have the metadata never 'exist' at all,
but always be derived from the real file format by the appropriate
piece of code.

As an example: currently, it is perfectly reasonable to want to 'list
all the images on my system, sorted by image size'. However, there is
no practical
way to do this. With metadata, the 'size' of an image becomes a piece
of metadata about any image file for which size makes sense. Then,
whether you have some daemon go through and extract the size and stick
it into a metadata attribute, or whether you just have some smart
piece of code which always runs whenever you ask for the "size" of the
image, is irrelevant. The important part is that the metadata
(i.e. the data about the data) is available, in this case the 'image
size'.
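
To make the two options concrete, here is a rough sketch in Python. It
is only an illustration, not a proposed design: the names SizeMetadata,
scan, get and images_sorted_by_size are made up for this example, and it
only understands PNG headers. The daemon-style pass and the on-demand
lookup give the same answer to whoever asks for the size.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def derive_png_size(data):
    """Read (width, height) straight out of a PNG header, or return None."""
    if len(data) >= 24 and data[:8] == PNG_SIGNATURE and data[12:16] == b"IHDR":
        width, height = struct.unpack(">II", data[16:24])
        return (width, height)
    return None


class SizeMetadata:
    """Hypothetical 'image_size' attribute: cached if present, else derived."""

    def __init__(self):
        self._cache = {}  # filled by the daemon-style pass, may stay empty

    def scan(self, files):
        # Daemon approach: walk every file once and store the attribute.
        for name, data in files.items():
            size = derive_png_size(data)
            if size is not None:
                self._cache[name] = size

    def get(self, name, data):
        # On-demand approach: fall back to deriving from the file itself.
        if name in self._cache:
            return self._cache[name]
        return derive_png_size(data)


def images_sorted_by_size(files, meta):
    """List image names ordered by pixel area, skipping non-images."""
    sized = []
    for name, data in files.items():
        size = meta.get(name, data)
        if size is not None:
            sized.append((size[0] * size[1], name))
    return [name for _, name in sorted(sized)]
```

Here `files` is assumed to be a plain dict of file name to raw bytes,
standing in for a walk over the real filesystem.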

> > Metadata can be used as a standardized way to pull this data out of
> > the specific file format and publish it to the world in a standard
> Some metadata may be 'standard' in that it can be 'expected', but I
> wouldn't rely on any metadata being present in a file.  If anyone has ever
> used OS/2, please use this as a base reference.  I'm not familiar with
> NeXT.  

NeXT did not have 'metadata'. NeXT wrappers were for storing required
application resources. They were stored in a way which only the
application could understand (for the most part). They were "data",
not "metadata".

I have used OS/2, and I understand exactly what you are talking
about. That's where I formed my previous statement that "metadata
should be derived from real data". Let me clarify. When I said that
metadata should be "standard", I didn't mean that the file would ship
with the metadata. What I meant is that if you have five different
image formats, you would extract the "image_width", "image_height", and
"image_thumbnail" in a standard way for all the image types, so that a
viewer would only need to understand one set of metadata.
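
A rough Python sketch of what I mean, under some loud assumptions: the
EXTRACTORS registry and the function names are invented for this
example, only GIF and BMP are handled, the BMP reader assumes the common
40-byte BITMAPINFOHEADER layout, and "image_thumbnail" is left out to
keep it short. The point is that describe() never looks at a file
format, only at the standard attribute names.

```python
import struct


def gif_attributes(data):
    # GIF stores width and height as little-endian 16-bit values at offset 6.
    width, height = struct.unpack("<HH", data[6:10])
    return {"image_width": width, "image_height": height}


def bmp_attributes(data):
    # Assumes the common 40-byte BITMAPINFOHEADER layout: signed 32-bit
    # width and height at offsets 18 and 22 (height may be negative).
    width, height = struct.unpack("<ii", data[18:26])
    return {"image_width": width, "image_height": abs(height)}


# Hypothetical registry: magic bytes -> extractor for that format.
EXTRACTORS = [
    (b"GIF87a", gif_attributes),
    (b"GIF89a", gif_attributes),
    (b"BM", bmp_attributes),
]


def image_attributes(data):
    """Return the standard attribute dict for any known format, else None."""
    for magic, extract in EXTRACTORS:
        if data.startswith(magic):
            try:
                return extract(data)
            except struct.error:
                return None  # header too short to hold the fields
    return None


def describe(data):
    """A 'viewer' that only knows the standard attribute names."""
    attrs = image_attributes(data)
    if attrs is None:
        return "not a known image"
    return "%dx%d" % (attrs["image_width"], attrs["image_height"])
```

Adding a sixth image format would mean adding one extractor to the
registry; the viewer would not change.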

> I would think that if you are joining an argument over what does
> and does not comprise an Object Model Environment (Windoze don't count),
> you (plural - not *you* specifically) would have some experience in one
> that already exists.

I agree. Perhaps we should just have everyone list the environments
they've actually spent noticeable time using and/or developing for at
the top of their email. :)

My list:
                                   short list of software developed

NeXT: user/developer '91-'96       educational software, Sybase stuff
OS/2: user/developer '92-'96       BBS software
UNIX: user/developer '87-present   lots of stuff (drivers, s3mod, cgi, etc)
Win:  user/developer '93-present   directdraw game
Mac:  user/developer '93-'96       opendoc app for math education
Self: tinkered       '96           developed small scroll widget

NeXT DO: written collaborative software with it
CORBA:   never used
ILU:     never used

> > Thumbnails (IMO) are a perfect example of something which would be
> > well stored in metadata. It's recreatable from the original file, but
> > useful on its own on a given system. It's a public export, in a
> > standard format, of information "about" the data (i.e. meta-data).
> Anything that annotates the file is meta-data.  Author, references,
> copyright, etc.  And remember that there are other issues that should be
> addressed as well (socio-politik (copyright), expandability, and meta-data
> attributes, I think, being especially important).

However, in the computing world as it exists today, we cannot trust
that there is a place for 'metadata' to survive on non-metadata-aware
systems. So people will continue to do what they've done for years:
put it into the file format. I argue that this is fine. As people
talking about a metadata system in a non-metadata world, we should
admit that whatever is a 'required' or 'expected' part of the file
should exist within the file format itself, so that it won't be
lost. The problem is that every file format will store the author
in a different way. Metadata on files is one way to have all
files export 'author' information in a standardized way.
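
As a small illustration in Python (a sketch only; the "latex" and "mail"
labels, the AUTHOR_EXTRACTORS table and export_author are names I made
up for the example), here are two formats that each keep the author in
their own place, exported through one call:

```python
import email
import re


def latex_author(text):
    # LaTeX keeps the author in an \author{...} command.
    match = re.search(r"\\author\{([^}]*)\}", text)
    return match.group(1).strip() if match else None


def mail_author(text):
    # A mail message keeps it in the From: header.
    return email.message_from_string(text)["From"]


# Hypothetical mapping from a format label to its author extractor.
AUTHOR_EXTRACTORS = {"latex": latex_author, "mail": mail_author}


def export_author(kind, text):
    """Publish the 'author' attribute the same way for every known format."""
    extractor = AUTHOR_EXTRACTORS.get(kind)
    return extractor(text) if extractor else None
```

The author still lives inside each file, so nothing is lost on a system
that knows nothing about metadata; only the lookup is standardized.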

Just for reference, another way to make this possible would be for all
files to be SGML. If all files were SGML, then we could have standard
'metadata tags' just like HTML has its META tags. Then we could look
at the 'author' of any file by looking for the standardized tags. I
just don't think everyone is going to start using SGML as their
file format any time soon. :)
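
For the HTML case, the lookup could be sketched like this in Python. It
uses the standard html.parser module, which handles HTML rather than
SGML in general; MetaCollector and read_meta are names invented for the
example.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <META NAME=... CONTENT=...> pairs into a dict."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "meta":
            fields = dict(attrs)
            name, content = fields.get("name"), fields.get("content")
            if name is not None and content is not None:
                self.meta[name.lower()] = content


def read_meta(document):
    """Return all named META entries of an HTML document as a dict."""
    collector = MetaCollector()
    collector.feed(document)
    collector.close()
    return collector.meta
```

With that, read_meta(page).get("author") works for any page that
carries the tag, whatever else the page contains.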

David Jeske (N9LCA)
