Re: About metadata (long!)



On Wed, Aug 19, 1998 at 03:56:55PM -0400, Christopher Curtis wrote:
> On Wed, 19 Aug 1998, David Jeske wrote:
> 
> > Really? I don't think about it that way at all. The way I think about
> > it, we have all this data in files all over which is 'trapped' inside
> > the file. Perhaps it's something as direct as the 'size' of the image,
> 
> Okay ... you don't mean "size" here, but "resolution".  Yes, it would be
> handy to have a utility that can extract relevant attributes about a file
> (such as the Author field in a Word doc, Image Resolution from a GIF/JPG,
> etc.) and store that as metadata.

I don't think you understood my point. 'resolution' is 'data about an
image' whether it's stored in some 'metadata attribute' or not. I
don't think that querying for metadata on an item should be limited to
'retrieving pre-defined fields which are stored with the file'.

I should be able to ask for the 'size' of an image through a metadata
API. If it's retrieved from a 'metadata attribute' on the file, that's
fine. If it's retrieved by running a filehandler for that type of
file, that's also fine. I don't think I should have to run a program
over all my files which pulls out metadata and stores it in the
'metadata attribute'. I think the concept of asking for 'metadata'
(i.e. data about data) should be opaque to where that information
comes from.
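
A minimal sketch of such an opaque lookup, for illustration only. Every
name in it (MetadataAPI, stored, handlers, get) is hypothetical and not
an existing GNOME interface, and file contents are passed in as bytes to
keep the example self-contained. The caller asks for a key and cannot
tell whether the answer came from a stored attribute or from a handler.

```python
from typing import Callable, Dict, Optional, Tuple

# All names here are hypothetical; this is not an existing GNOME interface.
Handler = Callable[[bytes], Optional[str]]


class MetadataAPI:
    def __init__(self) -> None:
        # Stands in for 'metadata attributes' stored with a file.
        self.stored: Dict[Tuple[str, str], str] = {}
        # Per-key filehandlers that derive a value from the file contents.
        self.handlers: Dict[str, Handler] = {}

    def get(self, name: str, key: str, contents: bytes) -> Optional[str]:
        """Return metadata for a file; the caller never learns the source."""
        if (name, key) in self.stored:
            return self.stored[(name, key)]
        handler = self.handlers.get(key)
        return handler(contents) if handler is not None else None


api = MetadataAPI()
api.handlers["size_bytes"] = lambda data: str(len(data))  # derived on demand
api.stored[("photo.gif", "subject")] = "my dog"           # stored with the file

derived = api.get("photo.gif", "size_bytes", b"GIF89a....")
attached = api.get("photo.gif", "subject", b"GIF89a....")
missing = api.get("photo.gif", "author", b"GIF89a....")
```

Both calls that succeed go through the same get(); only the
implementation knows which path was taken.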

> > perhaps it's something much more indirect like 'who the file is a
> > picture of'. (in my mind) the job of Metadata is to provide a
> > standardized way to publish this data to the world. In fact, in some
> 
> Absolutely.  Something that should never happen, though, is that GNOME has
> to read the file format itself to find the metadata.  Want to see your
> computer virtually stop?  Run GNOME as root...

I disagree completely. If we take the use of metadata to the ultimate
extreme, then you might imagine a day where nearly all the information
in a fileformat is publishable via some standard metadata system. It
just doesn't make much sense to store a duplicate of that data in both
a 'metadata attribute' and in the file itself. Or at least it doesn't
make sense for this to be a requirement.

As a concrete example, a file open dialog might choose to show the
resolution of an image, or a thumbnail, etc. Instead of having to run
some program over the whole drive to convert that information into
metadata attributes, it would be nice if the open-file dialog code
would just ask for the information from the file, not caring where it
came from. If the OS had to run a piece of code and interpret part of
the fileformat to get the metadata, so be it. If the user wants things
to run faster, then appropriate metadata can be cached in a metadata
attribute which is stored with the file, instead of recreated every
time.
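
As a rough sketch of that trade-off (hypothetical names throughout, and
a placeholder handler instead of real image parsing), caching can be a
switch on the lookup rather than a separate pre-processing pass. The
open-file dialog code would call get() the same way in both cases.

```python
from typing import Callable, Dict


class CachedMetadata:
    """Hypothetical lookup that optionally keeps derived values around."""

    def __init__(self, handler: Callable[[bytes], str], use_cache: bool) -> None:
        self.handler = handler
        self.use_cache = use_cache
        self.attributes: Dict[str, str] = {}  # stands in for stored attributes
        self.handler_runs = 0

    def get(self, name: str, contents: bytes) -> str:
        if self.use_cache and name in self.attributes:
            return self.attributes[name]
        self.handler_runs += 1
        value = self.handler(contents)
        if self.use_cache:
            # Note: this sketch never invalidates the cached value.
            self.attributes[name] = value
        return value


def resolution_handler(contents: bytes) -> str:
    # Placeholder for code that would interpret part of the file format.
    return "%d bytes parsed" % len(contents)


slow = CachedMetadata(resolution_handler, use_cache=False)
fast = CachedMetadata(resolution_handler, use_cache=True)
for _ in range(3):
    slow_value = slow.get("photo.gif", b"fake image data")
    fast_value = fast.get("photo.gif", b"fake image data")
```

Both objects return the same value on every call; they differ only in
how often the handler has to run.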

> > cases it might make sense to have the metadata never 'exist' at all,
> > but always be derived from the real file format by the appropriate
> > piece of code.
> 
> If that is possible, fine.  I don't see what this discussion is about...

I was trying to make the point that 'metadata' exists whether we
decide to create some way to store it in a known way or not. Thus, I
think it would be more powerful if the metadata API we decide on is
aware of this, and hides the source of the metadata.

> > However, in the computing world as it exists today, we can not trust
> > that there is a place for 'metadata' to remain on non-metadata aware
> > systems. So people will continue to do what they've done for years,
> > put it into the fileformat. I argue that this is fine. As people
> 
> I'm not saying it's not, but metadata *inside* the file should not be used
> within GNOME.  Instead, a utility to extract that data and create
> "appropriate" metadata entries should be used.

I agree that GNOME should access it through the metadata API. However,
I think the choice between having the metadata system run code to
extract the metadata 'on the fly' and the kind of 'pre-processing' you
are talking about, where a utility would create the appropriate
metadata entries, is an optimization issue which doesn't have a simple
'this is better' answer.

For example, if the metadata is going to change every time the file
contents change, and the data is easily extractable from the file, it
might make sense never to 'store a metadata entry' but instead always
run some code which will pull the metadata from the file format.
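
A small illustration of why, using a word count as the metadata (the
function and variable names are made up for this example): a stored
entry is a snapshot, so it goes stale as soon as the contents change,
while a value derived on request always matches the file.

```python
def word_count(contents: str) -> int:
    """Derive the metadata directly from the file contents."""
    return len(contents.split())


contents = "metadata exists anyhow"
stored_entry = word_count(contents)   # snapshot saved as a metadata entry

contents += " wherever it is stored"  # the file is edited afterwards

stale = stored_entry                  # what the stored entry still says
fresh = word_count(contents)          # what deriving on demand reports
```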

I think there are (at least) three distinct points where one might
want to deal with metadata:

 1) when the file is written, data could be extracted and stored in metadata
    entries. This is likely useful for metadata which can take some time
    to create. For example, the number of words in a file, or the 'most
    frequently used color' of an image file.
 2) when a relationship is made between this file and some other information
    either by a program or by the user. For example, if an HTML file is
    saved, the source URL could be stored as a metadata attribute after
    the file is created and written.
 3) when metadata is needed from a file, data could be extracted and given 
    to the requester. The best example is images: in most image formats
    it's trivial to extract the image dimensions from the file, and it seems
    arbitrary and unnecessary to require that the file be processed elsewhere
    in order to create a metadata entry. If you ask for the metadata entry
    'image_resolution' and the system is capable of just running the appropriate
    handler and getting the data, it should do so.
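
To show how little work case 3 can involve, here is a sketch of a
handler for GIF files. It relies on the GIF header layout: a six-byte
signature ("GIF87a" or "GIF89a") followed by the width and height as
little-endian 16-bit integers. The function name gif_dimensions is an
assumption made for this example; how such a handler would be registered
with a metadata system is left open.

```python
import struct
from typing import Optional, Tuple


def gif_dimensions(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (width, height) read from a GIF header, or None if not a GIF."""
    if len(data) < 10 or data[:6] not in (b"GIF87a", b"GIF89a"):
        return None
    # Bytes 6-9 hold the logical screen width and height, little-endian.
    width, height = struct.unpack("<HH", data[6:10])
    return width, height


# Build a minimal header by hand instead of reading a real file.
header = b"GIF89a" + struct.pack("<HH", 640, 480)
dims = gif_dimensions(header)
not_gif = gif_dimensions(b"plain text")
```

Only the first ten bytes of the file are needed, so answering a request
for 'image_resolution' this way costs one short read.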

These are all examples which bring to light that:
  a) the 'metadata exists anyhow', wherever it's stored.
  b) the 'metadata storage' concept is just an optimization technique
     to allow us not to have to run handlers every time we want simple
     metadata.

This demonstrates a convergence between 'filetype handling' and
'metadata storage' when the metadata is derivable from the file data. 

-- 
David Jeske (N9LCA) + http://www.chat.net/~jeske/ + jeske@chat.net


