Re: About metadata (long!)



On Mon, 24 Aug 1998, David Jeske wrote:

[This discussion is being taken away from the GNOME list...]

> I don't think you understood my point. 'resolution' is 'data about an
> image' whether it's stored in some 'metadata attribute' or not. I
[...]
> I should be able to ask for the 'size' of an image through a metadata
> API. If it's retrieved from a 'metadata attribute' on the file, that's
> fine. If it's retrieved by running a filehandler for that type of
[...]

I wrote:
> > Absolutely.  Something that should never happen, though, is that GNOME has
> > to read the file format itself to find the metadata.  Want to see your
> I disagree completely. If we take the use of metadata to the ultimate
> extreme, then you might imagine a day where nearly all the information
> in a fileformat is publishable via some standard metadata system. It
> just doesn't make much sense to store a duplicate of that data in both

I agree with all of this; my only concern is performance.  This is really
a moot issue with a non-embedded metadata system, though.

> I was trying to make the point that 'metadata' exists whether we
> decide to create some way to store it in a known way or not. Thus, I
> think it would be more powerful if the metadata api we decide on is
> aware of this, and hides the source of the metadata. 

At the API level, the source will always be hidden.  That's why we have an
API.  I was simply trying to reason that it would generally be faster to
pre-extract the metadata, rather than going through the whole process of
opening and parsing a file on every lookup.  If we can skip that step, we
have a performance edge.
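
To make the trade-off concrete, here is a minimal sketch of a lookup that
hides the source of a value: it answers from a pre-extracted entry when one
exists and only opens the file otherwise.  The names MetadataStore, put, get
and gif_size are hypothetical, invented for illustration; nothing like this
exists in GNOME as far as this thread is concerned.

```python
import struct


class MetadataStore:
    """Hypothetical metadata API: callers never see where a value came from."""

    def __init__(self, extractors):
        self._cache = {}                # (path, key) -> pre-extracted value
        self._extractors = extractors   # key -> function(path) -> value
        self.extractions = 0            # how often a file had to be opened

    def put(self, path, key, value):
        """Store a pre-extracted entry, e.g. from a batch utility."""
        self._cache[(path, key)] = value

    def get(self, path, key):
        try:
            return self._cache[(path, key)]    # fast path: no file is opened
        except KeyError:
            pass
        value = self._extractors[key](path)    # slow path: open and parse
        self.extractions += 1
        self._cache[(path, key)] = value       # remember it for next time
        return value


def gif_size(path):
    """Read (width, height) from the first 10 bytes of a GIF file."""
    with open(path, "rb") as f:
        header = f.read(10)
    if len(header) < 10 or header[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file: %r" % path)
    # Width and height are little-endian 16-bit values after the signature.
    return struct.unpack("<HH", header[6:10])
```

A caller would write store.get(path, "size") either way; only the first
lookup for a file without a stored entry pays for opening it.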

> > I'm not saying it's not, but metadata *inside* the file should not be used
> > within GNOME.  Instead, a utility to extract that data and create
> > "appropriate" metadata entries should be used.
> 
> I agree that GNOME should access it through the metadata API. However,
> I think the difference between having the metadata system run code to
> extract the metadata 'on the fly' and the kind of 'pre-processing' you
> are talking about where a utility would create the appropriate
> metadata entries is an optimization issue which doesn't have a simple
> 'this is better' answer. 

I agree.  The proposal I made to Tom regarding the API was a "certainty"
option.  The only way to tell what a file is with absolute certainty is to
examine its contents.  Invalid (or, I should say, inappropriate) attributes
could easily be assigned to any file - a GIF file could be called
"goober.fun" and you'd never know.  This is where a preprocess could help.
But, GIGO: your metadata is only as valuable as you make it.  It'll never
be 'foolproof', regardless of *where* the data is stored.
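
As an illustration of what a "certainty" option could look like, the sketch
below trusts the stored attribute by default and examines the contents only
when asked to.  The function lookup_type, its "certain" flag and the
attribute dictionary are assumptions made up for this example, not the API
proposed to Tom; only the GIF signatures are real.

```python
GIF_SIGNATURES = (b"GIF87a", b"GIF89a")


def lookup_type(path, attributes, certain=False):
    """Return the file type, optionally verified against the contents.

    attributes is a plain dict standing in for stored metadata entries.
    """
    claimed = attributes.get("type")
    if not certain:
        return claimed              # cheap: trust the attribute as stored

    # Expensive: open the file and look at its first six bytes.
    with open(path, "rb") as f:
        is_gif = f.read(6) in GIF_SIGNATURES

    if is_gif:
        return "image/gif"          # the contents win over the attribute
    if claimed == "image/gif":
        return None                 # attribute says GIF, contents disagree
    return claimed                  # this sketch can only verify GIF claims
```

With a GIF saved as "goober.fun" and a wrong type attribute, the default
lookup repeats the wrong attribute, while certain=True reports the GIF.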

However, as to 'this is better' - I say storage is cheap, and cycles are
always at a premium, especially when you're dealing with 'responsiveness'.

> For example, if the metadata is going to change every time the file
> contents changes, and the data is easily extractable from the file, it
> might make sense never to 'store a metadata entry' but instead always
> run some code which will pull the metadata from the file format.

Or to store a 'regenerate-on-write' metadata attribute, if the kernel
provides for it.
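
Without kernel support, 'regenerate-on-write' can only be approximated from
user space.  A minimal sketch, assuming it is good enough to compare the
file's modification time and size against the values recorded when the
entry was stored (RegeneratingAttribute and line_count are hypothetical
names used only here):

```python
import os


class RegeneratingAttribute:
    """Cache one derived value per file; recompute it when the file changes."""

    def __init__(self, extractor):
        self._extractor = extractor   # function(path) -> value
        self._entries = {}            # path -> (stamp, value)

    @staticmethod
    def _stamp(path):
        # A stat() call is much cheaper than opening and parsing the file.
        st = os.stat(path)
        return (st.st_mtime_ns, st.st_size)

    def get(self, path):
        stamp = self._stamp(path)
        entry = self._entries.get(path)
        if entry is not None and entry[0] == stamp:
            return entry[1]                    # file unchanged: stored value
        value = self._extractor(path)          # file written: regenerate
        self._entries[path] = (stamp, value)
        return value


def line_count(path):
    """Example extractor: a value that changes whenever the contents do."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)
```

This misses a write that leaves both the timestamp and the size unchanged,
which is exactly the gap a kernel-level hook would close.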

> I think there (at least) are three distinct points where one might
> want to deal with metadata:
[...snipped...]

> This demonstrates a convergence between 'filetype handling' and
> 'metadata storage' when the metadata is derivable from the file data. 

There is a duplication of efforts, so to speak, but merely by consequence. 
There is no "metadata system" existing today, and by no means a standard
one.  The two are inextricably linked simply because of their lack of
availability (that is, there is no 'metadata' standard, so it must be
duplicated in the file itself).  This is unfortunate but permanent.  A
robust metadata system will only ever be complementary: always susceptible
to loss, never fully reliable, but hopefully useful.  It will not be
useful, imo, if too much power is needed simply to look it up.  This is
not a fully proper example, but think of the difference between "find" and
"whereis", and maybe you'll see my point a little more clearly.

regards,
Christopher


