the key_depends() issue



A lot of discussion has happened around
https://bugzilla.gnome.org/show_bug.cgi?id=640667 and I would like to
weigh in my point of view, to see where people agree/disagree, and get
the API change done. For that matter, I have written a small document
trying to specify the problem clearly and completely enough, and then
explain my idea, which is basically what Juan and Iago proposed, with
some little changes. Basically, I think the biggest/most controversial
one is the idea of getting rid of supported_keys(). I think the other
points have been discussed.

Here it is, in text form so that it's easy to reply to, and attached in
HTML for easier reading. Note that in the html version, section 3.2, the
higher level enumerated list should be alphabetical (couldn't be
bothered fixing that).

Thanks in advance for your comments

Guij

------8<--------
===========
Present API
===========

::

  const GList * (*supported_keys) (GrlMetadataSource *source);
  const GList * (*key_depends) (GrlMetadataSource *source, GrlKeyID key_id);


There is also a ->slow_keys() with the same prototype as
->supported_keys(), but
the speed issue (fast resolution against slow resolution) is considered
out of
scope here.


=======================
Issues we want to solve
=======================

 (1) avoid calling $PLUGIN.resolve() if we can know in advance $PLUGIN
will not
     be able to find that type of metadata

 (2) handle conditional abilities: a source may be able to provide a given
     metadata only if a given condition (other than the presence of another
     metadata key) is true. For instance:

     - a cover art source can only provide the THUMBNAIL key for a
GrlMediaAudio
     - a provider using an API on the local file system (such as
       local-metadata) can only provide metadata for a GrlMedia that is
on the
       local file system (url in file://)

     Also, dependencies (see (3)) can be conditional.

 (3) handle dependencies: suppose the following hypotheses:

      - we want to know metadata A for a media M of type T
      - source S1 can provide metadata A if metadata B is present
      - source S2 can provide metadata B
      - M already contains the following metadata: C, D, E

     then the core should be able to know all this information through
the API,
     preferably before any call to resolve(), so that it can call S2
asking for
     B first, and then call S1 asking for A.

The current API helps solving (1) with ->supported_keys() and (3) with
->key_depends(), but does not allow (2) to be solved.


========
Analysis
========

Let's try to  formalise things a bit to be able to identify the
different case
that might happen.

Parameters of a metadata resolution
-----------------------------------

(that is, data we have when grl_metadata_source_resolve() is called)

:M: a GrlMedia object, containing the following information:

  :M.T:  the type of M (Box, Video, Audio or Image)
  :M.D:  dictionary of key->values of metadata on M

:K:  key we want to resolve

Only one key to resolve, and not a list is considered here. For a list, one
should be able to just repeat the algorithm for each member of the list. Not
sure it is optimal in speed, but it should guarantee to get the result
in all
cases where it is possible.

We should note that core's resolution algorithm can potentially expand M.D
(through calls to other sources' resolve()), meaning we can be inclined
to try
the query with that source again with an expanded M.D.
M.T and K are fixed.


.. note:: We suppose we are in the case with GRL_RESOLVE_FULL flag is set,
             else we just want a call to the source's ->resolve().



Considering the above, the ability of S to resolve K can depend on:
 - K
 - M.T
 - the set of keys available in M.D
 - the value of certain keys in M.D
 - external conditions that cannot be known without blocking (such as data
   available on a remote server)
 - any combination of the above

The present API only takes K into account. An optimal solution would take K,
M.T and M.D into account, and provide a way to resolve issues through more
resolution (filling M.D)

Algorithm Idea
--------------

For source S, given the data above (M and K), here is how I see the
resolution
algorithm:

Is resolution possible?
 (a) YES: let's call S->resolve() (or plan to)
 (b) NO: could it be possible with more keys available in M.D?

     (1) NO: give up
     (2) YES: what data should be added in M.D? -> a set of keys L need
to be
         filled in M.D. Try to resolve these before resolving K

We are necessarily in either one of the cases (a), (b1) or (b2).
In cases (a) and (b1), we don't need more information than the fact we
are in
said case.
In case (b2), we need S to give us a list of keys L.
It should be noted that if we are in case (b2), after obtaining the needed
metadata (supposing we can), we can still be either in case (a) or (b1),
since
the ability to resolve can depend on the _value_ of a given metadata,
and not
only on its presence.
Likewise, if we are in case (a), the call to resolve() might still fail
(dependency on external conditions).



=================
Proposed solution
=================

 - remove supported_keys() and key_depends()
 - add::

     gboolean (*may_resolve) (GrlMetadataSource *source,
                  GrlMedia *media,
                  GrlKeyID key_id,
                  GList **missing_keys)

How this solves our issues
--------------------------

 (1) if may_resolve() return FALSE, we know in advance $PLUGIN can't
find that
     metadata, we won't call resolve() for nothing.
 (2) both media and needed key are passed as parameters, therefore
     may_resolve() can have its result depend on these.
 (3) if missing_keys is not NULL, a list of keys on which this resolution
     would depend is provided there.

Representation of different cases
---------------------------------
::

 (a)   <=> may_resolve() returns TRUE
 (b.1) <=> may_resolve() returns FALSE and *missing_keys == NULL after
the call
 (b.2) <=> may_resolve() returns FALSE and *missing_keys is a non-empty
list of
            GrlKeyID.

Why no supported_keys()?
------------------------

Not needed
~~~~~~~~~~

supported_keys() is not strictly needed in a resolution algorithm, and could
only provide a marginal performance improvement if its result is cached.

Rationale:
The resolution algorithm, with supported_keys(), would do something like::

   for each source:
     key_list = supported_keys()
     if key_id in key_list:
       possible = may_resolve()
       if possible: resolve()
       else:
         /* try to resolve missing_keys first * /

The alternative using only may_resolve() would do::

   for each source:
     possible = may_resolve()
     if possible: resolve()
     else:
       /* try to resolve missing_keys first * /

In the case where key_id is not in key_list, a clever enough may_resolve()
implementation would return FALSE as fast as the first solution will
take to call
supported_keys() and check for key_id in it.
In the case where key_id is in key_list, the first implementation will
cost one
function call more.


If we nevertheless really want supported_keys() (i.e. we think this thing is
worth optimising by caching these results into a dictionary key->list of
sources that support it), an alternative could be to pass NULL as media and
missing_keys to may_resolve(), but I feel this would make the semantic of
may_resolve() too complex, and that in that case we should stick with
supported_keys().

Less code for plugin
~~~~~~~~~~~~~~~~~~~~

That would be one less method to implement, knowing that what it does, or
something very similar, has to be done in may_resolve() anyway.

1   Present API

const GList * (*supported_keys) (GrlMetadataSource *source);
const GList * (*key_depends) (GrlMetadataSource *source, GrlKeyID key_id);

There is also a ->slow_keys() with the same prototype as ->supported_keys(), but the speed issue (fast resolution against slow resolution) is considered out of scope here.

2   Issues we want to solve

  1. avoid calling $PLUGIN.resolve() if we can know in advance $PLUGIN will not be able to find that type of metadata

  2. handle conditional abilities: a source may be able to provide a given metadata only if a given condition (other than the presence of another metadata key) is true. For instance:

    • a cover art source can only provide the THUMBNAIL key for a GrlMediaAudio
    • a provider using an API on the local file system (such as local-metadata) can only provide metadata for a GrlMedia that is on the local file system (url in file://)

    Also, dependencies (see (3)) can be conditional.

  3. handle dependencies: suppose the following hypotheses:

    • we want to know metadata A for a media M of type T
    • source S1 can provide metadata A if metadata B is present
    • source S2 can provide metadata B
    • M already contains the following metadata: C, D, E

    then the core should be able to know all this information through the API, preferably before any call to resolve(), so that it can call S2 asking for B first, and then call S1 asking for A.

The current API helps solving (1) with ->supported_keys() and (3) with ->key_depends(), but does not allow (2) to be solved.

3   Analysis

Let's try to formalise things a bit to be able to identify the different case that might happen.

3.1   Parameters of a metadata resolution

(that is, data we have when grl_metadata_source_resolve() is called)

M:

a GrlMedia object, containing the following information:

M.T:the type of M (Box, Video, Audio or Image)
M.D:dictionary of key->values of metadata on M
K:

key we want to resolve

Only one key to resolve, and not a list is considered here. For a list, one should be able to just repeat the algorithm for each member of the list. Not sure it is optimal in speed, but it should guarantee to get the result in all cases where it is possible.

We should note that core's resolution algorithm can potentially expand M.D (through calls to other sources' resolve()), meaning we can be inclined to try the query with that source again with an expanded M.D. M.T and K are fixed.

Note

We suppose we are in the case with GRL_RESOLVE_FULL flag is set, else we just want a call to the source's ->resolve().

Considering the above, the ability of S to resolve K can depend on:
  • K
  • M.T
  • the set of keys available in M.D
  • the value of certain keys in M.D
  • external conditions that cannot be known without blocking (such as data available on a remote server)
  • any combination of the above

The present API only takes K into account. An optimal solution would take K, M.T and M.D into account, and provide a way to resolve issues through more resolution (filling M.D)

3.2   Algorithm Idea

For source S, given the data above (M and K), here is how I see the resolution algorithm:

Is resolution possible?
  1. YES: let's call S->resolve() (or plan to)
  2. NO: could it be possible with more keys available in M.D?
    1. NO: give up
    2. YES: what data should be added in M.D? -> a set of keys L need to be filled in M.D. Try to resolve these before resolving K

We are necessarily in either one of the cases (a), (b1) or (b2). In cases (a) and (b1), we don't need more information than the fact we are in said case. In case (b2), we need S to give us a list of keys L. It should be noted that if we are in case (b2), after obtaining the needed metadata (supposing we can), we can still be either in case (a) or (b1), since the ability to resolve can depend on the _value_ of a given metadata, and not only on its presence. Likewise, if we are in case (a), the call to resolve() might still fail (dependency on external conditions).

4   Proposed solution

  • remove supported_keys() and key_depends()

  • add:

    gboolean (*may_resolve) (GrlMetadataSource *source,
                             GrlMedia *media,
                             GrlKeyID key_id,
                             GList **missing_keys)
    

4.1   How this solves our issues

  1. if may_resolve() return FALSE, we know in advance $PLUGIN can't find that metadata, we won't call resolve() for nothing.
  2. both media and needed key are passed as parameters, therefore may_resolve() can have its result depend on these.
  3. if missing_keys is not NULL, a list of keys on which this resolution would depend is provided there.

4.2   Representation of different cases

(a)   <=> may_resolve() returns TRUE
(b.1) <=> may_resolve() returns FALSE and *missing_keys == NULL after the call
(b.2) <=> may_resolve() returns FALSE and *missing_keys is a non-empty list of
           GrlKeyID.

4.3   Why no supported_keys()?

4.3.1   Not needed

supported_keys() is not strictly needed in a resolution algorithm, and could only provide a marginal performance improvement if its result is cached.

Rationale: The resolution algorithm, with supported_keys(), would do something like:

for each source:
  key_list = supported_keys()
  if key_id in key_list:
    possible = may_resolve()
    if possible: resolve()
    else:
      /* try to resolve missing_keys first * /

The alternative using only may_resolve() would do:

for each source:
  possible = may_resolve()
  if possible: resolve()
  else:
    /* try to resolve missing_keys first * /

In the case where key_id is not in key_list, a clever enough may_resolve() implementation would return FALSE as fast as the first solution will take to call supported_keys() and check for key_id in it. In the case where key_id is in key_list, the first implementation will cost one function call more.

If we nevertheless really want supported_keys() (i.e. we think this thing is worth optimising by caching these results into a dictionary key->list of sources that support it), an alternative could be to pass NULL as media and missing_keys to may_resolve(), but I feel this would make the semantic of may_resolve() too complex, and that in that case we should stick with supported_keys().

4.3.2   Less code for plugin

That would be one less method to implement, knowing that what it does, or something very similar, has to be done in may_resolve() anyway.



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]