Re: Grilo advanced filtering



On Tue, 2012-07-24 at 10:37 +0200, Jens Georg wrote:
> As we discussed before, the current filtering is a "narrowing down" or a
> logical "AND" of a selected set of filters.
> How open would you be to a more broad solution of the filtering issue,
> thinking of the tracker or similar future plug-ins?


I would like to share some background here that may be useful for
further discussion.


When we think about adding a new feature to Grilo, especially one that
must ultimately be implemented by the plugins, we always try to bear
the following in mind:

- The feature must be easy to use for application developers. If it is
hard to use, nobody will use it.
- The feature must be easy to implement for plugin developers. If it is
hard to implement, no plugin will implement it, and it will be totally
useless.
- Plugins are usually wrappers around a service (YouTube, Jamendo, and
so forth), so if the feature can't be implemented efficiently using the
service, the plugin will simply declare that it doesn't implement it.


As you can see, we don't only think about features that are useful for
application developers; we also think about plugin developers and try
to avoid making their lives too complicated.

So, taking the above premises into account, and using filtering as an
example of a feature to implement (the same applies to other features),
we need:

- A way for application developers to define the filter they need.
For instance, "only interested in music >= 1960".

- A way for plugin developers to declare which types of filter the
source supports. For instance, the source can filter by publication
date.

- A way to intersect both needs. That is, application developers must be
able to check the source capabilities and see whether the filter they
need fits the capabilities of the source.
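
The third point can be pictured as a simple set check. What follows is
only a minimal sketch, not Grilo's actual capabilities API: the names
filter_kind, source_caps and the two helper functions are hypothetical,
and each filter type is assumed to fit in one bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical filter kinds, one bit each (not Grilo's real key IDs). */
enum filter_kind {
    FILTER_PUBLICATION_DATE = 1 << 0,
    FILTER_ARTIST           = 1 << 1,
    FILTER_MIME_TYPE        = 1 << 2
};

/* What a source (plugin) declares it is able to filter on. */
struct source_caps {
    unsigned supported_filters;
};

/* Returns the requested filter kinds that the source cannot handle. */
unsigned caps_unsupported_filters(const struct source_caps *caps,
                                  unsigned requested)
{
    assert(caps != NULL);
    return requested & ~caps->supported_filters;
}

/* True only if every requested filter kind is supported by the source. */
bool caps_support_filters(const struct source_caps *caps,
                          unsigned requested)
{
    return caps_unsupported_filters(caps, requested) == 0;
}
```

An application would call caps_support_filters() before issuing the
operation, and fall back to filtering on its own side (or skip the
source) when it returns false.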

Given all of the above, when we first discussed the idea of adding
filtering capabilities we tried to keep things simple in order to
fulfil these requirements.

You can see the main thread here:
https://mail.gnome.org/archives/grilo-list/2011-January/msg00134.html

In that thread, and probably on the IRC channel too, we discussed which
kinds of filters to support and, mainly, which operators.

Thus, we have AND, OR and NOT.

If you take a look at the main web services we support through the
plugins, the API they provide is usually RESTful, like:

http://service.com?v1=a1&v2=a2&v3=a3

As you can see, that can be read as v1=a1 AND v2=a2 AND v3=a3. That
means that almost all services implement the AND operator, but not the
OR operator.
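
To make the point concrete, here is a rough sketch of how a plugin
might turn a list of filters into such a URL. The struct param type and
build_query_url() are made up for illustration, and URL escaping is
left out on purpose. Note that the only "operator" available is the '&'
separator, which the service reads as AND.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One "name=value" filter (hypothetical type, for illustration only). */
struct param {
    const char *name;
    const char *value;
};

/*
 * Writes "base?n1=v1&n2=v2..." into out. Every parameter is joined with
 * '&', so the service reads the whole list as a conjunction (AND).
 * Returns the length of the URL, or -1 if it does not fit in out.
 * Values are copied as-is: no URL escaping is done in this sketch.
 */
int build_query_url(char *out, size_t out_size, const char *base,
                    const struct param *params, size_t n_params)
{
    assert(out != NULL && base != NULL);

    int written = snprintf(out, out_size, "%s", base);
    if (written < 0 || (size_t)written >= out_size)
        return -1;
    size_t used = (size_t)written;

    for (size_t i = 0; i < n_params; i++) {
        written = snprintf(out + used, out_size - used, "%c%s=%s",
                           i == 0 ? '?' : '&',
                           params[i].name, params[i].value);
        if (written < 0 || (size_t)written >= out_size - used)
            return -1;
        used += (size_t)written;
    }
    return (int)used;
}
```

There is no obvious place in this scheme to say "v1=a1 OR v2=a2", which
is why OR usually cannot be pushed down to the service.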

NOT is a different issue. You can think of "NOT(v1=a1)" as "v1!=a1", and
implement something like
grl_operation_options_set_key_distinct_filter(). That would fit with
the architecture we have so far around operation options.

Coming back to the AND and OR operators, there is another issue to
solve: how the operators can be combined.

So far, we have considered only simple combinations. Thus we have

filter1 AND filter2 AND .... AND filtern

We could think of:

filter1 OR filter2 OR .... OR filtern

But what about:

(filter1 AND filter2) OR (filter3) OR (filter4 AND filter5)

Or:

(filter1) AND (filter2 OR filter3) AND (filter4)

Or:

(filter1 AND (filter2 OR filter3)) OR (filter4) OR (filter5 AND filter6)


As you can see, there are lots of combinations, and I guess not all of
them can be handled by the plugins. That makes it really complicated
for plugins to declare which combinations they support.

So it is clear to me that if we go down this path we need to simplify
the combinations that we support.

Following your email, and looking at the proposals for implementing
this, it seems to me that this combination would be enough:

F(1) OR F(2) OR ..... OR F(n)

where each F(x) = f(x1) AND f(x2) AND .... AND f(xn)
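
As an illustration of that restricted shape, the sketch below models a
filter as an OR over groups, where each group is an AND over simple
conditions. All type and function names here (media, cond, and_group,
or_filter and so on) are invented for the example; nothing like this
exists in Grilo today. The matching is done locally only to show the
semantics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified media item (hypothetical, for illustration only). */
struct media {
    const char *author;
    int year;
    bool hd;
};

enum cond_kind { COND_AUTHOR_IS, COND_YEAR_AT_LEAST, COND_IS_HD };

/* One simple condition f(xi); text/number are used depending on kind. */
struct cond {
    enum cond_kind kind;
    const char *text;
    int number;
};

/* F(x) = f(x1) AND f(x2) AND ... */
struct and_group {
    const struct cond *conds;
    size_t n_conds;
};

/* F(1) OR F(2) OR ... OR F(n) */
struct or_filter {
    const struct and_group *groups;
    size_t n_groups;
};

static bool cond_matches(const struct cond *c, const struct media *m)
{
    switch (c->kind) {
    case COND_AUTHOR_IS:     return strcmp(m->author, c->text) == 0;
    case COND_YEAR_AT_LEAST: return m->year >= c->number;
    case COND_IS_HD:         return m->hd;
    }
    return false;
}

/* A group matches only if all of its conditions match. */
static bool and_group_matches(const struct and_group *g,
                              const struct media *m)
{
    for (size_t i = 0; i < g->n_conds; i++)
        if (!cond_matches(&g->conds[i], m))
            return false;
    return true;
}

/* The filter matches as soon as any group matches; with no groups at
 * all it matches nothing. */
bool or_filter_matches(const struct or_filter *f, const struct media *m)
{
    assert(f != NULL && m != NULL);
    for (size_t i = 0; i < f->n_groups; i++)
        if (and_group_matches(&f->groups[i], m))
            return true;
    return false;
}
```

Because nesting is fixed at two levels, a plugin would only have to
answer two questions: which conditions it supports, and whether it can
handle more than one group.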


Still, I fail to see the use cases for the OR operator, especially
since, as said, most services support only the AND case: "give me the
videos that contain the word 'car', belong to user 'user_acme' and have
HD quality".

And for those (few, I guess) cases where you need OR, you can always
implement it as a utility function (either as part of Grilo or as a
separate library) using what we already have in the core.

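  /* Pseudocode: emulate "F1 OR F2 OR F3" with one search per filter. */
  void search_compose_filters (source, text, F1, F2, F3)
  {
    found_ids = {};   /* ids already sent to the caller */
    foreach (filter in [F1, F2, F3]) {
      elements = search (source, text, filter);
      /* drop elements already returned by a previous filter */
      new_elements = get_elements_not_in_set (elements, found_ids);
      send_elements (new_elements);
      found_ids += get_ids (new_elements);
    }
  }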

Of course, it is not efficient at all, but it resembles how users
search for content on these services: first they search with one
condition, then they search with a different one.
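
For reference, the same idea can be written as compilable C. This is a
sketch under heavy assumptions: results are reduced to plain integer
ids, a filter is just an int, search_fn is a hypothetical synchronous
callback (real Grilo operations are asynchronous), and demo_search() is
a toy stand-in for a source with canned results.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BATCH 64

/* Hypothetical per-filter search: writes up to cap ids, returns count. */
typedef size_t (*search_fn)(int filter, int *ids, size_t cap);

static bool contains(const int *ids, size_t n, int id)
{
    for (size_t i = 0; i < n; i++)
        if (ids[i] == id)
            return true;
    return false;
}

/*
 * Emulates "F1 OR F2 OR ..." by running one search per filter and
 * keeping only ids not seen before. Returns the number of ids in out.
 * The linear duplicate check keeps the sketch short; it is not meant
 * to be efficient.
 */
size_t search_any_filter(search_fn search,
                         const int *filters, size_t n_filters,
                         int *out, size_t out_cap)
{
    assert(search != NULL);
    size_t n_out = 0;

    for (size_t f = 0; f < n_filters; f++) {
        int batch[MAX_BATCH];
        size_t n_batch = search(filters[f], batch, MAX_BATCH);

        for (size_t i = 0; i < n_batch && n_out < out_cap; i++)
            if (!contains(out, n_out, batch[i]))
                out[n_out++] = batch[i];
    }
    return n_out;
}

/* Toy source: filters 0, 1 and 2 return overlapping canned results. */
size_t demo_search(int filter, int *ids, size_t cap)
{
    static const int results[3][3] = { {1, 2, 3}, {3, 4, 0}, {2, 5, 0} };
    static const size_t counts[3] = { 3, 2, 2 };

    if (filter < 0 || filter > 2)
        return 0;
    size_t n = counts[filter] < cap ? counts[filter] : cap;
    for (size_t i = 0; i < n; i++)
        ids[i] = results[filter][i];
    return n;
}
```

The cost is one round trip to the service per filter, plus the
bookkeeping of ids already delivered, which is the inefficiency
mentioned above.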

Anyway, in case we go with supporting OR natively, this is my opinion
on each proposed approach:


> a) Simple: Instead of passing just one GrlOperationOptions, pass a list
> of options. Filters inside the options would be still "AND", the list
> would be considered a large "OR".
> That of course implies that the result slicing is moved out of
> GrlOperationOption, or, as a "quick hack/contract", is only obeyed in
> the first list entry.
> It has the benefit that it would be using existing stuff, not delay 0.2
> too much and if I'm not completely mistaken quite powerful already.

The problem is that each operation options object contains not only the
filter, but also other options such as flags, skip, count, etc.

Granted, we can use the first element to get those values. But what
happens if in the future we add new options that are also made up of
several sub-options? Take sorting as an example. If in the future we
want to support sorting by several keys, we would also need a list of
options, each one defining a sorting key (like sort by "Title", then by
"Artist").

Summing up, if a specific option is made up of several values, I prefer
to add API for that specific option and keep a simple API for the
remaining single-valued options.

>   
> b) More complex: Dissolve GrlOperationOptions, have several filter
> objects and logical operations that can be combined freely, much like
> QtGallery's filtering.
> Has the benefit that it's more flexible in terms of what filters you can
> glue together, you don't have to check if something in the
> OperationOptions is really set, but might be awkward to construct
> filters from C, that's where
> 

This is probably the approach I like the most, if we go ahead with this
feature. It keeps the simple options simple, and puts the complexity
only in the complex part (the filter). Also, if in the future we need to
add another complex feature (like sorting), it wouldn't affect the
existing one; we would only need to worry about the new feature. And
finally, it helps to restrict the combinations of operators we support,
because filters are built programmatically.

> c) Filter Language: comes into the field. Have a simple language that is
> easily transformable in other languages such as SQL, SPARQL etc. as a
> substitute for filtering and the generic query language you already
> provide. This sounds slightly more complex, but in fact could be build
> upon both a) and b), with b) being easier, a) requiring some boolean
> normalization in the process, then transforming to target filter
> language.
> 

To be honest, I had a bad experience with this approach back in our
MAFW days (the framework used in Maemo). First, you need to tune the
syntax to avoid complex combinations. Then, depending on how much the
chosen language differs from the syntax of the target language (the one
you need to use for the final web service), you can end up with plugin
developers not implementing it, to avoid parsing the language, checking
whether its constructs fit the target language, and so on.

So summing up, of the three approaches, I would prefer (b).


	J.A.



