Re: XML-based sources (Metadata Factory plugin)



On Wed, 2012-06-20 at 20:15 +0100, Bastien Nocera wrote:
> The problem I have with it is that the syntax, due to the fact that it's
> using XML, is very complicated, probably based on XPath and XSLT. I
> actually found it more complicated than writing the C code. Now I can
> only imagine how complicated if one tried to implement a source for a
> service that uses json (which I see you've done, I certainly wouldn't be
> able to juggle those 2 so easily).


I agree that XML syntax is not the most beautiful one, especially when
it comes to writing it.

Also, the source I showed there (BBC News) is one of the most
complicated ones, mainly because of the regular expressions you need to
handle to get the URL.

On the other hand, you can take a look at the Nascar Videos source:

https://gitorious.org/~jasuarez/grilo/jasuarez-grilo-plugins/blobs/media-factory/src/media/media-factory/sources/grl-nascar.xml

It is definitely simpler to write and understand than the BBC one.

But the point is that sooner or later you need to deal with exactly
everything defined in the XML source. You can avoid XPath and iterate
over the XML by hand to get everything you need, but sometimes XPath
simplifies the code quite a lot (we suffered from this in some of the
sources).
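
To make the XPath-versus-iteration trade-off concrete, here is a small
sketch in Python (not the plugin's own code). The feed and its element
names are made up for illustration, and ElementTree only supports a
limited subset of XPath, but it shows how one path expression can
replace several nested loops:

```python
import xml.etree.ElementTree as ET

# Hypothetical feed; the element names are invented for this example.
FEED = """
<rss>
  <channel>
    <item><title>Race one</title><link>http://example.com/1</link></item>
    <item><title>Race two</title><link>http://example.com/2</link></item>
  </channel>
</rss>
"""


def links_by_iteration(root):
    """Walk the tree by hand, one level at a time."""
    links = []
    for channel in root:
        if channel.tag != "channel":
            continue
        for item in channel:
            if item.tag != "item":
                continue
            for child in item:
                if child.tag == "link":
                    links.append(child.text)
    return links


def links_by_xpath(root):
    """Let a path expression do the walking."""
    return [node.text for node in root.findall("./channel/item/link")]


root = ET.fromstring(FEED)
```

Both functions return the same list of links; the second one just keeps
the structure of the document in a single string.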

Also, something I need to add sooner or later is a set of tools and
helpers for source developers, to assist in developing sources.

While creating the plugin I saw that we need a specific system to
inform developers about a lot of things, so they can understand what is
wrong with the source definition.

Regarding writing sources for JSON-based services, to be honest, when I
wrote this plugin it was intended only for XML-based services, and JSON
was out of scope. I think most of the services out there provide their
results as XML, so not using JSON isn't a big problem.

The idea of adding support for JSON came later (as you can see, the
plugin wasn't designed with JSON in mind): I saw that JSON can be
roughly seen as a simplified XML, and transforming JSON to XML is quite
straightforward. Thus, to cover the small number of services that only
provide JSON data, I added the JSON support. But it is not true JSON
support: rather, it transforms JSON into XML, so you still deal with
XML.
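
As an illustration of how direct such a transformation can be, here is
a minimal sketch in Python. It is not the mapping the plugin actually
uses: the "root" and "item" element names are assumptions, and it
assumes every JSON key is also a valid XML element name.

```python
import json
import xml.etree.ElementTree as ET


def json_to_xml(value, tag="root"):
    """Convert a decoded JSON value into an Element named `tag`.

    Objects become child elements named after their keys, array entries
    become <item> elements (an arbitrary choice), and scalars become text.
    """
    node = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            node.append(json_to_xml(child, key))
    elif isinstance(value, list):
        for child in value:
            node.append(json_to_xml(child, "item"))
    elif value is not None:
        node.text = str(value)
    return node


doc = json.loads('{"video": {"title": "Lap 1", "tags": ["a", "b"]}}')
xml_root = json_to_xml(doc)
```

Once the data is in this form, the same path expressions used for XML
services apply, for example xml_root.findall("./video/tags/item").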

So, summing up, this plugin is designed to work mainly with XML; JSON is
a second-class citizen here.

But everything defined in that spec is the core of the source, and if
you write it in C, sooner or later you need to deal with it anyway.

> Personally, I would rather have seen:
> - remove the need for plugins to implement the threading/async'ness, and
> do it all in the calling code.

Well, you can implement everything in a single call, like

grl_bbc_source_search (stuff) {
  videos = search_videos ();
  foreach (video in videos) {
    send (video);
  }
}

But of course, that is not desirable at all, because you would be
blocking the whole UI while that function runs.

A lot of the asynchronous work we do in the sources is precisely to
avoid this problem.

If we go with a simple call, I see only two options:

* Do not split the functions, but run g_main_context_iteration() every
now and then, so the UI doesn't become blocked.

Something like:

grl_bbc_source_search (stuff) {
  xml_page = search_videos ();
  foreach (xml_video in xml_page) {
    video = xml_to_video(xml_video);
    send(video);
    run_pending_iterations();
  }
}

* Add threads. As you rightly point out in bgo#667557, this can be done
with GAsync callbacks/results. The point with threads is that if it is
not done very, very carefully, it can become a can of worms.

The idea of using threads is not new to me. Actually, one of the
reasons (among others) I was exploring a GIO-like syntax in Grilo was
precisely that it allows using threads in a very controlled manner. So
it could help to improve the performance of sources that are
particularly heavy or difficult to split into asynchronous functions.
But this exploration is still far from going anywhere.

> - add helper functions for oft-used features. This might be XML and
> Json helpers for example.

Regarding the helpers for XML/JSON, I don't have a clue what functions
we should provide. You are probably talking about something similar to
GrlNet, where we provide high-level, simple functions to retrieve
content from the Internet.

So far I've been using libxml2 and json-glib. The former is not very
friendly, especially for the GNOME world. I think someone is writing a
wrapper to integrate it with glib. Wouldn't that libxml2-glib wrapper
and json-glib be enough?


> - implement a plugin that uses libpeas or lua to export that
> functionality to a scripting language
> 

Uhm... some time ago I was also exploring the use of libpeas to
implement the plugins. That would allow implementing them in any other
language.

But I didn't get very far, because one of the key points in using
libpeas is introspection, and our API is not very friendly in this
sense.

So I put it on hold until we fix this problem.

A different approach could be something similar (I think) to what
gstreamer does (forgive me if I'm making a mistake here): have a plugin
that is able to run the main work in a Python module. Similar plugins
can be done for JS or Lua. Those plugins would be responsible for
translating the requests from C to Python/JS/Lua, and translating the
answers back.

The good part of those plugins is that they don't affect the core, and
we can keep the API we have right now. Those plugins would be totally
optional (as the current media factory is).

And there is still another approach: having D-Bus based plugins. This
has been discussed in different places, and it is listed in our TODO.
The main idea was to isolate unstable plugins in their own process, so
a crash would not bring down the whole application. But another benefit
is that the source can be written in any language: it only needs to
provide the API through D-Bus.

The rough idea here is that we would have a plugin in Grilo able to
send/receive requests/answers through D-Bus, acting as a wrapper for
remote sources.

And we would have a separate Python library to make developing sources
easy and to expose the API through D-Bus.


> I probably wasn't very clear in
> https://bugzilla.gnome.org/show_bug.cgi?id=667557
> 
> If you decide to carry on with this XML-based plugin, the 2 first items
> in the list are still very interesting to plugin developers.
> 

Yes. These media/metadata factory plugins are a big step towards
simplifying the building of new sources.

And the ideas you are putting forward are undoubtedly very interesting.

Maybe we can set up a meeting at the next GUADEC (I guess you are
coming :)) and talk about these ideas, so we/I know what to implement.

Thank you very much for your comments!

	J.A.



