Re: SuperDataDaemonThing - Let's Plan



On Wed, Mar 5, 2008 at 9:11 AM, John Carr <john carr unrouted co uk> wrote:
> Hi guys
>
>
>  >  > I think it is smart doing this on top of the conduit daemon because we have
>  >  > already done a lot of the peripheral stuff needed, including configuration
>  >  > getting/setting, and GUI configuration for those yucky dataproviders that
>  >  > need you to sign into a website. We also have some rather comprehensive unit
>  >  > tests to make sure that we continue to work with all these online
>  >  > dataproviders.
>  >  And I agree that this work is invaluable, and the last thing I want us
>  >  to do is rewrite the same logic (the whole point of the daemon). As
>  >  mentioned above, it's more a concern of scope and purpose. I feel
>  >  strongly that the universal data access point for a desktop should be
>  >  headless and as lean as is feasible. It should also stick to one task
>  >  and perform it well. I really have (over the course of this
>  >  conversation) become quite fond of the idea of simply splitting
>  >  conduit. We leave the UI/Sync element with its current Gtk and Dbus
>  >  interfaces, and we simply move all the fetching/pushing to a separate
>  >  process, and communicate over dbus. We expand the conduit backend
>  >  daemon to fit the more generic role of desktop provider while
>  >  maintaining solid ties to the Conduit system.
>
>  I'm not really fussed whether "it" is in conduit (the project) or out
>  of conduit, as long as it is sufficiently isolated. For example, at
>  the moment closing the conduit GUI kills the daemon..
Exactly. I just don't want us to cram all this functionality onto
conduit as an extra dbus interface; we need to be able to
guarantee application writers that they can expect the daemon to be
running.
>
>
>  >  > > 2) Once the user has a DataProvider, it seems like a small expansion
>  >  > > of the basic CRUD is a good start:
>  >  > >   * get_all
>  >  > >   * get
>  >  > >   * put
>  >  > >   * delete
>  >  > >   * refresh
>  >  >
>  >  > I would also add finish() to give dataproviders the ability to do some
>  >  > transactional work if they wish, and the ability to clean up memory.
>  >  Certainly.
>
>  I'm not entirely sure of the role of "refresh" and "finish" in a
>  shared daemon. Specifically, isn't it the purpose of the daemon to
>  periodically check flickr and notify the applications that there is a
>  new photo available? If some kind of connection operation needs to
>  take place, I think the daemon should take care of it? Finish implies
>  transaction, but if two apps were to call put at the same time there
>  wouldn't be any kind of isolation. So what are the semantics of the
>  finish() call - could I end up "finish()ing" some other application's
>  processes?
>
Originally refresh() was to 'force' a remote update. I think finish()
could still be useful; let's think in terms of
caches and what's kept on disk vs. in memory, etc., especially when
dealing with conversions or potentially large sets.
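
To keep the vocabulary straight, here is a rough sketch (plain Python,
illustrative names only, not Conduit's actual dataprovider API) of the
interface we keep talking about, with the refresh()/finish() semantics
spelled out:

class DataProvider(object):
    """Sketch of the daemon-side interface under discussion."""

    def refresh(self):
        """Force a re-check of the remote service and update the local cache."""
        raise NotImplementedError

    def get_all(self):
        """Return the LUIDs of every item currently known."""
        raise NotImplementedError

    def get(self, luid):
        """Return the item (or its metadata) identified by luid."""
        raise NotImplementedError

    def put(self, data, luid=None):
        """Create or update an item; returns its LUID."""
        raise NotImplementedError

    def delete(self, luid):
        """Remove the item identified by luid."""
        raise NotImplementedError

    def finish(self):
        """Flush pending writes and drop in-memory caches.

        In a shared daemon this would need to be scoped per client
        (or dropped) so one app cannot finish() another app's work.
        """
        raise NotImplementedError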
>
>  >  > The premise of my argument is that it's more efficient for the application
>  >  > that wants the data to deal with it in a standardised format, using
>  >  > the most suitable native libraries for the application, than it is to break
>  >  > all of the datatypes into tuple-like form and then send them over the bus.
>  >  > By efficient I also mean less work.
>
>  Anything that reduces the amount of work is great. And reusing VTHING
>  formats is great. My concern is making sure everyone can work with the
>  data we return easily. If our data can be fetched into a GObject then
>  it's available in C, C++, C#, Vala, Python, Java, and presumably more.
>  Iterating over a vcard and into a json structure and back seems an
>  approachable starter here, that or a glib-vcard...
>
>
>  >  > If we need to expose additional metadata to calling apps beyond what can be
>  >  > expressed in a file uri then we can do either
>  >  > 1) add a get_metadata(LUID, key)
>  >  That works; I would probably support a get_all_metadata() as well, to
>  >  reduce the number of roundtrips. But that's just semantics.
>
>  I think the metadata is what we should get from get(uid). The file
>  location is just a piece of metadata? That's what I'd imagined in the
>  photo cases anyway - you don't return the JPG data but a location and
>  data like its tags and what not.
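
To make that concrete, here's a purely hypothetical example of what
get(uid) might hand back for a photo (the path and keys are made up;
the point is that the JPG bytes never cross the bus, only a location
plus the interesting metadata):

photo = {
    "content_url": "file:///home/kevin/.cache/sddt/flickr/2931.jpg",
    "mtime":       "2008-03-04T18:22:10Z",
    "title":       "Group photo",
    "tags":        "guadec,hackers",
}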
>
>  I do think it's important to have a few first class object formats, as
>  I don't see any real value in a common transport layer but still
>  needing custom code everywhere to decode the output for every
>  endpoint we ever support. So if we have widely available libraries for
>  dealing with vcard then every call to a contact DP could return a
>  vcard. But if they don't, I don't really see a problem with a "codec"
>  that repacks a VCard into a JSON structure, which does (or can more
>  easily, at least) have cross-language libraries...
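
Just to get a feel for how small such a "codec" could be, here is a
quick sketch assuming the vobject and simplejson libraries are around
(only the VCard-to-JSON direction; property names simply become JSON
keys):

import vobject      # vcard/ical parsing
import simplejson   # or the stdlib json module on python >= 2.6

def vcard_to_json(vcard_data):
    """Repack a raw VCARD string into a JSON object keyed by property name."""
    card = vobject.readOne(vcard_data)
    props = {}
    for name, lines in card.contents.items():
        # A property can appear more than once (TEL, EMAIL, ...), so
        # always hand back a list of string values.
        props[name] = [str(line.value) for line in lines]
    return simplejson.dumps(props)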
>
>  So, anyone familiar with dealing with the VTASK format and C# (for an
>  experimental Tasky backend that poked Conduit's Evolution DP)...
>
>
>  >  > 2) give the option of returning the LUID unmodified, i.e. before it is
>  >  > converted to a file. This may be a smarter solution where the LUID is
>  >  > standardised, and the app already has bindings to the dataprovider that
>  >  > conduit is providing a proxy for (e.g. Evolution - where caching doesn't make
>  >  > sense anyway).
>
>  I think this is the case where you argued that an app should still
>  have to use python-evolution, which I presume I misunderstood. If SDDT
>  is *just* a webservice proxy with caching then evolution shouldn't be
>  represented at all. If it's something of a gvfs for "first class" data
>  objects (contacts, events, tasks, photos, etc) then... would gvfs
>  expect your gvfs app to know about libsamba etc?
>
Not at all, which was the point I was trying to make, except I realize
it was totally muddled.
>  My view is simply: There are two separate things we are talking about
>  here, and it's getting muddled into one I think: the transport
>  abstraction and the data abstraction.
>
>  The transport abstraction in conduit is in good shape; I think people
>  (gimmie especially) would perhaps prefer to see it more asynchronous?
>  Maybe this is a moot point (can the dbus api make the synchronous
>  asynchronous?). Within an hour anyone with a bit of python and conduit
>  knowledge could expose the basic CRUD model over dbus and poke any of
>  it with stuff.
Agreed. Given the nature of the GIL it strikes me as bad practice to
have lots of incoming connections as well as a GUI and db interactions
all in one process. However, I really don't know how conduit handles
this so I'll wait until I've checked out the code to comment further.
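
For reference, a minimal sketch with dbus-python of what "expose the
basic CRUD model over dbus" could look like; the bus name, object path
and interface here are made-up placeholders, not an agreed API:

import gobject
import dbus
import dbus.service
from dbus.mainloop.glib import DBusGMainLoop

DP_IFACE = "org.gnome.DataDaemon.DataProvider"   # hypothetical interface name

class DataProviderObject(dbus.service.Object):
    """Wraps one dataprovider and puts its CRUD methods on the bus."""

    def __init__(self, bus, path, dataprovider):
        dbus.service.Object.__init__(self, bus, path)
        self._dp = dataprovider

    @dbus.service.method(DP_IFACE, out_signature="as")
    def GetAll(self):
        return self._dp.get_all()            # list of LUIDs

    @dbus.service.method(DP_IFACE, in_signature="s", out_signature="a{ss}")
    def Get(self, luid):
        return self._dp.get(luid)            # metadata dict for one item

    @dbus.service.method(DP_IFACE, in_signature="a{ss}", out_signature="s")
    def Put(self, data):
        return self._dp.put(data)            # returns the new/updated LUID

    @dbus.service.method(DP_IFACE, in_signature="s")
    def Delete(self, luid):
        self._dp.delete(luid)

if __name__ == "__main__":
    DBusGMainLoop(set_as_default=True)
    bus = dbus.SessionBus()
    name = dbus.service.BusName("org.gnome.DataDaemon", bus)
    # DataProviderObject(bus, "/dataproviders/flickr", some_dataprovider)
    gobject.MainLoop().run()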
>
>  Data abstraction, conduit really hasn't bothered with. Specifically in
>  the PIM data cases. "Here is some vcard data, enjoy it". We've created
>  thin wrappers around the real data to pass around but done nothing to
>  make it easier for people to work with the data. When you are syncing
>  that's enough. If we are going to function as a data abstraction then I
>  don't think it's good enough to ignore this. We need to define first
>  class objects and how their data is passed around. Sure, passing a
>  file over dbus is crazy, but passing a VCARD isn't so crazy. Evolution
>  is moving to a dbus backend, as is banshee (I remember a post about
>  how the new banshee will have a headless backend and its GUI will have
>  an O(1) load time even with a million tracks or something?). If we can
>  easily provide a GObject-ish way to deal with the data, even better...
That's a great way to explain it. Now personally, as a programmer, I
would prefer an object with all the metadata that can be stored in a
vcard to a string of pain-in-the-butt VCARD. That's just one
opinion, but I think it's going to be a case-by-case thing. As far as a
good general rule, I'm inclined to use something similar to the Beagle
model, with every object being composed of the following information:

Hit
|--mtime
|--uid
|--content_url
|--source
|--type
|--metadata (a dict)

Now the beauty of taking this 'first class' approach is that we can
provide a specific set of metadata for each type as opposed to the
generic dict. But the biggest thing to take note of is the distinction
between metadata and content. In this context not all objects have
content; some are composed primarily or even completely of metadata
(think contacts). As a result, they return a null content_url.
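
A minimal sketch of that structure in plain Python (field names taken
from the outline above; contact-like objects would just pass
content_url=None):

class Hit(object):
    """One first-class object as handed out by the daemon."""

    def __init__(self, uid, source, type, mtime,
                 content_url=None, metadata=None):
        self.uid = uid                  # the LUID the daemon hands out
        self.source = source            # which dataprovider produced it
        self.type = type                # "photo", "contact", "task", ...
        self.mtime = mtime              # last modification time
        self.content_url = content_url  # None for metadata-only objects
        self.metadata = metadata or {}  # free-form dict until per-type
                                        # first-class fields are defined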

In short:
1) I still feel that we need a separation between daemon/datastore and
sync/gui/config. In the end, I think you guys (the conduit devs) are
best suited to make the final call here. As long as we can meet the
following use cases:
    a) User closes Conduit but leaves Gimmie open and continues to use it.
    b) Days of uptime
    c) Heavy use (read multiple clients making high demands) is
handled intelligently
    d) The 'line' is drawn somewhere which makes semantic sense and
isn't just the most convenient right now. We should make a clear
distinction between data/transport logic and sync logic.
2) The major point of discussion is how far to take our first class
objects. We all agree that the transport logic should be hidden (akin
to the gvfs model). The question remains: how do we make that data
exceptionally usable to clients? Is it best in its raw form to allow
the client to do what they will with it, or do we parse the data into
handy objects for ease of use?
3) Authentication: No real consensus on this yet, but I'm assuming just
using gnome-keyring and letting it handle the
notification/memorization is acceptable for now? (A rough sketch of
what that could look like follows this list.)
4) We need to pick an update model for our data: should we be polling
automatically on our own? Do we only update when a client triggers it
(and then throw update events for other subscribed clients)? While the
idea of universal desktop notifications of remote changes seems
awesome, I'm concerned about resources. But I wait patiently to hear
what you guys think. (See the second sketch after this list.)
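
Sketch for point 3, assuming the old pygnomekeyring bindings; the
attribute names ("server", "user") are just a convention I made up, not
a fixed schema:

import gnomekeyring as gk

def store_password(service, user, password):
    gk.item_create_sync(None,                       # default keyring
                        gk.ITEM_NETWORK_PASSWORD,
                        "SDDT: %s" % service,       # display name
                        {"server": service, "user": user},
                        password,
                        True)                       # update if it already exists

def get_password(service, user):
    try:
        items = gk.find_items_sync(gk.ITEM_NETWORK_PASSWORD,
                                   {"server": service, "user": user})
    except gk.NoMatchError:
        return None
    return items[0].secret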
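
And a sketch for point 4: the daemon polls on its own timer and
broadcasts a D-Bus signal, so clients subscribe instead of each polling
the services themselves. Interface, signal and method names are again
illustrative, and I'm assuming refresh() can report what changed:

import gobject
import dbus.service

DAEMON_IFACE = "org.gnome.DataDaemon"        # hypothetical interface name

class DataDaemon(dbus.service.Object):

    POLL_INTERVAL_MS = 15 * 60 * 1000        # check remote services every 15 min

    def __init__(self, bus, path, dataproviders):
        dbus.service.Object.__init__(self, bus, path)
        self._dataproviders = dataproviders
        gobject.timeout_add(self.POLL_INTERVAL_MS, self._poll)

    def _poll(self):
        for dp in self._dataproviders:
            changed = dp.refresh()           # LUIDs that changed remotely
            if changed:
                self.ItemsChanged(dp.name, changed)
        return True                          # keep the timeout running

    @dbus.service.signal(DAEMON_IFACE, signature="sas")
    def ItemsChanged(self, source, luids):
        """Emitted so subscribed clients (Gimmie, etc.) can react."""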

Am I correct?
>
>  John
>



-- 
Kevin Kubasik
http://kubasik.net/blog

