Re: Beagle's scope

From: "Kevin Kubasik" <kevin kubasik net>
To: "D Bera" <dbera web gmail com>
Cc: Kevin Godby <godbyk gmail com>, Dashboard Hackers <dashboard-hackers gnome org>
Subject: Re: Beagle's scope
Date: Thu, 10 Jan 2008 11:25:38 -0700

On Jan 8, 2008 7:44 AM, D Bera <dbera web gmail com> wrote:
> Hi Kevin,
>
> > 1. RDF Store.  I know that the Beagle++ folks had integrated an RDF
> > store into their Beagle modifications.  Are there any plans for Beagle
> > proper to include an RDF store?  Or does this belong under a separate
> > project?  If Beagle will incorporate its own RDF store, should an
> > external application (such as Dashboard) be able to add things to
> > Beagle's RDF store or should it maintain its own RDF store?
>
> The beagle-rdf-branch in svn has a partial implementation of
> overlaying an RDF store on beagle data, that is, handling RDF queries
> instead of the usual BeagleClient queries. The code in the branch
> handles all kinds of <subject, predicate, object> queries. What is
> missing is an implementation of a Semweb.Selectable source to add
> proper RDF flavour to the process. My RDF knowledge is not enough to
> finish the rest :(
>
> The above is an experiment to see if beagle could act as an RDF
> store even though it was not designed to be one from the ground up.
> If it works, then any RDF client will be able to query beagle.
>
> The other part, about storing data, is tricky. Again, beagle was
> designed to pull and index information from data, not to store data.
> There is an API to add extra data in beagle (search Joe's blog), like
> tags or additional properties. But for serious applications it is much
> more desirable to have a dedicated store and let beagle index it for
> you. There have been on-and-off efforts in writing a separate metadata
> store
> (possibly a simple sqlite table to start with) but none was completed.

Hey, I should probably put this into a branch, since I am actually
sitting on something not completely unlike what is being described
here. Offering a simple API to store triples (read: URI, datatype,
and data) in a sqlite table wasn't the problem; the problem was
determining where and how we link it in. Personally, I can't really
figure out _why_ people want all their data siloed into one big btree,
but that's beside the point. I think there is a certain buzz afoot,
and seeing how leaftag is dead on the ground and we're already a
running daemon, it wouldn't be completely out of the question for us
to provide a means to store metadata (with the line drawn at real
data: e-mail should not be stored in Beagle ;) ).
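
To make the triple idea concrete, here is a minimal sketch of such a
store. It is written in Python purely for illustration (Beagle itself
is C#), and the TripleStore class, the table layout, and the example
URI are all assumptions, not anything that exists in Beagle. It also
assumes that "datatype" names the kind of metadata (for example
"label") and that each URI holds at most one value per datatype.

```python
import sqlite3


class TripleStore:
    """Hypothetical metadata store: one (uri, datatype, data) row per fact."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS triples ("
            " uri TEXT NOT NULL,"
            " datatype TEXT NOT NULL,"
            " data TEXT NOT NULL,"
            " PRIMARY KEY (uri, datatype))"
        )

    def put(self, uri, datatype, data):
        # Insert or overwrite the value stored for this (uri, datatype) pair.
        with self.db:
            self.db.execute(
                "INSERT OR REPLACE INTO triples (uri, datatype, data)"
                " VALUES (?, ?, ?)",
                (uri, datatype, data),
            )

    def get(self, uri):
        # Return everything known about one URI as a {datatype: data} dict.
        rows = self.db.execute(
            "SELECT datatype, data FROM triples WHERE uri = ?", (uri,)
        )
        return dict(rows.fetchall())

    def remove(self, uri):
        # Drop every triple recorded for this URI.
        with self.db:
            self.db.execute("DELETE FROM triples WHERE uri = ?", (uri,))


store = TripleStore()
store.put("email://example/24", "label", "red")
store.put("email://example/24", "label", "blue")  # overwrites "red"
print(store.get("email://example/24"))
```

Storing the rows is the easy part, as noted above; the sketch says
nothing about how they would be linked into the existing indexes.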

The other concern/issue/quandary is how metadata stored in some
sqlite table somewhere should be queried and merged with the Lucene
results. How would those terms weigh? The exact procedures of storage
really aren't all that important to people. What I do think we might
want to look into is simplifying the process of 'feeding' and
'retrieving' data based on a URI in our API. Say I have an e-mail that
has been assigned a Red label in whatever e-mail client. Right now,
our API makes finding that information about that e-mail easy if we
find it as a result of some other query. However, while the capability
exists, finding out if 'email://24 happy com' has a red label is not a
seamless API action. An idea that I've fiddled with for a while is
beefing up our Hit class to support basic CRUD against Beagle. Hear me
out!

So, say I run a URI query and get a Hit representing what Beagle
knows about 'email://24 happy com'. We have a pretty robust set of
metadata, and the client program displays this to the user. What if
the user now changes the label to Blue? Granted, normally we would
recrawl and notice right away, but let's pretend that this is a
'proactive' program, and all of our indexed data about this e-mail
client has to come over the BeagleClient API. While this change
certainly isn't hard, it's a far cry from:
    hit.Properties["label"] = "blue";
    hit.Save();    // or hit.SaveAsync();
which would do the needed logic to update our indexes with the new
data. This opens the door to clients using Beagle not as some storage
mechanism, but for its powerful tokenizers, stemmers, and
lightning-fast full-text search.
Building on that, it would be pretty easy to have a
hit.RemoveFromIndex() method. The Create step is a little trickier,
since Hits and Indexables are fundamentally different things.
However, I do think that we might want to give a slightly more
'code-concise' means of both indexing and querying single Uris or
sets of Uris. Something like:
    client.GetHit(Uri);
    client.GetHits(Uri[]);
Maybe something as concise as:
    client.Index(Uri, Data, Properties[]);
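
As a rough illustration of how those calls could fit together, here is
a sketch in Python (not Beagle's C#) with an in-memory dictionary
standing in for the index. Every name in it (Client, Hit, get_hit,
get_hits, index, save, remove_from_index) is hypothetical and only
mirrors the method names proposed above; none of it is the real
BeagleClient API, and the example URI is a placeholder.

```python
class Hit:
    """Hypothetical result object that can write changes back to its client."""

    def __init__(self, client, uri, data, properties):
        self._client = client
        self.uri = uri
        self.data = data
        self.properties = dict(properties)

    def save(self):
        # Update: push the (possibly modified) properties back to the index.
        self._client.index(self.uri, self.data, self.properties)

    def remove_from_index(self):
        # Delete: drop this URI from the index.
        self._client.remove(self.uri)


class Client:
    """In-memory stand-in for the search daemon; not the real BeagleClient."""

    def __init__(self):
        self._docs = {}  # uri -> (data, properties)

    def index(self, uri, data, properties):
        # Create: add or replace the entry for this URI.
        self._docs[uri] = (data, dict(properties))

    def get_hit(self, uri):
        # Read: return a Hit for the URI, or None if it is not indexed.
        if uri not in self._docs:
            return None
        data, properties = self._docs[uri]
        return Hit(self, uri, data, properties)

    def get_hits(self, uris):
        # Read several URIs at once, skipping any that are not indexed.
        hits = (self.get_hit(uri) for uri in uris)
        return [hit for hit in hits if hit is not None]

    def remove(self, uri):
        self._docs.pop(uri, None)


client = Client()
client.index("email://example/24", "Lunch on Friday?", {"label": "red"})

hit = client.get_hit("email://example/24")
hit.properties["label"] = "blue"
hit.save()
print(client.get_hit("email://example/24").properties)

hit.remove_from_index()
print(client.get_hit("email://example/24"))
```

The sketch sidesteps the hard parts entirely: a real version would
have to turn a Hit back into an Indexable on save and decide which
index the update belongs to.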

I would also propose adding another overload of the AddProperty()
method which just takes 2 strings and assumes you wanted a Property
(not Keyword, Date, or Unsearched). However, all that doesn't do us
much good without guaranteed persistence. I haven't looked, but if I
were to use the client API to index a Uri already in another index,
and then that Uri got signaled for removal, would the user-inserted
information still be available?
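
The two-string convenience form could look something like the sketch
below. It is Python, which has no true overloading, so a single method
accepts either form; the PropertyType, Property, and Indexable classes
here are illustrative assumptions and do not reproduce Beagle's actual
types.

```python
from dataclasses import dataclass
from enum import Enum


class PropertyType(Enum):
    TEXT = "text"              # plain searchable property (assumed default)
    KEYWORD = "keyword"
    DATE = "date"
    UNSEARCHED = "unsearched"


@dataclass(frozen=True)
class Property:
    key: str
    value: str
    type: PropertyType = PropertyType.TEXT


class Indexable:
    """Hypothetical container for the properties attached to one URI."""

    def __init__(self, uri):
        self.uri = uri
        self.properties = []

    def add_property(self, key_or_property, value=None):
        # Accept either a ready-made Property or a bare (key, value) pair;
        # the bare pair is assumed to mean a plain searchable property.
        if isinstance(key_or_property, Property):
            prop = key_or_property
        else:
            prop = Property(key_or_property, value)
        self.properties.append(prop)
        return prop


item = Indexable("email://example/24")
item.add_property("label", "blue")
item.add_property(Property("sent", "2008-01-10", PropertyType.DATE))
print([(p.key, p.type.name) for p in item.properties])
```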

Anyways, something to think about. Personally, I think we should
start pushing elements of the Beagle Client API as an easy way for
application developers to get hyper-powered search for all their
content (for near-free). Right now, the only apps that really use us
are ones that are accessing already-indexed data. I think that
painless, powerful search with full-blown stemmers etc. (and
suggestions soon!) could be a huge draw to developers. Say, an RSS
reader that wants to offer full-text search, or an e-mail client, or
some obscure and random application that we have never heard of.

While the capability has always been available, I think we should just
look into the accessibility of our API for some of these uses. People
probably don't care how we do it, or even why, just that it's cheap
and easy. (Obviously, I'm also a strong supporter of us going DBus in
the near future. However, implementation of all this would be somewhat
straightforward: some potential performance problems, but nothing
that's beyond hope, I hope.)

Anyways, that's my take on the whole storage situation. The same
Uri-specific (and optimized) queries that were added for the initial
RDF stuff could be exposed as a simple API as well. There's no
imperative need for it, since everything mentioned can already be
done; I just think some tasks are a little too roundabout to get the
attention that they should.

Cheers,
Kevin

>
> > 2. Indexing of web apps data.  Should Beagle index only desktop/local
> > data or should it also index data from web apps?  For example, should
> > Beagle (using web APIs) index my Google Docs?
>
> I personally see no problem in beagle pulling out indexable
> information from different data sources (local or web). There have
> been talks about a backend to index GMail mails. A backend to
> index/query google docs would be a worthwhile addition.
>
> - dBera
>
> --
> -----------------------------------------------------
> Debajyoti Bera @ http://dtecht.blogspot.com
> beagle / KDE fan
> Mandriva / Inspiron-1100 user



-- 
Cheers,
Kevin Kubasik
http://kubasik.net/blog