Using RDF queries with beagle
- From: Debajyoti Bera <dbera web gmail com>
- To: Beagle <dashboard-hackers gnome org>
- Subject: Using RDF queries with beagle
- Date: Fri, 15 Feb 2008 09:09:02 -0500
Hey all,
Here is the official email all of you have been waiting for (where all=3).
Beagle now has rough support for RDF queries. It's really rough, but all the
infrastructure is there, or so I believe.
* Get the beagle-rdf branch [1]
* Start beagled normally. For testing, disable indexing by
passing "--indexing-delay -1" to beagled.
* To query for triples,
$ cd tools
$ gmcs RdfQueryTool.cs -r:../Util/Util.dll -r:../BeagleClient/Beagle.dll
$ export MONO_PATH=../Util/:../BeagleClient/
$ mono RdfQueryTool.exe <arg1> <arg2> <arg3> <arg4>
where arg1 = "" or uri (subject)
      arg2 = "" or property-name (predicate)
      arg3 = {Text,Keyword,Internal} (property-type; use any value if
             arg2 is empty)
      arg4 = "" or value (object)
=> this will list the matching triples
* For more sophisticated RDF queries,
$ cd RDFAdapter
$ ./beagle-semweb-client
=> this currently lists all triples
RDFAdapter uses a Semweb [2] adapter to talk to beagle. So, you can create any
RDF query structure supported by Semweb (e.g. SPARQL). SemWebClient.cs is a
simple app asking for all triples. It should be possible to run any kind of
Semweb queries (edit SemwebClient.cs or write your own client using the
SelectableSource provided in BeagleSource.cs).
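To make the triple query arguments of RdfQueryTool concrete, here is a minimal
Python sketch of how a (subject, predicate, object) pattern with "" as a
wildcard selects triples. It is only a toy in-memory model, not beagle's code:
the function name match_triples, the sample URIs and the property names are
all made up for illustration, and the property-type argument (arg3) is left
out.

```python
from typing import Iterable, List, Tuple

# A triple is (subject uri, predicate / property name, object / value).
Triple = Tuple[str, str, str]


def match_triples(
    triples: Iterable[Triple],
    subject: str = "",
    predicate: str = "",
    obj: str = "",
) -> List[Triple]:
    """Return the triples matching the pattern.

    An empty string in any position acts as a wildcard, mirroring the
    "" arguments of the query tool. Matching here is exact string equality.
    """

    def position_ok(pattern: str, value: str) -> bool:
        return pattern == "" or pattern == value

    return [
        t
        for t in triples
        if position_ok(subject, t[0])
        and position_ok(predicate, t[1])
        and position_ok(obj, t[2])
    ]


# Hypothetical sample data; these URIs and property names are assumptions.
SAMPLE: List[Triple] = [
    ("file:///home/u/a.txt", "dc:title", "beagle rocks"),
    ("file:///home/u/a.txt", "dc:creator", "dbera"),
    ("email://inbox/1", "dc:title", "hello"),
]

if __name__ == "__main__":
    # All wildcards: list every triple.
    for triple in match_triples(SAMPLE):
        print(triple)
    # Only fix the predicate: list every title.
    for triple in match_triples(SAMPLE, predicate="dc:title"):
        print(triple)
```

Passing "" in all positions corresponds to asking for all triples, which is
what the simple client described above does.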
Trying to act like an RDF store will of course have its own problems. I will
list some:
- 20K emails, each with 10 fields, create 200K triples. I am not sure how
scalable this is going to be.
- RDF only does direct string matching; there is no concept of stemming,
analysis, or wildcard searches (this is what I am told). Currently this is not
enforced in our implementation, i.e. you can use wildcards, and searching
for "beagle" will match a document that has a field with value "beagle rocks". I
am not sure this is the right thing here.
- Should the RDF query match the text content of the documents or only the
metadata? The reason I ask is that I am not sure what a triple would look like
for the text data: <uri, "Text", [... whole text !!!]> ?
- If a query matched a document in beagle, it means some property matched the
query. But the actual property that matched the query has to be returned as
the RDF triple. The current implementation is pretty expensive: it runs
through all of the fields of all the matching documents, re-runs
the query on each field, and notes down which fields matched.
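The last point, and the "beagle" vs. "beagle rocks" behaviour mentioned
earlier, can be sketched in a few lines of Python. This is a toy model under
loud assumptions, not the actual implementation: documents are plain dicts,
the names field_matches and matching_triples are invented here, and the
"analysis" is just lowercasing plus splitting on whitespace.

```python
from typing import Dict, List, Tuple

Document = Dict[str, str]        # property name -> value
Triple = Tuple[str, str, str]    # (uri, property name, value)


def field_matches(value: str, query: str) -> bool:
    """Token match: the query term must equal one whitespace-separated
    word of the value, ignoring case. So "beagle" matches "beagle rocks",
    which a strict string comparison would not.
    """
    return query.lower() in value.lower().split()


def matching_triples(docs: Dict[str, Document], query: str) -> List[Triple]:
    """Turn document-level hits into property-level triples."""
    triples: List[Triple] = []
    for uri, fields in docs.items():
        # Step 1: document-level hit. A real index would answer this with
        # a lookup; here it is simulated by scanning the fields.
        if not any(field_matches(v, query) for v in fields.values()):
            continue
        # Step 2: re-run the query on every field of the matching document
        # and record which fields actually matched.
        for name, value in fields.items():
            if field_matches(value, query):
                triples.append((uri, name, value))
    return triples


if __name__ == "__main__":
    # Hypothetical documents; URIs and field names are made up.
    docs = {
        "email://1": {"subject": "beagle rocks", "from": "dbera"},
        "email://2": {"subject": "hello", "from": "someone"},
    }
    for triple in matching_triples(docs, "beagle"):
        print(triple)
```

Step 2 costs one check per field of every matching document, so in the worst
case of the 20K-email example, where every email matches, that is on the
order of 200K field checks for a single query.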
If this works well, I am inclined to merge the feature into svn trunk. So, if
you are interested, please give it a spin. Any improvements / patches /
suggestions are welcome.
- dBera
[1] It's trunk + RDF changes, so you won't be missing anything.
http://svn.gnome.org/svn/beagle/branches/beagle-rdf/
[2] http://razor.occams.info/code/semweb/
--
-----------------------------------------------------
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user