Using RDF queries with beagle



Hey all,
	Here is the official email all of you have been waiting for (where all=3). 
Beagle now has rough support for RDF queries. It's really rough, but all the 
infrastructure is there, or so I believe.

* Get the beagle-rdf branch [1]
* Start beagled normally. For testing, disable indexing by 
passing "--indexing-delay -1" to beagled.
* To query for triples,
  $ cd tools
  $ gmcs RdfQueryTool.cs -r:../Util/Util.dll -r:../BeagleClient/Beagle.dll
  $ export MONO_PATH=../Util/:../BeagleClient/
  $ mono RdfQueryTool.exe <arg1> <arg2> <arg3> <arg4>
    where arg1 = "" or uri (subject)
          arg2 = "" or property-name (predicate)
          arg3 = {Text,Keyword,Internal} (property-type; use any value if
                 arg2 is empty)
          arg4 = "" or value (object)
  => this will list the matching triples

* For more sophisticated RDF queries,
  $ cd RDFAdapter
  $ ./beagle-semweb-client
  => this currently lists all triples
RDFAdapter uses a Semweb [2] adapter to talk to beagle, so you can create any 
RDF query structure supported by Semweb (e.g. SPARQL). SemWebClient.cs is a 
simple app that asks for all triples. It should be possible to run any kind of 
Semweb query (edit SemwebClient.cs or write your own client using the 
SelectableSource provided in BeagleSource.cs).
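The RdfQueryTool arguments read like a triple pattern in which "" matches anything. The following is a minimal Python sketch of that reading over an in-memory list of triples; it is not the tool's code. The URIs and property names are made up, and the property-type argument (arg3) is ignored here.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str    # document URI
    predicate: str  # property name
    obj: str        # property value


def match_triples(triples: List[Triple], subject: str = "",
                  predicate: str = "", obj: str = "") -> List[Triple]:
    """Return the triples matching a pattern; "" acts as a wildcard."""
    def ok(pattern: str, value: str) -> bool:
        return pattern == "" or pattern == value

    return [t for t in triples
            if ok(subject, t.subject)
            and ok(predicate, t.predicate)
            and ok(obj, t.obj)]


# Hypothetical data: two documents with a few properties each.
store = [
    Triple("file:///home/a/notes.txt", "title", "beagle rocks"),
    Triple("file:///home/a/notes.txt", "author", "dBera"),
    Triple("email://inbox/42", "from", "dBera"),
]

if __name__ == "__main__":
    # Roughly: RdfQueryTool.exe "" "" Text "dBera" -> every triple with object "dBera".
    for t in match_triples(store, obj="dBera"):
        print(t.subject, t.predicate, t.obj)
```

Passing all three patterns as "" returns every triple, which is what the semweb client above currently asks for.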

Acting like an RDF store will, of course, bring its own problems. Here are 
some:
- 20K emails, each with 10 fields, create 200K triples. I am not sure how 
scalable this is going to be.
- RDF only does exact string matching; there is no concept of stemming, 
analysis or wildcard searches (or so I am told). Our implementation currently 
does not enforce this, i.e. you can use wildcards, and searching for "beagle" 
will match a document with a field whose value is "beagle rocks". I am not 
sure this is the right behaviour here.
- Should the RDF query match the text content of the documents or only the 
metadata? I ask because I am not sure what a triple for the text data would 
look like: <uri, "Text", [... whole text !!!]>?
- If a query matched a document in beagle, some property of that document 
matched the query. But it is the actual property that matched which has to be 
returned as the RDF triple. The current implementation is pretty expensive: it 
runs through all the fields of all the matching documents, re-runs the query 
on each field, and notes down which fields matched.
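To make the matching and cost points above concrete, here is a rough Python sketch of the "re-run the query on each field" step, with a switch between RDF-style exact matching and the looser matching the branch currently allows. It is an illustration only: the document layout (property name mapped to a list of values) is an assumption, the whitespace split is a stand-in for Beagle's real Lucene analysis, and the URIs and properties are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: property name -> list of values.
Document = Dict[str, List[str]]


def field_matches(query: str, value: str, exact: bool) -> bool:
    if exact:
        # RDF-style: the literal must equal the query string.
        return value == query
    # Looser matching (stand-in for analysis): the query occurs as a word.
    return query.lower() in value.lower().split()


def matching_triples(hits: Dict[str, Document], query: str,
                     exact: bool = False) -> List[Tuple[str, str, str]]:
    """Re-test every field of every hit to find which property matched.
    The work grows with (number of hits) x (number of fields per hit)."""
    triples = []
    for uri, doc in hits.items():
        for prop, values in doc.items():
            for value in values:
                if field_matches(query, value, exact):
                    triples.append((uri, prop, value))
    return triples


# Hypothetical documents returned by the index for the query "beagle".
hits = {
    "email://inbox/42": {"subject": ["beagle rocks"], "from": ["dBera"]},
    "file:///home/a/todo.txt": {"title": ["beagle"], "author": ["someone"]},
}

if __name__ == "__main__":
    print(matching_triples(hits, "beagle"))
    print(matching_triples(hits, "beagle", exact=True))
```

With these hits, the loose mode returns both the "beagle rocks" subject and the "beagle" title, while `exact=True` keeps only the title. Either way every field of every hit is visited, which is where the cost comes from.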

If this works well, I am inclined to merge the feature into svn trunk. So, if 
you are interested, please give it a spin. Any improvements / patches / 
suggestions are welcome.

- dBera


[1] It's trunk plus the RDF changes, so you won't miss anything. 
http://svn.gnome.org/svn/beagle/branches/beagle-rdf/

[2] http://razor.occams.info/code/semweb/

-- 
-----------------------------------------------------
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user

