Metadata Store
- From: "Enrico Minack" <minack l3s de>
- To: <dashboard-hackers gnome org>
- Subject: Metadata Store
- Date: Wed, 9 Aug 2006 12:21:05 +0200
Hi Metadata-Hackers,
I just wanted to catch up with the latest progress and also wanted to point
to some questions that arose while we were working on things like RDF
storage, metadata, and querying.
So, how is it going? Any news concerning implementation, design, or the
libraries used?
I put some effort into getting Sesame and Sesame2 [1] working under C#, and it
worked quite well. I was able to write into a Sesame native store (which is
quite fast! see [2]). I created 100,000 simple triples and stored them in
the local repository at
- 1.47k triples/second on my laptop @ 800 MHz
- 1.93k triples/second on my laptop @ 2000 MHz
- 7.14k triples/second on our server @ 2000 MHz (Athlon 64bit 3800+)
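For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of a throughput harness. It is not the test program used above: the triple layout, the `urn:test:` names, and the use of a plain Python set as the "store" are all assumptions made for illustration, so the absolute numbers will not be comparable to the Sesame figures.

```python
import time


def generate_triples(count):
    """Yield `count` simple (subject, predicate, object) triples.

    The URNs are made up for this sketch.
    """
    for i in range(count):
        yield (f"urn:test:subject/{i}", "urn:test:hasValue", str(i))


def measure_throughput(add, count):
    """Call `add` once per generated triple and return triples per second."""
    start = time.perf_counter()
    for triple in generate_triples(count):
        add(triple)
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


# Stand-in store: a set. A real run would pass the store's own add method.
store = set()
rate = measure_throughput(store.add, 100_000)
print(f"{len(store)} triples at {rate / 1000:.2f}k triples/second")
```

Passing the store's add operation as a callable keeps the timing loop identical across back ends, so only the store itself varies between runs.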
Surprisingly, the memory consumption of the C# test program was lower than
the Java version. But the Java program was faster :-(. Maybe IKVM can make
some improvements on that.
The C# port even worked with a remote repository using the HTTP protocol
described in section 8 of the Sesame documentation [3].
Now let's come to some technical and implementation questions:
How do you plan to integrate the RDF store into Beagle's architecture?
- Hard-coded like the Lucene indexes or dynamically linked like the Filters
and the Queryables?
I could imagine an implementation where the possible RDF stores share a common
API (as all Filters do), are compiled against Beagle, and are stored in
a specific folder where Beagle recognizes their presence. The preferred RDF
store can then be selected via configuration. That way one could easily
replace the RDF store with any kind of implementation: file-based, RDBMS-based,
remote server, or different libraries such as SemWeb, Jena, Sesame, YARS,
Kowari, ...
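To make the idea more concrete, here is a minimal sketch of such a common API with configuration-based selection. Everything in it is hypothetical: the `RdfStore` interface, the `register_backend` decorator, the `rdf_store` configuration key, and the in-memory back end are invented for illustration and are not part of Beagle or of any of the libraries named above. A registry dictionary stands in for the plugin folder that Beagle would scan.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, Optional, Tuple, Type

Triple = Tuple[str, str, str]


class RdfStore(ABC):
    """Hypothetical common API that every pluggable store implements."""

    @abstractmethod
    def add(self, subject: str, predicate: str, obj: str) -> None:
        """Store one triple."""

    @abstractmethod
    def match(self, subject: Optional[str] = None,
              predicate: Optional[str] = None,
              obj: Optional[str] = None) -> Iterator[Triple]:
        """Yield stored triples; None acts as a wildcard."""


# Registry of known back ends, keyed by the name used in the configuration.
_BACKENDS: Dict[str, Type[RdfStore]] = {}


def register_backend(name: str):
    """Class decorator that makes a store selectable by name."""
    def decorator(cls: Type[RdfStore]) -> Type[RdfStore]:
        _BACKENDS[name] = cls
        return cls
    return decorator


@register_backend("memory")
class InMemoryStore(RdfStore):
    """Trivial back end, used here only to exercise the interface."""

    def __init__(self) -> None:
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        wanted = (subject, predicate, obj)
        for triple in sorted(self._triples):
            if all(w is None or w == t for w, t in zip(wanted, triple)):
                yield triple


def open_store(config: dict) -> RdfStore:
    """Instantiate the store named by the (assumed) 'rdf_store' key."""
    name = config.get("rdf_store", "memory")
    if name not in _BACKENDS:
        raise ValueError(f"unknown RDF store backend: {name!r}")
    return _BACKENDS[name]()


store = open_store({"rdf_store": "memory"})
store.add("doc1", "hasAuthor", "X")
print(list(store.match(predicate="hasAuthor")))
```

A Sesame-backed or remote store would be one more registered class; callers only ever see `RdfStore`, so switching back ends becomes a configuration change.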
What about the ontology used within the store?
- Do the Filters have to comply with one?
- Does every filter have its own way to describe metadata?
How shall the metadata be queried?
- Full-text search on the attributes using the query keywords?
- Special queries like "metadata:..."?
- What about paths of metadata like "document of author X received as
attachment via email from Y", which matches:
    document hasAuthor X
    document isAttachmentOf EMail
    EMail from Y
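The path example above amounts to a conjunction of triple patterns that share variables. Here is a minimal sketch of how such a query could be evaluated over a small in-memory list of triples. The `?name` variable syntax, the `solve` and `unify` helpers, and the sample data are assumptions for illustration; a real store would use its own query language and indexes rather than this naive nested scan.

```python
from typing import Dict, Iterator, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    """Terms starting with '?' are variables (a convention of this sketch)."""
    return term.startswith("?")


def unify(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    extended = dict(binding)
    for want, got in zip(pattern, triple):
        if is_var(want):
            # Bind the variable, or check it against its earlier binding.
            if extended.setdefault(want, got) != got:
                return None
        elif want != got:
            return None
    return extended


def solve(patterns: Sequence[Triple], triples: Sequence[Triple],
          binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield every variable binding that satisfies all patterns."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        extended = unify(first, triple, binding)
        if extended is not None:
            yield from solve(rest, triples, extended)


# Made-up sample data: two documents by X, only one mailed by Y.
triples = [
    ("report.pdf", "hasAuthor", "X"),
    ("report.pdf", "isAttachmentOf", "mail-17"),
    ("mail-17", "from", "Y"),
    ("notes.txt", "hasAuthor", "X"),
    ("notes.txt", "isAttachmentOf", "mail-18"),
    ("mail-18", "from", "Z"),
]

query = [
    ("?doc", "hasAuthor", "X"),
    ("?doc", "isAttachmentOf", "?mail"),
    ("?mail", "from", "Y"),
]

matches = [b["?doc"] for b in solve(query, triples)]
print(matches)  # ['report.pdf']
```

The shared variables `?doc` and `?mail` are what tie the three patterns into one path, which is something a keyword search over attribute values cannot express.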
How are results ranked if they are found in the RDF store but not in the
Lucene index?
- How can these scores be merged with Lucene scores?
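One possible answer, given purely as a starting point for discussion: normalize each score list by its own maximum and combine the two with a weight, treating a hit that is missing on one side as scoring zero there. The divide-by-maximum normalization, the linear combination, the `rdf_weight` parameter, and the sample scores are all assumptions of this sketch, not something Beagle or Lucene prescribes.

```python
from typing import Dict


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores into [0, 1] by dividing by the largest one."""
    if not scores:
        return {}
    top = max(scores.values())
    if top <= 0:
        return {uri: 0.0 for uri in scores}
    return {uri: value / top for uri, value in scores.items()}


def merge_scores(lucene: Dict[str, float], rdf: Dict[str, float],
                 rdf_weight: float = 0.5) -> Dict[str, float]:
    """Weighted sum of normalized scores; a missing hit counts as 0."""
    lucene_n, rdf_n = normalize(lucene), normalize(rdf)
    merged = {}
    for uri in set(lucene_n) | set(rdf_n):
        merged[uri] = ((1.0 - rdf_weight) * lucene_n.get(uri, 0.0)
                       + rdf_weight * rdf_n.get(uri, 0.0))
    return merged


# Made-up scores: "c" is found only in the RDF store, "a" only by Lucene.
lucene_hits = {"a": 4.0, "b": 2.0}
rdf_hits = {"b": 1.0, "c": 1.0}

merged = merge_scores(lucene_hits, rdf_hits)
ranking = sorted(merged, key=lambda uri: (-merged[uri], uri))
print(ranking)  # ['b', 'a', 'c']
```

With this scheme an RDF-only result such as "c" still gets a rank, and a result found by both sources rises above results found by only one.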
As you can see, many questions arise. We are already working on many of these
as part of our research activities. Some of them should be addressed up front
(architectural and design issues); others, of course, can be addressed when
they emerge.
Hoping for interesting comments,
Enrico M.
[1] http://www.openrdf.org/
[2] http://tripletest.sourceforge.net/2005-06-08/index.html
[3] http://www.openrdf.org/doc/sesame/users/ch08.html