Metadata Store

From: "Enrico Minack" <minack l3s de>
To: <dashboard-hackers gnome org>
Subject: Metadata Store
Date: Wed, 9 Aug 2006 12:21:05 +0200

Hi Metadata-Hackers,

I just wanted to catch up with the latest progress and also wanted to point to some questions that arose while we were working on things like RDF storage, metadata, and querying.

So, how is it going? Any news concerning implementation, design or used libraries?

I put some effort into getting Sesame and Sesame2 [1] working under C# and it worked quite well. I was able to write into a Sesame native store (which is quite fast! see [2]). I created 100.000 simple triples and stored them into the local repository with

- 1,47k triples/second on my laptop @ 800 MHz
- 1,93k triples/second on my laptop @ 2000 MHz
- 7,14k triples/second on our server @ 2000 MHz (Athlon 64bit 3800+)

Surprisingly, the memory consumption of the C# test program was lower than the Java version. But the Java program was faster :-(. Maybe IKVM can make some improvements on that.

The C# port even worked with a remote repository using the HTTP protocol described in section 8 of the Sesame documentation [3].


Now let's come to some technical and implementation-related questions:
How do you plan to integrate the RDF store into Beagle's architecture?

- Hard-coded like the Lucene indexes or dynamically linked like the Filters and the Queryables?

I could imagine an implementation where possible RDF stores share a common API (as all Filters do), and they are compiled against Beagle and stored in a specific folder where Beagle recognizes their presence. Via configuration the preferred RDF store can be selected. Therefore one could easily replace the RDF store with any kind of implementation: file-based, RDBMS-based, remote server, different libraries such as semweb, Jena, Sesame, yars, kowari, ...
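
To make the "common API plus configuration" idea a bit more concrete, here is a minimal sketch in Python (not C#, and not Beagle code). All names in it (RdfStore, register_store, open_store, the "rdf_store" configuration key) are made up for illustration, and a simple in-process registry stands in for the folder that Beagle would scan for store implementations.

```python
from abc import ABC, abstractmethod


class RdfStore(ABC):
    """Hypothetical common API that every RDF store backend would implement."""

    @abstractmethod
    def add(self, subject, predicate, obj):
        """Store one triple."""

    @abstractmethod
    def match(self, subject=None, predicate=None, obj=None):
        """Return all triples matching the pattern; None acts as a wildcard."""


# Assumption: this registry stands in for the plugin folder scanned at startup.
STORE_BACKENDS = {}


def register_store(name):
    """Class decorator that makes a backend selectable by name."""
    def decorator(cls):
        STORE_BACKENDS[name] = cls
        return cls
    return decorator


@register_store("memory")
class InMemoryStore(RdfStore):
    """Toy backend; a file-based or remote store would expose the same API."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        pattern = (subject, predicate, obj)
        return sorted(
            t for t in self._triples
            if all(p is None or p == v for p, v in zip(pattern, t))
        )


def open_store(config):
    """Instantiate the backend named in the (hypothetical) configuration."""
    name = config["rdf_store"]
    backend = STORE_BACKENDS.get(name)
    if backend is None:
        raise ValueError(f"no RDF store backend registered as {name!r}")
    return backend()


store = open_store({"rdf_store": "memory"})
store.add("document", "hasAuthor", "X")
print(store.match(predicate="hasAuthor"))
```

Swapping the store would then only mean registering another class under a different name and changing the configuration value; the calling code stays the same.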


How about the Ontology used within the store?
- Do the Filters have to comply with one?
- Does every filter have its own way to describe metadata?

How shall the metadata be queried?
- Full-text search on the attributes using the query keywords?
- Special queries like "metadata:..."?

- What about paths of metadata like "document of author X received as attachment via email from Y" which matches

    document hasAuthor X
    document isAttachmentOf EMail
    EMail from Y
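
Such a path query boils down to matching several triple patterns that share variables. The following Python sketch shows the idea on a plain list of triples; it is only an illustration under my own assumptions (variables are written with a leading "?", the file and mail names are invented sample data) and not how Sesame or any other store evaluates queries.

```python
def match_pattern(pattern, triple, bindings):
    """Try to match one triple against one pattern.

    Returns the extended variable bindings, or None if the triple does not fit.
    Terms starting with "?" are variables; everything else must match exactly.
    """
    new_bindings = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or check it against an earlier binding.
            if new_bindings.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new_bindings


def query(triples, patterns):
    """Return every set of bindings that satisfies all patterns at once."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Invented sample data: two documents by X, only one of them mailed by Y.
triples = [
    ("report.pdf", "hasAuthor", "X"),
    ("report.pdf", "isAttachmentOf", "mail1"),
    ("mail1", "from", "Y"),
    ("notes.txt", "hasAuthor", "X"),
    ("notes.txt", "isAttachmentOf", "mail2"),
    ("mail2", "from", "Z"),
]

path = [
    ("?document", "hasAuthor", "X"),
    ("?document", "isAttachmentOf", "?email"),
    ("?email", "from", "Y"),
]

print(query(triples, path))
# [{'?document': 'report.pdf', '?email': 'mail1'}]
```

A real store would use its indexes instead of scanning all triples for every pattern, but the result is the same kind of variable binding.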

How are results ranked if they are found in the RDF store but not in the Lucene index?

- How can these scores be merged with Lucene scores?
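
Just to have something to discuss, one possible way to merge the two result lists is sketched below in Python. It is purely an assumption on my side and not an answer to the question: both sources are assumed to deliver non-negative scores per URI, each list is scaled by its own maximum, a hit missing from one source counts as 0 there, and the weight rdf_weight is an arbitrary tuning knob. All names are hypothetical.

```python
def normalise(scores):
    """Scale non-negative scores into [0, 1] by dividing by the maximum."""
    top = max(scores.values(), default=0.0)
    if top <= 0.0:
        return {uri: 0.0 for uri in scores}
    return {uri: value / top for uri, value in scores.items()}


def merge_scores(lucene_scores, rdf_scores, rdf_weight=0.5):
    """Combine two score dictionaries into one ranked list of (uri, score).

    A hit found by only one source contributes 0 for the other source,
    so it is still ranked instead of being dropped.
    """
    lucene = normalise(lucene_scores)
    rdf = normalise(rdf_scores)
    merged = {
        uri: (1.0 - rdf_weight) * lucene.get(uri, 0.0)
             + rdf_weight * rdf.get(uri, 0.0)
        for uri in set(lucene) | set(rdf)
    }
    # Highest score first; ties are broken by URI to keep the order stable.
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))


# Invented example scores: "c" is only known to the RDF store.
ranked = merge_scores({"a": 2.0, "b": 1.0}, {"b": 4.0, "c": 2.0})
print([uri for uri, _ in ranked])
# ['b', 'a', 'c']
```

Whether a linear combination like this is appropriate at all depends on how the RDF store produces scores in the first place, which is exactly the open question.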

As you can see, many questions may arise. We already work on many of these due to our research activities. Some of them should be addressed up front (architectural and design issues); others, of course, can be addressed when they emerge.


Hoping for interesting comments,
Enrico M.

[1] http://www.openrdf.org/
[2] http://tripletest.sourceforge.net/2005-06-08/index.html
[3] http://www.openrdf.org/doc/sesame/users/ch08.html

Follow-Ups:
- Re: Metadata Store
  - From: Joe Shaw
