Metadata Store

Hi Metadata-Hackers,

I just wanted to catch up on the latest progress and also to raise some questions that arose while we were working on things like RDF storage, metadata, and querying.

So, how is it going? Any news regarding the implementation, the design, or the libraries used?

I put some effort into getting Sesame and Sesame2 [1] running under C#, and it worked quite well. I was able to write to a Sesame native store (which is quite fast; see [2]). I created 100,000 simple triples and stored them in the local repository at the following rates:
- 1.47k triples/second on my laptop @ 800 MHz
- 1.93k triples/second on my laptop @ 2000 MHz
- 7.14k triples/second on our server @ 2000 MHz (Athlon 64 3800+)
Surprisingly, the C# test program used less memory than the Java version, but the Java program was faster :-(. Maybe future IKVM versions can close that gap. The C# port even worked with a remote repository over the HTTP protocol described in section 8 of the Sesame documentation [3].
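
For anyone who wants to reproduce numbers like these, here is a minimal sketch of such a throughput measurement. It is not the test program used above: the synthetic URIs are hypothetical, and a plain Python set stands in for the repository, so only the method (time N inserts, divide N by the elapsed time) carries over. The second part converts the reported rates back into total run time for 100,000 triples.

```python
import time
from typing import Callable, Iterator, Tuple

Triple = Tuple[str, str, str]


def synthetic_triples(n: int) -> Iterator[Triple]:
    """Generate n distinct, simple triples (hypothetical example URIs)."""
    for i in range(n):
        yield (f"http://example.org/doc/{i}",
               "http://example.org/prop/title",
               f"Title {i}")


def measure_throughput(add: Callable[[Triple], None], n: int) -> float:
    """Insert n synthetic triples via `add` and return triples per second."""
    start = time.perf_counter()
    for triple in synthetic_triples(n):
        add(triple)
    elapsed = time.perf_counter() - start
    return n / elapsed if elapsed > 0 else float("inf")


def seconds_for(n: int, rate: float) -> float:
    """Wall-clock seconds needed to store n triples at `rate` triples/second."""
    return n / rate


if __name__ == "__main__":
    store = set()  # stand-in for a real repository
    print(f"in-memory set: {measure_throughput(store.add, 100_000):,.0f} triples/s")
    # The rates reported above, converted into total time for 100,000 triples.
    for label, reported in [("laptop @ 800 MHz", 1470),
                            ("laptop @ 2000 MHz", 1930),
                            ("server @ 2000 MHz", 7140)]:
        print(f"{label}: {seconds_for(100_000, reported):.1f} s")
```

By this arithmetic, the three reported rates correspond to roughly 68 s, 52 s and 14 s for the full 100,000-triple run.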

Now let's turn to some technical and implementation questions:
How do you plan to integrate the RDF store into Beagle's architecture?
- Hard-coded like the Lucene indexes, or dynamically linked like the Filters and the Queryables? I could imagine an implementation where all RDF stores share a common API (as all Filters do); each store would be compiled against Beagle and placed in a specific folder where Beagle detects it. The preferred RDF store could then be selected via configuration. This would make it easy to swap in any kind of store: file-based, RDBMS-based, a remote server, or one built on a different library such as SemWeb, Jena, Sesame, YARS, Kowari, ...
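
To make the plug-in idea concrete, here is a minimal sketch of a common store API plus configuration-based selection. Everything in it is hypothetical: the `RdfStore` interface, its `add`/`match` methods, the `register_backend` decorator and the `rdf_store` configuration key are illustrative names, not Beagle APIs. The registry decorator stands in for the "scan a folder for compiled stores" step, which would be assembly loading in Beagle itself.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, Optional, Set, Tuple, Type

Triple = Tuple[str, str, str]


class RdfStore(ABC):
    """Hypothetical common API that every RDF store backend would implement."""

    @abstractmethod
    def add(self, triple: Triple) -> None:
        """Store one (subject, predicate, object) triple."""

    @abstractmethod
    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield stored triples matching the pattern; None is a wildcard."""


# Registry of available backends, keyed by the name used in the configuration.
_BACKENDS: Dict[str, Type[RdfStore]] = {}


def register_backend(name: str):
    """Class decorator that makes a backend selectable by name."""
    def wrap(cls: Type[RdfStore]) -> Type[RdfStore]:
        _BACKENDS[name] = cls
        return cls
    return wrap


@register_backend("memory")
class MemoryStore(RdfStore):
    """Trivial in-memory backend, standing in for file, RDBMS or remote stores."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, triple: Triple) -> None:
        self._triples.add(triple)

    def match(self, s=None, p=None, o=None) -> Iterator[Triple]:
        for t in self._triples:
            if ((s is None or t[0] == s) and (p is None or t[1] == p)
                    and (o is None or t[2] == o)):
                yield t


def open_store(config: Dict[str, str]) -> RdfStore:
    """Instantiate the backend named by config["rdf_store"] (hypothetical key)."""
    name = config.get("rdf_store", "memory")
    if name not in _BACKENDS:
        raise ValueError(f"unknown RDF store backend: {name!r}")
    return _BACKENDS[name]()
```

With this shape, a SemWeb- or Sesame-backed class would only need to implement `add` and `match` and register itself under its own name; the rest of Beagle would depend on `RdfStore` alone.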

What about the ontology used within the store?
- Do the Filters have to comply with a common one?
- Or does every Filter have its own way of describing metadata?
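
One possible middle ground between these two options, sketched under loud assumptions: each Filter keeps its own field names, and a per-filter mapping translates them into a shared vocabulary. The Dublin Core terms are only an example of such a vocabulary, and the filter names, field names and the `urn:beagle:filter:` fallback namespace are hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Example shared vocabulary: Dublin Core element set.
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical per-filter mappings from filter-local field names to shared terms.
FILTER_MAPPINGS: Dict[str, Dict[str, str]] = {
    "pdf": {"Author": DC + "creator", "Title": DC + "title"},
    "mp3": {"artist": DC + "creator", "title": DC + "title"},
}


def to_triples(filter_name: str, subject: str,
               fields: Dict[str, str]) -> List[Triple]:
    """Translate a filter's raw metadata into triples.

    Mapped fields use the shared vocabulary; unmapped fields are kept under a
    filter-specific namespace (hypothetical URI scheme), so nothing is lost.
    """
    mapping = FILTER_MAPPINGS.get(filter_name, {})
    return [(subject, mapping.get(key, f"urn:beagle:filter:{filter_name}#{key}"), value)
            for key, value in fields.items()]
```

A query for `dc:creator` would then find both PDF authors and MP3 artists, while filter-specific details stay queryable under their own names.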

How shall the metadata be queried?
- Full-text search on the attributes using the query keywords?
- Special queries like "metadata:..."?
- What about paths of metadata like "document of author X received as attachment via email from Y", which would match the pattern
    document hasAuthor X
    document isAttachmentOf EMail
    EMail from Y
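
The path example above is a conjunctive triple pattern in which the document and the e-mail act as variables. Here is a minimal sketch of how such a pattern can be evaluated by a naive nested-loop join; a real store would use its indexes and a query language such as SPARQL. The `?`-prefix convention for variables is borrowed from SPARQL, and the sample triples (`doc1`, `mail7`, ...) are made up for illustration.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    """Terms starting with '?' are variables, as in SPARQL."""
    return term.startswith("?")


def unify(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Match one pattern against one triple; return the extended binding or None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if extended.get(p_term, t_term) != t_term:
                return None  # variable already bound to a different value
            extended[p_term] = t_term
        elif p_term != t_term:
            return None  # constant term does not match
    return extended


def query(patterns: List[Triple], triples: Set[Triple],
          binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield every binding that satisfies all patterns (naive nested-loop join)."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        extended = unify(first, triple, binding)
        if extended is not None:
            yield from query(rest, triples, extended)


store = {
    ("doc1", "hasAuthor", "X"),
    ("doc1", "isAttachmentOf", "mail7"),
    ("mail7", "from", "Y"),
    ("doc2", "hasAuthor", "X"),  # same author, but not an attachment
}
path = [
    ("?doc", "hasAuthor", "X"),
    ("?doc", "isAttachmentOf", "?mail"),
    ("?mail", "from", "Y"),
]
results = list(query(path, store))
print(results)  # [{'?doc': 'doc1', '?mail': 'mail7'}]
```

Only `doc1` satisfies all three patterns; `doc2` matches the first one but has no `isAttachmentOf` triple, so the join drops it.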

How are results ranked if they are found in the RDF store but not in the Lucene index?
- How can these scores be merged with Lucene scores?
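
One common answer from the rank-fusion literature, offered here only as a starting point: normalize each source's scores to [0, 1] and take a weighted sum (a CombSUM-style fusion). A document missing from one source simply contributes 0 from that source, so RDF-only hits still get a score. The weight `alpha` and the sample scores are hypothetical; nothing here reflects how Beagle actually scores results.

```python
from typing import Dict


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, every hit gets 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def merge_scores(lucene: Dict[str, float], rdf: Dict[str, float],
                 alpha: float = 0.5) -> Dict[str, float]:
    """Weighted sum alpha * Lucene + (1 - alpha) * RDF of normalized scores.

    A document absent from one source contributes 0 from that source.
    """
    a, b = min_max(lucene), min_max(rdf)
    return {doc: alpha * a.get(doc, 0.0) + (1 - alpha) * b.get(doc, 0.0)
            for doc in a.keys() | b.keys()}


merged = merge_scores({"doc1": 2.0, "doc2": 1.0},
                      {"doc2": 3.0, "doc3": 2.0, "doc4": 1.0})
print(sorted(merged.items(), key=lambda kv: -kv[1]))
```

Note the known drawback of min-max normalization: the lowest-scoring hit of each source is mapped to 0 (here `doc4` ends up with 0.0 although it was found), so rank-based schemes such as reciprocal-rank fusion may be worth comparing.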

As you can see, many questions arise. We are already working on many of them as part of our research activities. Some should be addressed up front (architectural and design issues); others can, of course, be addressed as they emerge.

Hoping for interesting comments,
Enrico M.

