metadata status quo

Hi All,

I kept working on the metadata store over the last few weeks at a slow
pace and gave a talk about the project at my university. The talk went
pretty well, and the sqlite part of the store is finally in a state that
I am happy with:

- automatic reference creation to and from newly indexed docs works

- all strings are stored in separate tables and referenced (this saves
disk space because a lot of fields, such as email addresses, appear more
than once, and it makes reference creation faster)

- all insertions need only one sqlite query; the rest is done by
triggers (e.g. creating references to the newly indexed doc)

- two deletion modes (delete if not referenced, and forced deletion)

- also implemented using triggers -> this should keep things consistent
and hopefully avoid race conditions (sqlite's job - hope it does that as
well)

- checked basic functionality using nunit tests.
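
To make the string-table and trigger points above more concrete, here is
a minimal sketch of the idea in Python with the standard sqlite3 module.
It is not the actual schema of the store: the table, view and trigger
names (strings, statements, statements_text, statements_text_insert) are
all made up for illustration. It only shows how a single INSERT against
a view can let a trigger store each distinct string once and create the
references.

```python
import sqlite3

# Hypothetical schema, for illustration only. None of these names come
# from the real store.
SCHEMA = """
CREATE TABLE strings (
    id    INTEGER PRIMARY KEY,
    value TEXT NOT NULL UNIQUE
);
CREATE TABLE statements (
    subject   INTEGER NOT NULL REFERENCES strings(id),
    predicate INTEGER NOT NULL REFERENCES strings(id),
    object    INTEGER NOT NULL REFERENCES strings(id)
);
CREATE VIEW statements_text AS
    SELECT s.value AS subject, p.value AS predicate, o.value AS object
    FROM statements AS st
    JOIN strings AS s ON s.id = st.subject
    JOIN strings AS p ON p.id = st.predicate
    JOIN strings AS o ON o.id = st.object;
CREATE TRIGGER statements_text_insert
INSTEAD OF INSERT ON statements_text
BEGIN
    INSERT OR IGNORE INTO strings(value) VALUES (NEW.subject);
    INSERT OR IGNORE INTO strings(value) VALUES (NEW.predicate);
    INSERT OR IGNORE INTO strings(value) VALUES (NEW.object);
    INSERT INTO statements(subject, predicate, object)
    SELECT s.id, p.id, o.id
    FROM strings AS s, strings AS p, strings AS o
    WHERE s.value = NEW.subject
      AND p.value = NEW.predicate
      AND o.value = NEW.object;
END;
"""


def open_store():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_statement(conn, subject, predicate, obj):
    """One query from the caller; the trigger does the rest."""
    conn.execute(
        "INSERT INTO statements_text(subject, predicate, object) "
        "VALUES (?, ?, ?)",
        (subject, predicate, obj),
    )
```

With this layout, two mails from the same address add only one row for
that address to the strings table, and the caller never sees an integer
id.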

Right now I don't really know how to continue - there are a number of
things that could be done. I'd like some feedback on what would be most
useful (you can pick multiple choices ;) ):

A) Work on the inclusion in Beagle. I played with that a little bit but
I am not sure how to proceed. I would like to keep the API to the
metadata store purely based on strings or BeagleClient stuff (properties
etc.). So far document creation is dealt with in LuceneCommon. I call
the metadata store from there on a per-property basis. This leaves me
with two possibilities:
        - I can use complex insert queries that look up the id of the
        subject for each statement. I'm pretty sure this is slower -
        didn't test though - maybe caching works fine here.
        - I can create the subject, store its id (uint) and hand it
        to all the calls that create new statements - this saves
        querying for the id every time but has at least two problems:
                1. the API is no longer string / BeagleClient based - it
                uses internal ids of the store
                2. consistency problems might occur if subject ids
                change - they don't do that yet - but they might on
                deletion and optimization, just like they do in Lucene.
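
The two possibilities can be sketched roughly like this, again in Python
with sqlite3 and a simplified, made-up schema (subjects and statements
tables; all function names are hypothetical and not part of the store or
of Beagle). The first variant keeps the API string-only and pays for a
subquery per statement; the second hands the internal id back to the
caller.

```python
import sqlite3


def open_store():
    """In-memory database with a simplified, assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE subjects (
            id  INTEGER PRIMARY KEY,
            uri TEXT NOT NULL UNIQUE
        );
        CREATE TABLE statements (
            subject_id INTEGER NOT NULL REFERENCES subjects(id),
            predicate  TEXT NOT NULL,
            object     TEXT NOT NULL
        );
        """
    )
    return conn


# Possibility 1: string-only API, the subject id is looked up per call.
def add_property_by_uri(conn, uri, predicate, obj):
    conn.execute("INSERT OR IGNORE INTO subjects(uri) VALUES (?)", (uri,))
    conn.execute(
        "INSERT INTO statements(subject_id, predicate, object) "
        "VALUES ((SELECT id FROM subjects WHERE uri = ?), ?, ?)",
        (uri, predicate, obj),
    )


# Possibility 2: create the subject once and pass its internal id around.
def create_subject(conn, uri):
    cursor = conn.execute("INSERT INTO subjects(uri) VALUES (?)", (uri,))
    return cursor.lastrowid  # internal id leaks into the caller


def add_property_by_id(conn, subject_id, predicate, obj):
    conn.execute(
        "INSERT INTO statements(subject_id, predicate, object) "
        "VALUES (?, ?, ?)",
        (subject_id, predicate, obj),
    )
```

The second variant is where both problems show up: the caller has to
hold on to subject_id, and that value is only valid as long as the store
never renumbers its subjects.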

B) As I said, I have not tested how much these extra queries affect
performance. Might do that as well... but...
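
A rough way to measure this could look like the sketch below. It is only
a toy benchmark under assumptions: an in-memory database, a made-up
two-table schema, and hypothetical names (compare_insert_strategies and
the two strategy labels). Numbers from it would say little about the
real store with its triggers and string tables.

```python
import sqlite3
import time

# Assumed toy schema, not the real one.
SCHEMA = """
CREATE TABLE subjects (id INTEGER PRIMARY KEY, uri TEXT NOT NULL UNIQUE);
CREATE TABLE statements (
    subject_id INTEGER NOT NULL,
    predicate  TEXT NOT NULL,
    object     TEXT NOT NULL
);
"""


def _run(strategy, n_docs, props_per_doc):
    """Insert n_docs subjects with props_per_doc statements each."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    start = time.perf_counter()
    with conn:  # one transaction for the whole run
        for d in range(n_docs):
            uri = f"doc:{d}"
            subject_id = conn.execute(
                "INSERT INTO subjects(uri) VALUES (?)", (uri,)
            ).lastrowid
            for p in range(props_per_doc):
                if strategy == "lookup_per_statement":
                    # Extra subquery for every statement.
                    conn.execute(
                        "INSERT INTO statements VALUES "
                        "((SELECT id FROM subjects WHERE uri = ?), ?, ?)",
                        (uri, f"prop{p}", "value"),
                    )
                else:
                    # Reuse the id obtained when the subject was created.
                    conn.execute(
                        "INSERT INTO statements VALUES (?, ?, ?)",
                        (subject_id, f"prop{p}", "value"),
                    )
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM statements").fetchone()[0]
    conn.close()
    return elapsed, count


def compare_insert_strategies(n_docs=1000, props_per_doc=10):
    """Return (timings, row counts) keyed by strategy name."""
    timings, counts = {}, {}
    for strategy in ("lookup_per_statement", "cached_id"):
        timings[strategy], counts[strategy] = _run(
            strategy, n_docs, props_per_doc
        )
    return timings, counts
```

Calling compare_insert_strategies() and printing the timings dict gives
one number per strategy; repeating the run a few times would be needed
before reading anything into the difference.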

C) Both possibilities kind of suck. So I wondered if we could move
document creation (BuildDocument etc.) to the Metastore. This would make
it possible to call Metastore functions on a per-Indexable basis, so the
internal ids could be used but still stay hidden from LuceneCommon. It
would also allow us to enrich the document with data from the Metastore
(names for email addresses etc.) later. The latter would involve
querying the Metastore during doc creation, which would make it slower
though.
There are some more design decisions involved, so I wondered if it
actually makes sense to work on it now or if I should wait until some
other stuff like the distributed indexes has landed and we discuss the
best way of doing this then.
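
As a sketch of what a per-Indexable call could look like, the class
below (Python again; the class, method and table names are hypothetical
and do not match any existing Beagle API) takes a URI plus all its
properties at once. The subject id is looked up once per Indexable and
never leaves the store.

```python
import sqlite3


class Metastore:
    """Hypothetical facade: callers pass strings, ids stay internal."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.executescript(
            """
            CREATE TABLE IF NOT EXISTS subjects (
                id  INTEGER PRIMARY KEY,
                uri TEXT NOT NULL UNIQUE
            );
            CREATE TABLE IF NOT EXISTS statements (
                subject_id INTEGER NOT NULL REFERENCES subjects(id),
                predicate  TEXT NOT NULL,
                object     TEXT NOT NULL
            );
            """
        )

    def add_indexable(self, uri, properties):
        """Store all (key, value) pairs of one indexable in one transaction."""
        with self._conn:
            self._conn.execute(
                "INSERT OR IGNORE INTO subjects(uri) VALUES (?)", (uri,)
            )
            (subject_id,) = self._conn.execute(
                "SELECT id FROM subjects WHERE uri = ?", (uri,)
            ).fetchone()
            self._conn.executemany(
                "INSERT INTO statements(subject_id, predicate, object) "
                "VALUES (?, ?, ?)",
                [(subject_id, key, value) for key, value in properties],
            )

    def properties_of(self, uri):
        """Return the stored (key, value) pairs for a URI, in insert order."""
        rows = self._conn.execute(
            "SELECT st.predicate, st.object FROM statements AS st "
            "JOIN subjects AS s ON s.id = st.subject_id "
            "WHERE s.uri = ? ORDER BY st.rowid",
            (uri,),
        )
        return rows.fetchall()
```

Because the id only lives inside add_indexable, it could change during
deletion or optimization without any caller noticing.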

D) UI related stuff. I had a prototype running during SoC that used the
plain Lucene store to extend search results with further information
(screenshot can be found here: )
It's bitrotting right now. It already did a little bit of "association
browsing". I feel it's quite unclear when the metastore stuff will be
available and whether it will be performant enough etc. So I was
wondering if I should try to save the UI part from bitrotting by merging
it with the current CVS and filing a bug report in bugzilla, to get it
into a state where it could go in - maybe even before the metastore. I
am not very familiar with UI development - I read the guidelines etc.
and tried to stick to the concept of tiles. So I'd love some feedback
on whether you like the design and idea - based on the screenshot - and
whether it would be worth putting energy into getting this in.

That's pretty much what I have been thinking about doing in the next
weeks. It will take some time because I am pretty busy with other stuff.
That's why I am asking in the first place - trying to avoid unnecessary
work ;)

Hope you're all doing well and have some nice last days of 06. 
