Search Bookmarks Driver (New Contributor Questions)



Hey Everyone,

I've been reading the Beagle hacker's guide and am really interested in hacking around. I currently develop in C# on Windows, but the maturity of the Mono project, the cool applications being developed with it (Beagle, F-Spot, etc.) and the encouragement to contribute to projects like Beagle have prompted me to begin my switch.

I really want to have a go at developing something useful for Beagle. I'm thinking of a bookmark (read: bookmark content) indexer for Firefox and other browsers.

My idea is that, upon noticing that the bookmark file has changed, it will go off and check every website in the user's bookmark list, download each one (text only), and index each bookmark's content (if the website has been updated). I know I have about a million bookmarks with thoroughly non-descriptive titles (hence just indexing the bookmark file alone is useless), and I hope this will allow me to find them. Does this sound like it would be useful to anyone but me?

So with that out of the way, I have a few questions on the general operation of Beagle and how my thing will fit in (I'm a noob to Beagle and to open source collaborative projects in general). Sorry about all the questions... I'll add the answers to the wiki when I know them.

1) I presume that what I want to do comes under the general heading of an External Search Driver.
2) As per the hacking guide, I set up Inotify events etc. When the bookmark file changes, foreach bookmark in bookmarks:
   - Download (text only)
   - MD5
   - Compare MD5 with old MD5 of bookmark to see if the site has been updated
   - If so, add bookmarked site to index
   - else
3) See below
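
To make step 2 concrete, here is a minimal sketch of the MD5 comparison on its own. It is not Beagle code: BookmarkChangeTracker, Md5Hex and HasChanged are made-up names, the digests live in an in-memory dictionary rather than anything persistent, and the page text is passed in directly instead of being downloaded.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper, not part of Beagle: remembers one MD5 digest per bookmark URL.
public class BookmarkChangeTracker
{
    private readonly Dictionary<string, string> lastDigest = new Dictionary<string, string> ();

    // Hex-encoded MD5 of the downloaded page text (UTF-8 bytes).
    public static string Md5Hex (string text)
    {
        using (MD5 md5 = MD5.Create ()) {
            byte[] hash = md5.ComputeHash (Encoding.UTF8.GetBytes (text));
            return BitConverter.ToString (hash).Replace ("-", "").ToLowerInvariant ();
        }
    }

    // Returns true when the URL is new or its text differs from the last visit,
    // and remembers the new digest in that case.
    public bool HasChanged (string url, string pageText)
    {
        string digest = Md5Hex (pageText);
        string previous;
        if (lastDigest.TryGetValue (url, out previous) && previous == digest)
            return false;
        lastDigest[url] = digest;
        return true;
    }
}

public static class ChangeTrackerDemo
{
    public static void Main ()
    {
        var tracker = new BookmarkChangeTracker ();
        Console.WriteLine (tracker.HasChanged ("http://www.google.com", "hello"));  // True (first visit)
        Console.WriteLine (tracker.HasChanged ("http://www.google.com", "hello"));  // False (same text)
        Console.WriteLine (tracker.HasChanged ("http://www.google.com", "hello!")); // True (text changed)
    }
}
```

Only the bookmarks for which HasChanged returns true would need to be handed to the indexer; the rest can be skipped.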

Now for the questions:
1) How does an external query driver add things to the Lucene index system? Looking at the code for other external drivers (Tomboy in this case):

Indexable indexable = NoteToIndexable (file, note);
Scheduler.Task task = NewAddTask (indexable);
task.Priority = priority;
task.SubPriority = 0;
ThisScheduler.Add (task); //Is this the line where Beagle becomes aware of the info to index (and hence indexes it at some time in future)?

2) How about persistence between instances, the flow of the operation, ...
(Assuming that when parsing the bookmark file I download each bookmark to a temp file, say ~/beagle/bookmarkstemp/http://www.google.com.temp.) For example, if a bookmark changes and I call ThisScheduler.Add for that bookmark at some time in the future, is its uniqueness determined by indexable.ContentUri, and is no harm done by calling ThisScheduler.Add for each bookmark without deleting the old one from the scheduler/index?
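
One practical wrinkle with the temp-file idea: a URL contains "/" and ":", so it cannot be used directly as a file name. A rough sketch of one way around that is to percent-encode the URL first. BookmarkCache, CacheFileName and CachePath are hypothetical names, not Beagle API, and the cache directory is simply passed in by the caller.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of Beagle: maps a bookmark URL to a cache file.
public static class BookmarkCache
{
    // Percent-encode the URL so '/' and ':' cannot be mistaken for path separators.
    public static string CacheFileName (string url)
    {
        return Uri.EscapeDataString (url) + ".temp";
    }

    // Full path of the cache file inside a caller-supplied directory.
    public static string CachePath (string cacheDir, string url)
    {
        return Path.Combine (cacheDir, CacheFileName (url));
    }

    public static void Main ()
    {
        Console.WriteLine (CacheFileName ("http://www.google.com"));
        // prints: http%3A%2F%2Fwww.google.com.temp
    }
}
```

Very long URLs could exceed file name length limits; naming the file after a hash of the URL would be an alternative, at the cost of readable names.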

(Aside: instead of using flat files to store the bookmarked sites, can I use a SQLite database? I'm just wondering what you guys think is the tidiest solution to avoid having to reindex every website in a user's bookmarks when the website may not have been updated, and how this plugs in with each task in the scheduler and its associated indexable.SetTextReader (which I presume, being a text reader, needs a flat file to read text from).)
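
On the text reader point: in .NET/Mono, System.IO.TextReader is an abstract class, and StringReader is a subclass that reads from an in-memory string, so a TextReader as such does not require a file on disk. The sketch below shows only that much. CountWords is a made-up stand-in for any consumer that takes a TextReader; whether indexable.SetTextReader accepts an arbitrary TextReader is an assumption that would need checking against the Beagle source.

```csharp
using System;
using System.IO;

public static class TextReaderDemo
{
    // Stand-in for any consumer that only sees a TextReader (hypothetical, not Beagle API).
    public static int CountWords (TextReader reader)
    {
        string text = reader.ReadToEnd ();
        char[] separators = new char[] { ' ', '\t', '\r', '\n' };
        return text.Split (separators, StringSplitOptions.RemoveEmptyEntries).Length;
    }

    public static void Main ()
    {
        // The text could just as well come from a download buffer or a database row.
        TextReader fromMemory = new StringReader ("bookmark page text held in memory");
        Console.WriteLine (CountWords (fromMemory)); // 6
    }
}
```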

Sorry about the barrage of questions, ;-)
John


