Re: Proposing Tracker for inclusion into GNOME 2.18

From: Joe Shaw <joeshaw novell com>
To: desktop-devel-list gnome org
Subject: Re: Proposing Tracker for inclusion into GNOME 2.18
Date: Thu, 19 Oct 2006 18:11:38 -0400

Hi,

[ This mail is pretty long and rambling, I'll try to keep responses
concise from here.  -j ]

On Thu, 2006-10-19 at 20:01 +0100, Jamie McCracken wrote:
> okay, I was holding back on criticising beagle by not comparing it to 
> tracker but I guess Im allowed to now... (I'll be nice dont worry!)

Well, seeing as you said you haven't run Beagle in over a year, I'm not
sure you're qualified to compare the two.

> possibly but a number of users have used it and they have not had 
> anything like the problems that beagle has. 

This is completely irrelevant in the context of *tracker's* suitability
for inclusion.

> We have test suites and a very vibrant community and mailing list 
> where things have been very well tested. 

Test suites and communities are great, and I don't doubt that Tracker is
better for having them, but these are not an adequate substitute for
widespread deployment.  A community is probably at most a couple hundred
people -- not a couple thousand, and the people who are involved in the
development of Tracker probably don't have the breadth of data that
needs to be tested.

> Touching data is read only - we are not writing anything to 
> files so it wont eat your docs! (lets keep the FUD out!)

I didn't mean to imply that files were written to or that Tracker might
accidentally delete them; I don't think that's the case.

What I meant is that Tracker has to read and process all that
information.  One thing I've learned from Beagle is that there is a lot
of broken data out there, or that our code to process that data was
broken.  (This has particularly been a problem with non-free file
formats.)

> >         * The scope of Tracker isn't clear.  Is the point to be fast
> >         search for files, or for all the user's data?  To what end has
> >         Tracker achieved its goal?
> 
> It should be clear - to be the best

To be the best what?  This is exactly my point.  Is the idea to index
everything or is the idea to index some subset?  Is the idea to store
all application data, or specific metadata?  Part of the problem here I
think is that it isn't clear to people how we use Tracker to improve our
desktop experience.

> tracker is mature at indexing files and does a much better job than some 
> leading contenders including spotlight and other commercial indexers 
> which all seem to consume significant resources. Tracker is different 
> and arguably better in this area.

Yes, the key phrase here is "indexing files".  There are two problems
with searching only files:

        * It doesn't lend itself to a generic design, suitable for
        searching all different types of data.  A spec or an abstraction
        layer is only any good if it has at least two implementations.
        
        * It doesn't reflect a large portion of the user's data,
        including (most importantly) email, IM conversations,
        addressbook and calendar items, notes, etc. etc.  If that's not
        the focus of Tracker that's okay, but it's not clear.

As to the resource usage, it's difficult to measure this, and I
encourage others to do their own testing.  In my indexing runs with
Tracker today, it did use substantially less memory than Beagle, but it
used a lot more CPU, thrashed the disk a lot more, and generally made
the system slower over time.  It would be helpful to profile this and
quantify the usage.
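
As a rough starting point for quantifying this, here is a minimal
sketch in Python.  It assumes a Unix system and an indexer that can be
run as a command that exits when indexing is done; the function name
measure() and the returned keys are made up for illustration and are
not part of Tracker or Beagle.  It reports wall time, CPU time, peak
memory and block I/O for the child process.

```python
import resource
import subprocess
import sys
import time

def measure(cmd):
    """Run cmd to completion and return its resource usage (hypothetical helper)."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(cmd, check=False)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "wall_s": wall,
        "user_cpu_s": after.ru_utime - before.ru_utime,
        "sys_cpu_s": after.ru_stime - before.ru_stime,
        # Peak RSS over all waited-for children so far:
        # kilobytes on Linux, bytes on macOS.
        "max_rss": after.ru_maxrss,
        # Block I/O counts, a crude proxy for disk activity.
        "blocks_in": after.ru_inblock - before.ru_inblock,
        "blocks_out": after.ru_oublock - before.ru_oublock,
    }

if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the real indexer command line as arguments to this script.
    for key, value in measure(sys.argv[1:]).items():
        print(f"{key}: {value}")
```

Running both indexers over the same directory tree with something like
this would at least give comparable numbers, although it says nothing
about how responsive the rest of the desktop feels during the run.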

> Not yet but I will be doing the  bookmark history stuff in Epiphany 
> using this so time will tell on this. I note there is no other metadata 
> server available to Gnome and its a huge missing part of the platform so 
>   it should not be dismissed out of hand. Im also a database expert with 
> over 10 years of designing and optimising relational databases so Im 
> confident I can deliver here (especially as its a piece of cake for me!).

I don't think that "just trust me" is a valid argument here.

I wholeheartedly agree that the lack of a larger metadata plan is a
problem for the platform.  Without anyone using Tracker for this
purpose, I think it's premature to approve it.

> this is the thin wrapper around the dbus method - see the introspect 
> file for details :
> http://cvs.gnome.org/viewcvs/tracker/data/tracker-introspect.xml?rev=1.12&view=markup

This helps a little, but it's still not comprehensive.

(As an aside, I thought there was consensus that all new modules had to
be fully documented for acceptance?)
         
> >         * It's hard to tell for certain because of the above point, but
> >         the search APIs appear to have a major usability problem in that
> >         you can't search for both text and metadata at the same time
> >         using freeform text entered by the user.  (Think Google here,
> >         which searches both document content and metadata.)  This will
> >         be a problem when searching emails, for instance, because people
> >         will type "Joe Shaw eggplant" and expect it to match from the
> >         author field and the body of the message.
> 
> of course you can - have you tried it?

Ok, I apologize then.  I have not tried writing to the APIs; I got the
opposite impression from the different APIs and your recent blog entry:

        http://jamiemcc.livejournal.com/3782.html
        
In any case, if the searches hit only one source, that's a good thing.

> Also rdf query allows for easy and effortless mixing of data searched 
> from inverted word index  and the sqlite database so there are no limits 
> here

But this means you can't do a freeform search, right?  You have to say,
I want text "foo" from the index AND value "bar" from metadata key
"quux".
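
To make the distinction concrete, here is a small Python sketch.  It is
purely illustrative: the in-memory documents and both function names
are invented, and it is not how Tracker's RDF query or Beagle's query
engine actually work.  The first function needs the caller to say which
term goes to the text index and which to a metadata key; the second
takes whatever the user typed and lets each term match either place.

```python
# Hypothetical in-memory documents; not Tracker's or Beagle's data model.
DOCS = [
    {"uri": "email://1", "meta": {"author": "Joe Shaw"},
     "body": "I bought an eggplant today"},
    {"uri": "email://2", "meta": {"author": "Jamie McCracken"},
     "body": "eggplant recipes"},
    {"uri": "email://3", "meta": {"author": "Joe Shaw"},
     "body": "meeting notes"},
]

def structured_search(docs, text, key, value):
    """Caller decides: 'text' is matched in the body, 'value' in metadata 'key'."""
    return [d["uri"] for d in docs
            if text.lower() in d["body"].lower()
            and value.lower() in d["meta"].get(key, "").lower()]

def freeform_search(docs, query):
    """Every term must match somewhere: the body or any metadata value."""
    hits = []
    for d in docs:
        haystack = " ".join([d["body"], *d["meta"].values()]).lower()
        if all(term in haystack for term in query.lower().split()):
            hits.append(d["uri"])
    return hits

print(structured_search(DOCS, "eggplant", "author", "Joe Shaw"))
print(freeform_search(DOCS, "Joe Shaw eggplant"))
```

Both calls find the same message here, but only the second one works
with the string exactly as a user would type it into a search box.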

> Considering it uses the gnome-search-tool source which is very stable 
> and I have simply removed some code and added a little here and there 
> then its not as exaggerated as you make out. The application is pretty 
> stable but the only way to confirm this is to try it out.

Ok, I was looking only at the CVS history.  If it's built upon g-s-t
that's a good start.

> Tracker is miles ahead on the database side (which is non-existant in 
> Beagle bar your backup for systems without EA's)- we have tags/keywords, 
> extensible metadata, first class object storage etc. 

The database is an implementation detail.  Beagle from the very
beginning has extracted metadata from documents and allowed people to
search them.

As for "first class object storage", I don't really know what that
means.  What is an object in this case?  It's not like you throw a
GObject at Tracker and it stores it.  Short of that, it's just a schema
that you define and it's not really anything different than a document.
Beagle has this too (as long as there is a URI to associate with it).

> Beagle is only ahead on what it indexes and heres the big point - 
> tracker's goal is not  to index everything under the sun but only the 
> important stuff.

What is the important stuff?  The uncertainty of scope is one of the
murkiest things in my mind.  On the one hand you seem to want everything
in the desktop to use it to store information, etc. but on the other you
only want to focus on indexing a subset of the user's data.  I perceive
a conflict here.

> Well Im proposing it to replace gnome-search-tool with 
> tracker-search-tool and nothing you said above has any bearing on this 
> (ignoring the FUD).

No FUD intended.

> It fills in vital holes in our platform (tagging,  extensible metadata 
> and persistent storage)

"Extensible metadata" is a lot larger realm than just Tracker or Beagle
or any one piece of software can address.  How is metadata propagated
between copies?  How does metadata propagate between users?  This is a
large problem, but orthogonal to this discussion.

> However, as Beagle is not being proposed, cannot get into the platform ( 
> C is only allowed in the platform) and is plagued by a significant no of 
> problems (all of which do not apply to tracker), I dont see how 
> comparing beagle to tracker is relevant to this discussion?

In the end maybe it isn't.  But considering Beagle is pretty widely
deployed today and used in both GNOME and KDE environments, Tracker
would need to exceed Beagle in terms of developer and user experience.

(Also, I hardly think that Beagle is "plagued by a significant number of
problems", but whatever.)

> FWIW though, if tracker and beagle can share a common dbus interface 
> then letting tracker in would benefit everyone - thats the way I see it!

Yeah, we'll prototype something and see.

Joe