Re: [Tracker] New CVS version

From: Jamie McCracken <jamiemcc blueyonder co uk>
To: Laurent Aguerreche <laurent aguerreche free fr>
Cc: Tracker List <tracker-list gnome org>
Subject: Re: [Tracker] New CVS version
Date: Tue, 22 Aug 2006 18:44:45 +0100

Laurent Aguerreche wrote:

On Tuesday 22 August 2006 at 19:00 +0200, Marcus Fritzsch wrote:

Veeery nice work!!

here's a small patch introducing some debian/ubuntu build depends - i
am not sure about the versions though, so I took them from ubuntu
dapper (w/o ubuntu specific revision things)


Looking forward to seeing tracker/sqlite in action :)


I removed the libmysqlclient15-dev dependency (and so the libssl-dev one) due to
the upcoming use of sqlite (or other things).  :-)

yeah these dependencies will be optional (you will be able to choose which backend to use - sqlite or mysql)

I can't say which will be better at the end of the day, but even so I will support both (sqlite's sql syntax is 99% compatible with mysql, although it does not support stored procedures yet, so it won't be too much work to support both)



But now, there are new dependencies: one with -lmagic, and one on Pango.

I think -lmagic is for libmagic-dev?
( http://packages.debian.org/unstable/libdevel/libmagic-dev )
The configure script is missing a test for it.

there is no .pc file for libmagic, so instead we check for the visibility of magic.h in configure.in


Is Pango really required? On Debian it depends on:
- libcairo2;
- libfontconfig1;
- libfreetype6;
- libglib2.0-0
- libx11-6;
- libxft2;
- zlib1g.

And libx11-6 has dependencies on some Xorg elements: libxau6, libxdmcp6,
libx11-data and x11-common.

etc.

So it makes Tracker depend on many things which will install a big part
of X, perhaps all of X on some distributions...

pango is only required for its word breaking ability with languages that do not contain word break characters.


in the new indexer, we break words as follows:

1) we use libmagic to determine if it's an ASCII file and/or English, and therefore use non-utf8 techniques to break and parse words very quickly


2) if utf-8 we use utf-8 techniques

3) if text has no spaces and is not ASCII/English, we assume it might be CJK or some other language that does not contain word break characters. In this case we use the pango word break functionality to break up words correctly. This is incredibly slow compared to 1 and 2 above (it takes 15 minutes+ to break 1MB of text compared to less than a second with 1)
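
As a rough illustration of the three cases above, here is a minimal sketch in C. It is not the indexer's actual code: the names (BreakStrategy, choose_strategy, count_ascii_words) are hypothetical, a plain 7-bit byte scan stands in for the libmagic detection, and only ASCII whitespace is checked when looking for spaces. Case 3 is only selected here, the pango call itself is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical names for illustration only; not Tracker's real API. */
typedef enum {
    BREAK_ASCII,  /* case 1: fast byte-wise breaking */
    BREAK_UTF8,   /* case 2: UTF-8 aware breaking */
    BREAK_PANGO   /* case 3: slow fallback (e.g. pango word breaks) */
} BreakStrategy;

/* Assumption: a 7-bit scan stands in for the libmagic ASCII check. */
static int is_ascii(const char *text, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if ((unsigned char) text[i] & 0x80)
            return 0;
    }
    return 1;
}

/* Assumption: only ASCII whitespace counts as a word break character. */
static int has_ascii_space(const char *text, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (isspace((unsigned char) text[i]))
            return 1;
    }
    return 0;
}

/* Pick the cheapest strategy that can handle the text. */
BreakStrategy choose_strategy(const char *text, size_t len)
{
    assert(text != NULL);

    if (is_ascii(text, len))
        return BREAK_ASCII;
    if (has_ascii_space(text, len))
        return BREAK_UTF8;
    return BREAK_PANGO;
}

/* Case 1 fast path: count runs of alphanumeric bytes as words. */
size_t count_ascii_words(const char *text, size_t len)
{
    size_t words = 0;
    int in_word = 0;

    for (size_t i = 0; i < len; i++) {
        if (isalnum((unsigned char) text[i])) {
            if (!in_word) {
                words++;
                in_word = 1;
            }
        } else {
            in_word = 0;
        }
    }
    return words;
}
```

The point of ordering the checks this way is that the expensive path is only reached when both cheap tests fail, so ordinary ASCII and space-separated UTF-8 text never pay the pango cost.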

We use no other functionality of pango in tracker, so if anyone knows of a free C lib that can do word breaks in a language independent manner then let me know.




--
Mr Jamie McCracken
http://jamiemcc.livejournal.com/
