Re: [Tracker] New CVS version



Laurent Aguerreche wrote:
On Tuesday 22 August 2006 at 19:00 +0200, Marcus Fritzsch wrote:
Veeery nice work!!

Here's a small patch introducing some Debian/Ubuntu build dependencies. I
am not sure about the versions though, so I took them from Ubuntu
Dapper (without the Ubuntu-specific revision suffixes).


Looking forward to seeing tracker/sqlite in action :)

I removed the libmysqlclient15-dev dependency (and with it the libssl-dev
one) because of the upcoming use of sqlite (or other things).  :-)

Yeah, these dependencies will be optional (you will be able to choose which backend to use: sqlite or mysql).

I can't say which will be better at the end of the day, but even so I will support both. sqlite's SQL syntax is 99% compatible with mysql, although it does not support stored procedures yet, so it won't be too much work to support both.




But now, there are new dependencies: one with -lmagic, and one on Pango.

I think -lmagic is for libmagic-dev?
( http://packages.debian.org/unstable/libdevel/libmagic-dev )
The configure script is missing a test for it.

There is no pkg-config (.pc) file for libmagic, so instead we check for visibility of magic.h in configure.in.



Is Pango really required? On Debian it depends on:
- libcairo2;
- libfontconfig1;
- libfreetype6;
- libglib2.0-0
- libx11-6;
- libxft2;
- zlib1g.

And libx11-6 has dependencies on some Xorg elements: libxau6, libxdmcp6,
libx11-data and x11-common.

etc.

So it makes Tracker depend on many things, which will install a big part
of X, perhaps all of X on some distributions...

Pango is only required for its word-breaking ability with languages that do not contain word-break characters.

in the new indexer, we break words as follows:

1) we use libmagic to determine whether it's an ASCII and/or English file, and if so we use non-UTF-8 techniques to break and parse words very quickly

2) if it is UTF-8 we use UTF-8 techniques

3) if the text has no spaces and is not ASCII/English we assume it might be CJK or some other language that does not contain word-break characters. In this case we use the Pango word-break functionality to break up words correctly. This is incredibly slow compared to 1 and 2 above (it takes 15+ minutes to break 1MB of text, compared to less than a second with 1)
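
As a rough illustration of how the three cases above could be told apart, here is a minimal C sketch. It is not Tracker's actual code: the names BreakStrategy, choose_break_strategy and count_ascii_words are hypothetical, and a plain byte scan stands in for the libmagic check described in 1). It also only looks for ASCII whitespace, which is an assumption on my part about what "has no spaces" means.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration only; not Tracker's API. */
typedef enum {
    BREAK_ASCII, /* case 1: fast byte-wise splitting */
    BREAK_UTF8,  /* case 2: UTF-8 aware splitting */
    BREAK_PANGO  /* case 3: slow, language-aware breaking */
} BreakStrategy;

/* True if every byte is 7-bit. Stand-in for the libmagic check (assumption). */
static bool is_ascii(const unsigned char *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (s[i] >= 0x80)
            return false;
    return true;
}

/* True if the buffer contains at least one ASCII whitespace byte. */
static bool has_ascii_space(const unsigned char *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (s[i] == ' ' || s[i] == '\t' || s[i] == '\n' || s[i] == '\r')
            return true;
    return false;
}

/* Pick a strategy following the order of the three cases in the mail. */
BreakStrategy choose_break_strategy(const char *text, size_t len)
{
    assert(text != NULL || len == 0);
    const unsigned char *s = (const unsigned char *)text;

    if (is_ascii(s, len))
        return BREAK_ASCII;
    if (has_ascii_space(s, len))
        return BREAK_UTF8;
    return BREAK_PANGO; /* non-ASCII and no spaces: possibly CJK */
}

/* Case 1 fast path: count runs of ASCII alphanumerics as words. */
size_t count_ascii_words(const char *text, size_t len)
{
    size_t words = 0;
    bool in_word = false;

    for (size_t i = 0; i < len; i++) {
        bool alnum = isalnum((unsigned char)text[i]) != 0;
        if (alnum && !in_word)
            words++;
        in_word = alnum;
    }
    return words;
}
```

With this sketch, a buffer such as "plain text" would take the ASCII path, accented Latin text with spaces the UTF-8 path, and a run of CJK characters without spaces the Pango path. Only that last case would pay the heavy cost mentioned above.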

We use no other functionality of Pango in Tracker, so if anyone knows of a free C lib that can do word breaks in a language-independent manner then let me know.



--
Mr Jamie McCracken
http://jamiemcc.livejournal.com/



