Re: [Tracker] nie:plainTextContent, Unicode normalization and Word breaks

From: Aleksander Morgado <aleksander lanedo com>
To: jamie mccrack gmail com
Cc: "Tracker (devel)" <tracker-list gnome org>
Subject: Re: [Tracker] nie:plainTextContent, Unicode normalization and Word breaks
Date: Sun, 25 Apr 2010 22:34:35 +0200

Hi Jamie,

I think it makes sense to fix this. Just to be clear, does this mean we 
don't need Pango in libtracker-fts/tracker-parser.c to determine word 
breaks for CJK?


That's not broken, so I would not recommend trying to "fix" it.

IMHO, the tracker_text_normalize() in the extractor should just do UTF-8
validation. It should not attempt word breaking, as that's CPU-expensive
and is already being done by the parser.
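
As a rough illustration of the validation-only approach suggested above,
here is a minimal sketch in plain C. It is not the actual
tracker_text_normalize() code; the helper name utf8_valid_prefix_len() is
hypothetical. It returns the byte length of the longest valid UTF-8 prefix,
so a caller could truncate the extracted text there without doing any word
breaking. In GLib-based code, g_utf8_validate() with its "end" output
parameter provides the same information.

```c
#include <stddef.h>

/* Hypothetical helper (not part of Tracker): returns the number of
 * leading bytes of s[0..len) that form valid UTF-8. */
size_t utf8_valid_prefix_len(const char *str, size_t len)
{
    const unsigned char *s = (const unsigned char *) str;
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t need;        /* continuation bytes expected */
        unsigned long min;  /* smallest code point legal for this length */
        unsigned long cp;   /* decoded code point */
        size_t k;

        if (c < 0x80) {     /* plain ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            need = 1; min = 0x80;    cp = c & 0x1F;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2; min = 0x800;   cp = c & 0x0F;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3; min = 0x10000; cp = c & 0x07;
        } else {
            break;          /* invalid lead byte */
        }

        if (len - i <= need)
            break;          /* sequence truncated at end of buffer */

        for (k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                break;      /* not a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (k <= need)
            break;

        /* Reject overlong forms, surrogates and out-of-range values. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            break;

        i += need + 1;
    }
    return i;
}
```

This makes a single pass over the bytes and does no locale or language
analysis, which is why it is much cheaper than word breaking.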


But then how can we limit the extracted text based on the number of
words?

Cheers,
-- 
Aleksander

Follow-Ups:
- Re: [Tracker] nie:plainTextContent, Unicode normalization and Word breaks
  - From: Jamie McCracken

References:
- [Tracker] nie:plainTextContent, Unicode normalization and Word breaks
  - From: Aleksander Morgado
- Re: [Tracker] nie:plainTextContent, Unicode normalization and Word breaks
  - From: Martyn Russell
- Re: [Tracker] nie:plainTextContent, Unicode normalization and Word breaks
  - From: Jamie McCracken
