Re: [Tracker] nie:plainTextContent, Unicode normalization and Word breaks

From: Aleksander Morgado <aleksander lanedo com>
To: Martyn Russell <martyn lanedo com>
Cc: "Tracker (devel)" <tracker-list gnome org>
Subject: Re: [Tracker] nie:plainTextContent, Unicode normalization and Word breaks
Date: Fri, 23 Apr 2010 10:27:00 +0200

Hi Martyn,


> I think it makes sense to fix this. Just to be clear, does this mean we
> don't need Pango in libtracker-fts/tracker-parser.c to determine word
> breaks for CJK?


Well, I'm not sure about this, of course. I understand the need for
word-breaking in libtracker-fts, but I can also see the need for
word-breaking when extracting file contents, in order to limit the
number of words extracted. But it doesn't really make sense to do
word-breaking in two places. It all depends on what we want to store in
nie:plainTextContent: either unformatted text containing only
whitespace-separated proper words, or formatted text without any
explicit word separation.
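To make the word-limit idea a bit more concrete, here is a minimal C
sketch of the first option (whitespace-separated words, capped at a
maximum count). `limit_words()` and `is_ascii_space()` are hypothetical
helpers written for illustration, not existing Tracker or GLib API. The
sketch splits only on ASCII whitespace, which is safe on UTF-8 input
because those bytes never appear inside a multibyte sequence, but it is
only correct for scripts that put spaces between words: a run of CJK
text with no spaces comes back as a single "word", which is exactly the
case that needs a real word-break algorithm (Pango's, or libunistring's
`u8_wordbreaks()` from <uniwbrk.h>).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns nonzero for the ASCII whitespace bytes
 * treated as word separators. In UTF-8 these bytes never occur inside
 * a multibyte sequence, so scanning byte by byte is safe. */
static int is_ascii_space(unsigned char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r' ||
           c == '\f' || c == '\v';
}

/* Hypothetical helper: copies at most max_words whitespace-separated
 * words from the UTF-8 string `text` into a newly allocated string,
 * joined by single spaces. Returns NULL on allocation failure; the
 * caller frees the result. NOTE: text without spaces (e.g. CJK) is
 * treated as one word; proper word breaking would be needed there. */
static char *limit_words(const char *text, size_t max_words)
{
    assert(text != NULL);

    size_t len = strlen(text);
    /* The output is never longer than the input: every space we emit
     * replaces at least one separator byte from the input. */
    char *out = malloc(len + 1);
    size_t o = 0, words = 0;
    const char *p = text;

    if (out == NULL)
        return NULL;

    while (*p != '\0' && words < max_words) {
        /* Skip the run of separators before the next word. */
        while (*p != '\0' && is_ascii_space((unsigned char) *p))
            p++;
        if (*p == '\0')
            break;
        if (words > 0)
            out[o++] = ' ';
        /* Copy the word's bytes unchanged, multibyte UTF-8 included. */
        while (*p != '\0' && !is_ascii_space((unsigned char) *p))
            out[o++] = *p++;
        words++;
    }
    out[o] = '\0';
    return out;
}
```

For example, `limit_words("  Hello,\tbrave\n new   world ", 3)` gives
"Hello, brave new", while `limit_words("日本語のテキスト です", 1)` gives
the whole unspaced run "日本語のテキスト" as its single word, which shows
why the extractor would need the same CJK-aware word breaking that
libtracker-fts does if the limit is to be applied there.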

> I have no idea what libunistring is like, we should probably quickly
> evaluate it before adopting it. It sounds like you have experience there
> though.


Well, GNU libunistring is quite new and easy to use. It's written by
Bruno Haible, who is also one of the people behind gnulib (the GNU
portability library) and the author of GNU libiconv.

Cheers,
-Aleksander

References:
- [Tracker] nie:plainTextContent, Unicode normalization and Word breaks
  - From: Aleksander Morgado
- Re: [Tracker] nie:plainTextContent, Unicode normalization and Word breaks
  - From: Martyn Russell
