Re: [Tracker] nie:plainTextContent, Unicode normalization and Word breaks
- From: Aleksander Morgado <aleksander lanedo com>
- To: Martyn Russell <martyn lanedo com>
- Cc: "Tracker (devel)" <tracker-list gnome org>
- Subject: Re: [Tracker] nie:plainTextContent, Unicode normalization and Word breaks
- Date: Fri, 23 Apr 2010 10:27:00 +0200
> I think it makes sense to fix this. Just to be clear, does this mean we
> don't need Pango in libtracker-fts/tracker-parser.c to determine word
> breaks for CJK?
Well, I'm not really sure about this. I understand the need for
word-breaking in libtracker-fts, but I can also see the need for
word-breaking when extracting file contents, in order to enforce the
limit on the number of words to be extracted. If so, it doesn't really
make sense to have two places doing word-breaking. It all depends on
what we want stored in nie:plainTextContent: either unformatted text
containing just whitespace-separated proper words, or formatted text
without any explicit word separation.
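For illustration only, here is a minimal sketch of the extraction-side
word limit, assuming the first case (plain whitespace-separated words).
The helper name prefix_len_for_max_words is made up and is not Tracker
code. It treats only ASCII whitespace as a separator, so unsegmented CJK
text counts as a single word, which is exactly the case where a real
Unicode word-break algorithm would be needed.

```c
#include <stddef.h>

/* Hypothetical helper, not part of Tracker. ASCII whitespace only;
 * bytes >= 0x80 (UTF-8 multi-byte sequences) are treated as word bytes,
 * so a multi-byte character is never split. */
static int is_ascii_space(unsigned char c)
{
    return c == ' ' || c == '\t' || c == '\n' ||
           c == '\r' || c == '\f' || c == '\v';
}

/* Returns the byte length of the longest prefix of `text` that contains
 * at most `max_words` whitespace-separated words. The prefix ends right
 * after the last word kept, so trailing whitespace is not included. */
static size_t prefix_len_for_max_words(const char *text, size_t max_words)
{
    size_t i = 0;      /* current byte offset */
    size_t words = 0;  /* words accepted so far */
    size_t end = 0;    /* offset just past the last accepted word */

    while (text[i] != '\0') {
        /* Skip the separator run before the next word. */
        while (text[i] != '\0' && is_ascii_space((unsigned char)text[i]))
            i++;
        if (text[i] == '\0' || words == max_words)
            break;

        /* Consume one word. */
        while (text[i] != '\0' && !is_ascii_space((unsigned char)text[i]))
            i++;
        words++;
        end = i;
    }
    return end;
}
```

An extractor could copy the first prefix_len_for_max_words(text, limit)
bytes and stop there. If the stored text is already cut this way, the
FTS parser only needs to split on whitespace; if it is not, the
word-breaking has to happen again on the FTS side.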
> I have no idea what libunistring is like, we should probably quickly
> evaluate it before adopting it. It sounds like you have experience there.
Well, GNU libunistring is quite new, and easy to use. It's written by
Bruno Haible, who is also one of the authors of the GNU portability
library and the author of GNU libiconv.