Re: gtk_text_iter_forward_search() comparison



On Mon, 30 Jan 2017 13:53:46 +0200
"Andrew W. Nosenko" <andrew w nosenko gmail com> wrote:
On Sun, Jan 29, 2017 at 2:16 AM, Eric Cashon via gtk-devel-list <
gtk-devel-list gnome org> wrote:


I have been working on a little search experimentation. Gave
writing a case-insensitive gtk_text_iter_forward_search() a try.
The code is shorter than what is in gtktextiter.c and it works a
little faster. If the searched word isn't very frequent, the
time is about the same. If you just look for single chars or words
that are frequent, it looks to be quicker. Not sure if this is a
suitable method though. I know little of the textbuffer internals.
UTF-8 gives me some trouble also.

There is a test program at

https://github.com/cecashon/OrderedSetVelociRaptor/blob/master/Misc/Csamples/search_textbuffer2.c

that does a side by side comparison of the two search methods. If
there is an inherent problem with the test forward search please
say so. If not, maybe it can be used. I would be glad to work on it
a little more if corrections need to be made.
 

Sorry, but your approach just doesn't work.
You falsely assume that bringing two characters to the same case
(both to lower or both to upper) is enough for a case-insensitive
search. While that is indeed enough for English, it is not true in
the general case. Just try to compare "Straße" and "STRASSE", which
mean "street" in German, using your code.  (The second string is the
uppercased version of the first, so searching for one should match
the other.)
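
To make the objection concrete, here is a minimal sketch in Python rather than C. The function per_char_equal is a hypothetical stand-in for any comparison that lower-cases one code point at a time; it is not the code from the linked test program. The built-in str.casefold() applies Unicode full case folding, which maps "ß" to "ss".

```python
def per_char_equal(a: str, b: str) -> bool:
    """Hypothetical naive comparison: lower-case one code point at a time."""
    if len(a) != len(b):
        return False
    return all(x.lower() == y.lower() for x, y in zip(a, b))


def folded_equal(a: str, b: str) -> bool:
    """Compare whole strings after Unicode full case folding."""
    return a.casefold() == b.casefold()


# "Straße" has 6 code points, "STRASSE" has 7, so a code-point-by-code-point
# comparison cannot line them up, while full case folding maps both to "strasse".
naive = per_char_equal("Straße", "STRASSE")   # False
folded = folded_equal("Straße", "STRASSE")    # True
```

The two strings differ in length, so any approach that walks both strings in lockstep fails before case is even considered.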

Problems like this can arise with Unicode in any writing system,
including English.  Printed words in modern English can have lower-case
ligatures similar to ß (which a few hundred years ago was also a
ligature used in printed English), viz:

ﬁeld          - four code points (the first is the ligature U+FB01)

FIELD         - five code points

Comparing Unicode strings is fraught with difficulty, not least the
mistaken assumption that a "character" is the same as a "code point".
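
One consequence for a search routine is that a match found in case-folded text can have a different length than the matched span in the original text, so offsets have to be mapped back. The following is only a sketch under stated assumptions, again in Python: casefold_find is a hypothetical helper, not anything in GTK, it works on code point offsets, and it ignores Unicode normalization (combining sequences) entirely.

```python
def casefold_find(haystack: str, needle: str):
    """Return (start, end) code point offsets in haystack of the first
    caseless match of needle, or None. Hypothetical helper, a sketch only."""
    target = needle.casefold()
    if not target:
        return (0, 0)

    # Fold the haystack one code point at a time and remember, for every
    # folded code point, which original code point produced it.
    folded, origin = [], []
    for i, ch in enumerate(haystack):
        for f in ch.casefold():
            folded.append(f)
            origin.append(i)
    text = "".join(folded)

    pos = text.find(target)
    while pos != -1:
        end = pos + len(target)
        # Accept only matches that start and end on original code point
        # boundaries, so a match never covers half of an expanded "ß".
        starts_ok = pos == 0 or origin[pos] != origin[pos - 1]
        ends_ok = end == len(text) or origin[end] != origin[end - 1]
        if starts_ok and ends_ok:
            return origin[pos], origin[end - 1] + 1
        pos = text.find(target, pos + 1)
    return None


ligature_word = "\ufb01eld"                                   # "ﬁeld" with U+FB01
lengths = (len(ligature_word), len("FIELD"))                  # (4, 5)
hit_street = casefold_find("Die Straße ist lang", "STRASSE")  # (4, 10)
hit_field = casefold_find("a \ufb01eld of", "FIELD")          # (2, 6)
```

Rejecting matches that end inside an expanded code point is a design choice made here for simplicity; a real implementation would have to decide how to report such partial matches.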

Chris

