For my Python/GTK/libbeagle program, I want to support Unicode fully
(ha), so I spent some time learning how to work with Unicode in C#
(where 'char' is only 16 bits -- d'oh!) for my Beagle filter.  I
thought I had it all figured out...

When I couldn't make it work, I just made a plain text file with 3
Latin characters, 3 Georgian characters, and 3 Linear B (i.e.,
non-BMP) characters, and saved it as UTF-8.  Then I fired up "Desktop
Search" / "beagle-search" (every app under GNOME seems to have two
names!) and tried searching by each triple.  As I feared, Latin and
Georgian worked, but Linear B didn't.  (From Python, it looks like
U+10000 is coming out as 2 ASCII spaces.)
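
For anyone who wants to reproduce the test data, here is a minimal
sketch of the three triples and why only the Linear B one is special.
It assumes Python 3 (where str is a sequence of code points); the
particular letters and the helper names utf16_units and surrogate_pair
are just picked for illustration, not anything from Beagle.  The point
is that each Linear B character needs two 16-bit units in UTF-16,
which is what a 16-bit 'char' sees.

```python
# Three test triples: Latin, Georgian (BMP), Linear B (non-BMP).
latin = "abc"
georgian = "\u10d0\u10d1\u10d2"               # U+10D0..U+10D2
linear_b = "\U00010000\U00010001\U00010002"   # U+10000..U+10002


def utf16_units(s):
    """Return how many 16-bit code units s occupies in UTF-16."""
    return len(s.encode("utf-16-le")) // 2


def surrogate_pair(ch):
    """Split one non-BMP character into its (high, low) UTF-16 surrogates."""
    offset = ord(ch) - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


if __name__ == "__main__":
    for name, text in (("Latin", latin),
                       ("Georgian", georgian),
                       ("Linear B", linear_b)):
        print(name,
              "code points:", len(text),
              "UTF-8 bytes:", len(text.encode("utf-8")),
              "UTF-16 units:", utf16_units(text))
    high, low = surrogate_pair(linear_b[0])
    print("U+10000 as surrogates: %04X %04X" % (high, low))
```

If something in the indexing chain drops or replaces each surrogate
separately, one non-BMP character would turn into two junk characters,
which would at least be consistent with the two spaces I'm seeing.
That part is only a guess.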

Does Beagle not support Unicode >3.0 yet?  Is somebody working on it
already?  Do Beagle's dependencies (like Lucene or Gtk#) handle newer
Unicode versions?  (Hopefully it can be upgraded piecemeal, and not

I'm a bit of a language junkie, and while I don't know if I'll ever
need it, I'd be willing to do a little bit of extra legwork to help
out Beagle here.


- Ken
