Re: FYI: better UTF8 decoder.
- From: Daniel Elstner <daniel kitta googlemail com>
- To: Behdad Esfahbod <behdad behdad org>
- Cc: gtk-devel-list gnome org
- Subject: Re: FYI: better UTF8 decoder.
- Date: Thu, 30 Apr 2009 01:30:27 +0200
On Monday, 13.04.2009 at 21:26 -0400, Behdad Esfahbod wrote:
> On 04/13/2009 05:00 AM, Butrus Damaskus wrote:
> > Hi!
> >
> > This page: http://bjoern.hoehrmann.de/utf-8/decoder/dfa/ claims to
> > have better (quicker and smaller?) utf8 decoder. Maybe it would be
> > worth to look at it?
>
> Funny how he claims "reduced complexity". That's definitely the most complex
> UTF-8 decoder I've seen.
I agree. So, as we are now comparing each other's UTF-8 algorithms, I
thought I would show off mine too ;-)
http://git.gnome.org/cgit/glibmm/tree/glib/glibmm/ustring.cc#n270
This has been in use for years now. Just like g_utf8_get_unichar(), it is
not meant to cope with invalid UTF-8. Its strong point is that it does
not need a lookup table at all, which avoids the invisible function call
that fetches the global offset table pointer when the code is part of a
shared library.
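For anyone who does not want to follow the link, here is a minimal sketch of
what a table-free decoder along these lines can look like. It is an
illustration, not a copy of the glibmm source: the name
decode_utf8_unchecked and the char32_t return type are assumptions made up
for this example, and it assumes valid UTF-8 input, as described above.

```cpp
#include <string>

// Hypothetical helper, named for illustration only.
// Decodes the single code point whose lead byte is at `pos`.
// Precondition: `pos` points at the start of a valid UTF-8 sequence.
// No validation is performed and no lookup table is used.
template <class Iterator>
char32_t decode_utf8_unchecked(Iterator pos)
{
  char32_t result = static_cast<unsigned char>(*pos);

  if ((result & 0x80) != 0) // multi-byte sequence
  {
    // `mask` tracks the lead-byte bit that says "one more byte follows".
    // It starts at bit 6 of the lead byte. Each round shifts `result`
    // left by 6 while the next marker bit sits one position lower in the
    // lead byte, so the mask moves up by 5.
    char32_t mask = 0x40;

    do
    {
      result <<= 6;
      const unsigned int c = static_cast<unsigned char>(*++pos);
      mask <<= 5;
      result += c - 0x80; // strip the 10xxxxxx continuation marker
    }
    while ((result & mask) != 0);

    result &= mask - 1; // drop what is left of the lead-byte prefix bits
  }

  return result;
}
```

Because the function is a template over the iterator type, the same code
accepts a std::string::const_iterator or a plain const char*. For example,
passing the begin() of a string holding the bytes E2 82 AC yields 0x20AC.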
> Anyway, as I said on my own UTF-8 decoding post [1], not worth changing glib
> unless someone shows a real profile of a real application with UTF-8 decoding
> taking a measurable part of the total run time.
Agreed. We only have our own implementation in glibmm because we needed
it to work directly with std::string iterators instead of plain
pointers.
--Daniel