Re: FYI: better UTF8 decoder.

From: Daniel Elstner <daniel kitta googlemail com>
To: Behdad Esfahbod <behdad behdad org>
Cc: gtk-devel-list gnome org
Subject: Re: FYI: better UTF8 decoder.
Date: Thu, 30 Apr 2009 01:30:27 +0200

On Monday, 13.04.2009 at 21:26 -0400, Behdad Esfahbod wrote:
> On 04/13/2009 05:00 AM, Butrus Damaskus wrote:
> > Hi!
> >
> > This page: http://bjoern.hoehrmann.de/utf-8/decoder/dfa/ claims to
> > have better (quicker and smaller?) utf8 decoder. Maybe it would be
> > worth to look at it?
> 
> Funny how he claims "reduced complexity".  That's definitely the most complex 
> UTF-8 decoder I've seen.

I agree.  So, as we are now comparing each other's UTF-8 algorithms, I
thought I would show off mine too ;-)

http://git.gnome.org/cgit/glibmm/tree/glib/glibmm/ustring.cc#n270

This has been in use for years now.  Just like g_utf8_get_unichar(), it is
not meant to cope with invalid UTF-8.  Its strong point is that it does
not need a lookup table at all, which avoids the hidden function call to
fetch the global offset table pointer when the code is part of a shared
library.
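
To illustrate what "no table at all" can look like, here is a minimal
sketch of a table-free decoder.  It is written for illustration only and
is not a copy of the code at the link above; the name
decode_utf8_unchecked and the use of std::uint32_t as the result type are
assumptions made here.  Like the decoders discussed in this thread, it
assumes valid UTF-8 and that the iterator points at the first byte of a
sequence.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (name invented for this sketch).
// Decodes one code point starting at pos.  No validation is performed:
// the input must be well-formed UTF-8 and pos must point at a lead byte.
template <class Iterator>
std::uint32_t decode_utf8_unchecked(Iterator pos)
{
  std::uint32_t result = static_cast<unsigned char>(*pos);

  if ((result & 0x80) != 0)  // not plain ASCII: multi-byte sequence
  {
    // mask tracks where the next length bit of the lead byte ends up
    // after each 6-bit shift of the accumulated value.
    std::uint32_t mask = 0x40;

    do
    {
      result <<= 6;
      const std::uint32_t c = static_cast<unsigned char>(*++pos);
      mask <<= 5;
      result += c - 0x80;  // strip the 10xxxxxx continuation marker
    }
    while ((result & mask) != 0);  // lead byte announces another byte

    result &= mask - 1;  // drop the leftover length bits of the lead byte
  }

  return result;
}
```

The sequence length is never looked up; it is read off the lead byte one
bit at a time while the continuation bytes are folded in.  Because the
function is a template, it accepts a std::string::const_iterator as well
as a plain const char*, for example
decode_utf8_unchecked(str.begin()).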

> Anyway, as I said on my own UTF-8 decoding post [1], not worth changing glib 
> unless someone shows a real profile of a real application with UTF-8 decoding 
> taking a measurable part of the total run time.

Agreed.  We only have our own implementation in glibmm because we needed
it to work directly with std::string iterators instead of plain
pointers.

--Daniel
