Re: possible deadlock on invalid UTF-8 data

From: Daniel Elstner <daniel elstner gmx net>
To: Havoc Pennington <hp redhat com>
Cc: gtk-devel-list <gtk-devel-list gnome org>
Subject: Re: possible deadlock on invalid UTF-8 data
Date: 27 Nov 2001 21:46:42 +0100

On Tue, 2001-11-27 at 21:14, Havoc Pennington wrote:
> 
> Daniel Elstner <daniel elstner gmx net> writes:
> > 
> > the utf8_skip_data[] array (glib/gutf8.c:104) contains 0 at index 0xfe
> > and 0xff.  This could easily cause endless loops when iterating over a
> > UTF-8 string by using g_utf8_next_char().
> > 
> > I know that 0xfe and 0xff are forbidden in UTF-8 strings, but those
> > shouldn't cause a deadlock IMHO.  Sometimes it's just not appropriate to
> > validate every string before passing it to the g_utf8_* functions.
> 
> If you use next_char on invalid UTF-8, it can easily skip onto invalid
> memory - so you have to validate first to be safe, even with the
> change you've suggested.

Yes, but as long as the resulting pointer is never dereferenced, it
should work in practice.  (Strictly speaking, ANSI C only guarantees
that a pointer may be moved to the position immediately after the last
element, but I would expect failures from moving it six bytes past the
end to be very rare.)
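
For code that would rather not rely on that, here is a minimal sketch of
an advance step that never moves the pointer further than one past the
end of the buffer.  The names lead_byte_len() and next_char_bounded()
are hypothetical and not part of GLib; the length rules are an assumed
stand-in for the skip table, with the 0 entries for 0xfe and 0xff
treated as 1.

```c
#include <stddef.h>

/* Hypothetical stand-in for the skip table: bytes to advance for a
 * given lead byte.  Returns 0 for 0xfe and 0xff, as described above. */
static size_t lead_byte_len(unsigned char c)
{
    if (c < 0xc0) return 1;   /* ASCII and stray continuation bytes */
    if (c < 0xe0) return 2;
    if (c < 0xf0) return 3;
    if (c < 0xf8) return 4;
    if (c < 0xfc) return 5;
    if (c < 0xfe) return 6;
    return 0;                 /* 0xfe and 0xff */
}

/* Advances p by one character but clamps the result to end, so the
 * pointer never goes beyond one past the last element.
 * Precondition: p < end, so *p may be read. */
static const char *next_char_bounded(const char *p, const char *end)
{
    size_t n = lead_byte_len((unsigned char)*p);
    size_t left = (size_t)(end - p);

    if (n == 0)
        n = 1;                /* always make progress */
    return p + (n < left ? n : left);
}
```

A loop of the form `for (p = buf; p < end; p = next_char_bounded(p, end))`
then terminates with p == end even on truncated input.  This only bounds
the pointer arithmetic; it does not make the data valid UTF-8.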

> The policy for GLib and GTK is that _all_ UTF-8 must be validated, and
> that none of the functions are safe against invalid UTF-8, with a few
> specific exceptions (the GMarkup parser is safe, and g_utf8_validate()
> itself is obviously safe).

I absolutely agree with the policy.  But if we can easily avoid an
endless loop even when the programmer makes a mistake, shouldn't we
try to do so?
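
To illustrate the difference, here is a minimal, self-contained sketch.
It does not use the real table from glib/gutf8.c: skip_for() and walk()
are hypothetical names, and the length rules are an assumption that
mirrors the table as described in this thread (0 at 0xfe and 0xff
unless the `patched` flag asks for 1).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the skip table: bytes to advance for a
 * given lead byte.  `patched` selects 1 instead of 0 for 0xfe/0xff. */
static unsigned skip_for(unsigned char c, int patched)
{
    if (c < 0xc0) return 1;      /* ASCII and stray continuation bytes */
    if (c < 0xe0) return 2;
    if (c < 0xf0) return 3;
    if (c < 0xf8) return 4;
    if (c < 0xfc) return 5;
    if (c < 0xfe) return 6;
    return patched ? 1 : 0;      /* 0xfe and 0xff */
}

/* Walks s the way a g_utf8_next_char() loop would.  Returns the number
 * of steps needed to reach the terminating NUL, or -1 if the pointer
 * stalls (where an unguarded loop would spin forever). */
static long walk(const char *s, int patched)
{
    const unsigned char *p;
    long steps = 0;

    assert(s != NULL);
    for (p = (const unsigned char *)s; *p != '\0'; steps++) {
        unsigned n = skip_for(*p, patched);
        if (n == 0)
            return -1;           /* no progress: would loop endlessly */
        p += n;
    }
    return steps;
}
```

With the input "a\xfe" "b", walk() reports a stall when the entry is 0
and finishes normally when it is 1.  The sketch only addresses the
endless loop: a truncated multi-byte sequence can still carry the
pointer past the terminating NUL, so validation remains necessary.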

Just my 2 cents,
--Daniel
