Re: possible deadlock on invalid UTF-8 data



On Tue, 2001-11-27 at 17:57, Owen Taylor wrote:
> The tradeoff here is basically:
> 
>  - Easy to debug
> 
> vs.
> 
>  - If encountered, hopefully continue working "well enough"
>    to be minimally useful for the user.

The problem is that the deadlock isn't always easy to debug.  Forcing a
segfault (or returning NULL, which would probably result in an
immediate crash in most cases) would make the bug much easier to find.

A real-life example: gtkhtml had some issues with strings that weren't
getting converted from locale to utf-8, or got converted twice by
mistake, or whatever.  (This is with the copy of the glib utf8 code that
got put into GAL.)  As a result, gtkhtml had some deadlocks from
g_utf8_* functions getting called on bad data.

If we had simply crashed (which no doubt would have happened if the code
had gone ahead and tried to recover), users could have reported the
problem via bug-buddy and sent us nice backtraces.  Instead we got a
bunch of vague error reports about the Evolution composer locking up.

> Strings are validated at enough places that the chance of invalid
> UTF-8 not getting caught at all is low.

Well, it isn't as if everything gets converted to utf-8 by magic. 
Programs that deal with text files, e-mail messages, etc. need to be
very careful about converting from locale to utf-8 and back again.
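
To illustrate the kind of check a program can do at its input boundary
before handing bytes to any UTF-8 walking code, here is a rough sketch.
The name is_valid_utf8 is made up for this mail and is not glib's API;
real code would more likely call glib's own validation function.  It
only assumes a C99 compiler.

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical helper (not part of glib): returns true if the first
 * len bytes of s are well-formed UTF-8, false otherwise. */
static bool is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t n;           /* continuation bytes expected after c */
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point legal at this length */

        if (c < 0x80)                { n = 0; cp = c;        min = 0; }
        else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; min = 0x10000; }
        else return false;  /* stray continuation or invalid lead byte */

        if (n > len - i - 1)
            return false;   /* sequence runs past the end of the buffer */

        for (size_t k = 1; k <= n; k++) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80)
                return false;   /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min)
            return false;   /* overlong encoding */
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;   /* out of range or surrogate */

        i += n + 1;         /* always advances, so the loop terminates */
    }
    return true;
}
```

A check like this catches locale-encoded bytes that were never
converted, since Latin-1 accented text, for instance, is usually not
well-formed UTF-8.  It will not catch text that got converted twice,
because the result of a double conversion is still valid UTF-8, just
wrong.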


-JT




