Re: Comments on last iteration of GIOChannel patch

From: Ron Steinke <rsteinke w-link net>
To: otaylor redhat com, rsteinke w-link net
Cc: gtk-devel-list gnome org
Subject: Re: Comments on last iteration of GIOChannel patch
Date: Thu, 19 Jul 2001 13:36:25 -0700
> From: Owen Taylor <otaylor redhat com>
>
> Here are some comments from looking through the last iteration of your
> GIOChannel patch. I think the partial-char and UTF-8 portions probably
> need some more refinement, but why don't you go ahead and commit what
> you have here. (with possible changes commented below.) 
>
> I think it will be easier to do that rather than keep a big patch
> separate, and that will get the API changes for things like
> g_io_channel_new_file() sooner. (Make sure to make a ChangeLog
> entry and use that as the text of your commit message.)
>
> Could you also file a bug or bugs for the efficiency/buffer-handling
> FIXMEs ?

Okay. It's bug 57771.

> Thanks,
>                                         Owen
>
>
>   -      g_io_channel_flush (channel, err);
>   +      g_io_channel_set_flags (channel, flags & ~G_IO_FLAG_NONBLOCK, NULL);
>   +      /* Ignore any errors here, they're irrelevant */
>
> I'd move this comment up above the call to set_flags() to make it
> clear what it is referring to.
>
>      if ([bytes to flush]) 
>        {
> 	 if (err)
> 	   result = g_io_channel_flush (channel, &tmperr);
> 	 else
> 	   result = g_io_channel_flush (channel, NULL);
>        }
>      else
>        result = G_IO_STATUS_NORMAL;
>
>      status = channel->funcs->io_close (channel, err);
>
>      if (result != G_IO_STATUS_NORMAL)
>        {
> 	 if (status == G_IO_STATUS_NORMAL)
> 	   {
> 	     if (err)
> 	       *err = tmperr;
> 	     return result;
> 	   }
> 	 else if (tmperr)
> 	   g_error_free (tmperr);
>        } 
>
>      return status;
>    }
>
> I find this code quite confusing, and it has a bug in it: if err 
> is NULL, tmperr is leaked. Suggest rewriting as:

If err is NULL, tmperr is always NULL, since it doesn't get passed
to g_io_channel_flush(). I agree that it is confusing, though, so
I'll go back to using g_propagate_error().

>     if ([bytes to flush]) 
>       {
> 	if (err)
> 	  result = g_io_channel_flush (channel, &tmperr);
> 	else
> 	  result = g_io_channel_flush (channel, NULL);
>       }
>     else
>       result = G_IO_STATUS_NORMAL;
>
>     status = channel->funcs->io_close (channel, err);
>
>     if (status != G_IO_STATUS_NORMAL)
>       {
> 	g_clear_error (tmperr);
> 	return status;
>       }
>     else if (result != G_IO_STATUS_NORMAL)
>       {
> 	g_propagate_error (err, tmperr);
> 	return result;
>       }
>     else
>       return G_IO_STATUS_NORMAL;
>   }
>
> It could be compressed to 
> if (status != G_IO_STATUS_NORMAL) {} else {}, but I think the
> extra case improves clarity.
>
>   @@ -435,19 +468,26 @@
>    GIOCondition
>    g_io_channel_get_buffer_condition (GIOChannel *channel)
>    {
>   -  return
>   -    ((channel->read_buf && channel->read_buf->len > 0) ? G_IO_IN : 0) |
>   -    ((channel->write_buf && channel->write_buf->len <= channel->buf_size) ? G_IO_OUT : 0);
>   +  GIOCondition condition = 0;
>   +
>   +  if ((channel->read_buf && (channel->read_buf->len > 0)) /* FIXME full chars how? */
>   +    || (channel->encoded_read_buf && (channel->encoded_read_buf->len > 0)))
>   +    condition &= G_IO_IN;
>
> Remember to fix this (one of my previous mails had two suggestions
> as to how to do this. Perhaps a bug should be filed if you
> aren't going to do it immediately.

It's fixed in the current code on my machine. The current code (which I'm going
to check into CVS) also has the other bug fixes we discussed, in addition
to a bug fix for the error handling in g_io_channel_read_line() which
wasn't quite right previously. I've also changed g_io_channel_new_file()
so it doesn't call g_io_channel_unix_new(), avoiding calls to fstat()
and fcntl() (I think you also mentioned this at some point).

>   @@ -572,6 +607,8 @@
>    {
>      g_return_if_fail (channel != NULL);
>      g_return_if_fail (!line_term || line_term[0]); /* Disallow "" */
>   +  g_return_if_fail (!line_term || g_utf8_validate (line_term, strlen (line_term),
>   +                    NULL)); /* Require valid UTF-8 */
>
> You can replace strlen (line_term), with -1.
>
> Two things in g_io_channel_flush() that I think we've already
> covered:
>  
>  * a return length of 0 means AGAIN or EOF, so instead of handling
>    it, you should just:
>
>       if (status != G_IO_STATUS_NORMAL)
>         return status;
>
>       g_assert (this_time > 0);
>
>    [ If we are allowing people to write their own IO channels, which  
>      is probably a lost cause, then we might want a more explicit
>      g_warning saying what is wrong ]

What would an EOF mean on write? Do you mean that write() never returns
zero characters written unless count == 0 or there is an error?

>  * G_IO_CHANNEL_ERROR_PCHAR_FLUSH
>
>   @@ -822,26 +873,32 @@
>      g_return_val_if_fail (channel != NULL, G_IO_STATUS_ERROR);
>      g_return_val_if_fail ((error == NULL) || (*error == NULL), G_IO_STATUS_ERROR);
>
>   -  /* Make sure the buffers are empty */
>   +  /* Make sure the encoded buffers are empty */
>
>   -  g_return_val_if_fail (channel->read_buf->len == 0, G_IO_STATUS_ERROR);
>   -  g_return_val_if_fail (channel->write_buf->len == 0, G_IO_STATUS_ERROR);
>   -  g_return_val_if_fail (channel->encoded_read_buf == NULL ||
>   +  g_return_val_if_fail (!channel->encoded_read_buf ||
> 			  channel->encoded_read_buf->len == 0, G_IO_STATUS_ERROR);
>   +  g_return_val_if_fail (channel->partial_write_buf[0] == '\0', G_IO_STATUS_ERROR);
>
> Isn't what we need to check here that encoded_read_buf and write_buf
> are empty? What we care about is that buffers including the _external_
> encoding are empty. partial_write_buf[] is always in UTF-8.

No, we need to make sure that the UTF-8 buffers are empty. The
data matching the external encoding is conceptually the same as data
in the file.

When the user changes encodings and does a read, they
expect the next data from the file to be converted using the
new encoding. Data in read_buf is fine, since it has not yet
been converted into UTF-8. Data in encoded_read_buf has already
been converted using the wrong encoding.

Similarly, any data in write_buf has already been converted and
can be written directly to the file. Data in partial_write_buf
is a partial character that the user thought was converted
using the old encoding, so converting it with a different
new encoding would be inappropriate.


> [ I guess in the case of going to raw, we might want to require
>   it to be empty, so perhaps just always requiring it to be empty
>   is OK. It's probably an error in any case if it isn't empty.  ]
>
>      did_encode = channel->do_encode;
>
>      if (!encoding || strcmp (encoding, "UTF8") == 0 || strcmp (encoding, "UTF-8") == 0)
>        {
>
> Just noticed this ... We either need to require "UTF-8" exactly or
> be even more liberal and do g_ascii_strcasecmp(). I'd incline
> to requiring UTF-8 exactly.

The output of iconv --list on my machine includes both
"UTF-8" and "UTF8". If we don't handle these explicitly, iconv
will create a converter and handle them anyway.

>   +      /* Make sure the file buffers are empty */
>   +
>   +      g_return_val_if_fail (!channel->read_buf || channel->read_buf->len == 0,
>   +                            G_IO_STATUS_ERROR);
>   +      g_return_val_if_fail (!channel->read_buf || channel->write_buf->len == 0,
>
> I think you meant !channel->write_buf, though see note above.

Thanks. Actually, this code should have been in the raw encoded
case instead, which is now a different API function anyway.

>   @@ -947,22 +994,17 @@
>      gsize read_size, cur_len, oldlen;
>      GIOStatus status;
>
>   -  if (!channel->ready_to_read)
>   +  if (channel->is_seekable && ((channel->write_buf && channel->write_buf->len > 0)
>   +    || (channel->partial_write_buf[0] != '\0')))
>
> Wrapping of expressions should follow parentheses for indentation. 
> (More of these elsewhere.)
>
> The UTF-8 code needs a fair bit of refinement. Looking, for instance
> at g_io_channel_read_chars()
>
>  * I would avoid validating the output of iconv() ... either we have a broken
>    system, which we can't do anything about, or we are just wasting cycles.
>
>    The validation only needs to occur on read, since GTK+ assumes that 
>    UTF-8 from the user is valid. So, I'd just put it in fill_buffer()
>    along side the iconv() when you are doing real conversion.

We still need to find character breaks in g_io_channel_read_chars(),
(which is why I was validaing data converted by iconv), but should
probably be using g_utf8_next_char() instead. This is fixed
in the patch I'm going to commit.

>  * I didn't expect you to use _both_ g_utf8_get_char_extended and
>    g_utf8_validate.

I guess this means I need to write my own loop. Fixed in the new patch.

>  * Couple of style notes
>
>       for (start = use_buf->str; g_utf8_validate (start, use_buf->str
>         + *bytes_read - start, &end) == FALSE && *end == '\0'; start = end + 1);
>         /* Continue validate past intermediate nulls, no body */
>
>    - Comparison == FALSE is to be avoided. 
>  
>    -  It would be better to avoid such convoluted use of a for loop and write
>       this with a while.
>
>         {
>           gunichar try_char;
>
>           if (status == G_IO_STATUS_NORMAL && count >= 6)
>             try_char = (gunichar) -1; /* avoid function call, see asserts below */
>           else
>             try_char = g_utf8_get_char_extended (use_buf->str, use_buf->len);
>
>           switch (try_char)
>             {
>               case -2:
>                 g_assert (use_buf->len < 6);
>                 if (status == G_IO_STATUS_AGAIN)
>                   return G_IO_STATUS_AGAIN;
>                 g_assert (status != G_IO_STATUS_NORMAL);
>                 g_set_error (error, G_CONVERT_ERROR, G_CONVERT_ERROR_PARTIAL_INPUT,
>                              "Channel terminates in a partial character");
>                 break;
>               case -1:
>                 g_set_error (error, G_CONVERT_ERROR, G_CONVERT_ERROR_ILLEGAL_SEQUENCE,
>                              "Illegal UTF-8 sequence");
>                 break;
>               default:
>                 g_assert (count < 6); /* buffer too small */
>                 return G_IO_STATUS_NORMAL;
>             }
>
>           return G_IO_STATUS_ERROR;
>         }
>  
>  * Try something like:
>
>    const char *p = use_buffer->str + old_len;
>    while (p < use_buffer->str + use_buffer->len) 
>      {
>        gunichar ch = g_utf8_get_char_extended (use_buffer->str + i, use_buffer->len - i);
>        if (ch == (gunichar)-1) 
>          {
>            g_set_error (error, G_CONVERT_ERROR, G_CONVERT_ERROR_ILLEGAL_SEQUENCE,
>                         "Illegal UTF-8 sequence");
>            break;
>          }
>        else if (ch == (gunichar)-2)
>          {
>            g_set_error (error, G_CONVERT_ERROR, G_CONVERT_ERROR_PARTIAL_INPUT,
> 	                "Channel terminates in a partial character");
>            break;
>          }
>        else
>          {
>            ch = g_utf8_next_char (ch);
>          }
>      } 

This is gone, as the code has moved into g_io_channel_fill_buffer(),
where the error handling is actually simpler.

>  * I get the sense that you are making some efforts to try and let people recover
>    when they get a invalid character in the input stream. While I think its
>    certainly bad if a single misencoded byte makes the entire file unreadable,
>    I don't see how to really avoid this problem.
>
>    Once we have hit a misencoded character in the input stream, we have no way
>    in general to recover to a sane state and resynchronize. If the input stream
>    is in a known stateless encoding like UTF-8 we are better off ... we can
>    simply advance along until we get another valid character. But in general
>    no.
>
>   -    return g_io_channel_flush (channel, error);
>   -  else
>   -    return G_IO_STATUS_NORMAL;
>   +    {
>   +      gsize wrote_bytes;
>   +
>   +      status = channel->funcs->io_write (channel, channel->write_buf->str,
>   +                                         channel->write_buf->len, &wrote_bytes, NULL);
>   +
>   +      if (status == G_IO_STATUS_NORMAL)
>   +        g_string_erase (channel->write_buf, 0, wrote_bytes);
>   +    }
>   +
>   +  return G_IO_STATUS_NORMAL;
>    }
>
> shouldn't we be returning an error if the io_write encounters an error?

This is a relic from when I was trying to make sure bytes_written
was always zero on error.

I'm still not happy with the current approach to returning write errors
for g_io_channel_write_chars(). The case which best shows the difficulty
is if io_write() returns G_IO_STATUS_AGAIN partway through writing
a many byte write (much larger than the buffer size). Currently
we return G_IO_STATUS_AGAIN, with bytes_written set to the
number of bytes written so far. This doesn't make sense two me.
I see two sensible choices:

1. Return G_IO_STATUS_NORMAL with bytes_written equal to what
we've written so far. That is, return this as a partial write.

2. Ignore the error and let the write buffer grow to an arbitrary size.

I like 1) better, and it seems to be more in line with what you seem to
want. If that's the case, we need to accept that we're going to have
to allow partial writes.

>   Index: glib/glib/giochannel.h
>   ===================================================================
>   RCS file: /cvs/gnome/glib/glib/giochannel.h,v
>   retrieving revision 1.7
>   diff -u -r1.7 giochannel.h
>   --- glib/glib/giochannel.h	2001/06/30 15:22:03	1.7
>   +++ glib/glib/giochannel.h	2001/07/15 17:26:23
>   @@ -39,6 +39,7 @@
>    typedef struct _GIOChannel	GIOChannel;
>    typedef struct _GIOFuncs        GIOFuncs;
>
>   +#ifndef G_DISABLE_DEPRICATED
>                          ^
> Spelling. DEPRECATED.
>
>    typedef enum
>    {
>      G_IO_ERROR_NONE,
>   @@ -46,27 +47,24 @@
>      G_IO_ERROR_INVAL,
>      G_IO_ERROR_UNKNOWN
>    } GIOError;
>   +#endif /* G_DISABLE_DEPRICATED */
>                            ^  
> Same here.

Also, this breaks glib if it's compiled with G_DISABLE_DEPRECATED set.
The functions in giochannel.c which use GIOError will still be
compiled, due to an #undef G_DISABLE_DEPRECATED at the begining
of the file. I've removed this #if from the header.

>    struct _GIOChannel
>    {
>
> Question:
>
>  * Are we allowing people to create their own GIOChannel
>    types? It's pretty darn tricky, and if we disallow 
>    it, we give ourselves a lot more wiggle room for
>    future changes without breaking bin compat.
>
> If the answer to that questions is yes, then we need to 
> stick /*< private >*/ and /*< public >*/ into the
> structure to define what parts are publically accessible.  
>
> If the answer to that question is no, then we should
> move the GIOChannel structure definition and 
> g_io_channel_init() into a giochannel-private.h header
> file.

This needs to get done. This is now bug 57773.

>   Index: glib/glib/giounix.c
>   ===================================================================
>   RCS file: /cvs/gnome/glib/glib/giounix.c,v
>   retrieving revision 1.15
>   diff -u -r1.15 giounix.c
>   --- glib/glib/giounix.c	2001/07/02 20:26:37	1.15
>   +++ glib/glib/giounix.c	2001/07/15 17:26:23
>   @@ -177,6 +177,9 @@
>      GIOUnixChannel *unix_channel = (GIOUnixChannel *)channel;
>      gssize result;
>
>   +  if (count > SSIZE_MAX) /* At least according to the Debian manpage for read */
>   +    count = SSIZE_MAX;
>
> This qualification is also part of the Unix98 standard, so I think
> you can say something like   /* Effect of passing larger values is undefined */
>
>     retry:
>      result = read (unix_channel->fd, buf, count);
>
> SSIZE_MAX is probably not sufficiently portable. I think we need to add
> definitions of G_MAXSIZE, G_MAXSSIZE to the glibconfig.h.

If you can get something more general defined, I'd be happy to use it.

>   @@ -397,80 +390,102 @@
>      glong fcntl_flags;
>      gint loop;
>      GIOUnixChannel *unix_channel = (GIOUnixChannel *) channel;
>   -  struct stat buffer;
>
>   -  fcntl_flags = fcntl (unix_channel->fd, F_GETFL);
>   +  do
>   +    fcntl_flags = fcntl (unix_channel->fd, F_GETFL);
>   +  while ((fcntl_flags == -1) && (errno == EAGAIN));
>   +
>      if (fcntl_flags == -1)
>        {
> 	 g_warning (G_STRLOC "Error while getting flags for FD: %s (%d)\n",
> 		   g_strerror (errno), errno);
> 	 return 0;
>        }
>
> I think getting EAGAIN for fcntl (fd, F_GETFL) is bizarre enough
> so that I'd rather find out about if it occurs for someone than
> try to handle it from the start ;-)

Removed this from my code.

>   +  if (flags != -1)
>   +    {
>   +      /* Don't know if fcntl flags overlap, be careful */
>   +
>   +      if (flags & O_WRONLY)
>   +        {
>   +          channel->is_readable = FALSE;
>   +          channel->is_writeable = TRUE;
>   +        }
>   +      else if (flags & O_RDWR)
>   +        channel->is_readable = channel->is_writeable = TRUE;
>   +#if O_RDONLY == 0
>   +      else /* O_RDONLY defined as zero on linux (elsewhere?) */
>   +        {
>   +          channel->is_readable = TRUE;
>   +          channel->is_writeable = FALSE;
>   +        }
>   +#else /* O_RDONLY == 0 */
>   +      else if (flags & O_RDONLY)
>   +        {
>   +          channel->is_readable = TRUE;
>   +          channel->is_writeable = FALSE;
>   +        }
>   +      else
>   +        channel->is_readable = channel->is_writeable = FALSE;
>   +#endif /* O_RDONLY == 0 */
>  
> If O_RDONLY == 0 is common, this implies to me that we really should
> probably restrict is_readable/writeable based on the values passed to
> g_io_channel_new_file() when we are creating channels in that
> fashion.

g_io_channel_new_file() fixed to no longer call g_io_channel_unix_new().

>   Index: glib/glib/gunicode.h
>   ===================================================================
>   RCS file: /cvs/gnome/glib/glib/gunicode.h,v
>   retrieving revision 1.22
>   diff -u -r1.22 gunicode.h
>   --- glib/glib/gunicode.h	2001/07/02 00:49:20	1.22
>   +++ glib/glib/gunicode.h	2001/07/15 17:26:27
>   @@ -168,6 +168,8 @@
>    #define g_utf8_next_char(p) (char *)((p) + g_utf8_skip[*(guchar *)(p)])
>
>    gunichar g_utf8_get_char          (const gchar *p);
>   +gunichar g_utf8_get_char_extended (const gchar *p,
>   +                                   gsize        max_len);
>    gchar*   g_utf8_offset_to_pointer (const gchar *str,
> 				      glong        offset);  
>    glong    g_utf8_pointer_to_offset (const gchar *str,      
>   Index: glib/glib/gutf8.c
>   ===================================================================
>   RCS file: /cvs/gnome/glib/glib/gutf8.c,v
>   retrieving revision 1.21
>   diff -u -r1.21 gutf8.c
>   --- glib/glib/gutf8.c	2001/06/23 13:55:07	1.21
>   +++ glib/glib/gutf8.c	2001/07/15 17:26:28
>   @@ -549,7 +549,7 @@
>    /* Like g_utf8_get_char, but take a maximum length
>     * and return (gunichar)-2 on incomplete trailing character
>     */
>   -static inline gunichar
>   +gunichar
>    g_utf8_get_char_extended (const gchar *p, gsize max_len)  
>    {
>      guint i, len;
>
> I have a patch locally that does this just a bit differently; I'll
> post/apply that here as soon as possible. 
>
>

I'll wait to commit my patch until yours is in.

Ron
References:
- Comments on last iteration of GIOChannel patch
  - From: Owen Taylor
[Date Prev][Date Next] [Thread Prev][Thread Next] [Thread Index] [Date Index] [Author Index]