Re: New GIOChannel patch

From: Ron Steinke <rsteinke w-link net>
To: gtk-devel-list gnome org
Subject: Re: New GIOChannel patch
Date: Tue, 17 Jul 2001 15:21:16 -0700
> From: Owen Taylor <otaylor redhat com>
>
> [ This mail is just a mail responding to the text of your 
>   mail ... haven't looked at the patch yet ]
>
> Ron Steinke <rsteinke w-link net> writes:
>
> > Owen,
> > 
> > I've got a new patch which addresses most of your concerns.
> > Bugzilla corrupted the patch again, so it's up on
> > 
> > http://www.elfhame.net/~rsteinke/giochannel-july-15.diff
> > 
> > The things which still need to be examined are discussed below.
> > Anything marked "FIXME efficiency" is something noncritical
> > which I haven't gotten to yet.
>
> Some procedural stuff:
>
>  * This bug report probably has reached the end of its useful
>    life -- you probably should file separate bug reports 
>    for the remaining FIXME items.

I've created the new bugs 57689-57692,57694-57695. Most of these
are for win32. I'll mark 52811 as closed once I get the next major patch
in.

>  * I'd like for as much discussion on issues like this to
>    go on gtk-devel-list as possible ... people may or may
>    not care, but at least it's publically archived.

Okay. I'm replying to your mail on the list to move the discussion there.

>  * If you send me a login and crypted password,
>    (http://people.redhat.com/otaylor/cryptit) I'd be happy
>    to create a CVS account for you with write access. (Assuming 
>    you don't already have one.)

That'd be really helpful. I'll send the login and password under
separate cover.

>    I'd like to be kept in the loop for major/API changes,
>    but I don't see any reason why you can't commit bug
>    fixes directly.
>    
> > > * lseek can't return EINTR, EGAIN, neither can fcntl, fstat.
> > 
> > The Debian man page for fcntl lists EAGAIN and EINTR as possible errors.
> > EINTR can only be returned for F_SETLKW, F_GETLK, and F_SETLK,
> > so we can ignore it, but no such restriction is given for EAGAIN.
>
> The Linux man pages really shouldn't be considered much of an
> authority - they are pretty random in what they include and 
> dubiously accurate at times. 
>
> The man page I have mentions EAGAIN only in the section on 
> F_SETLK, and the Unix98 standard says authoritatively that
> it can only occur for F_SETLK.
>
> > > * What should be the handling of EPIPE: is it really an
> > >   error not a status? (should EOF by recycled?). Would
> > >   it make sense to use send() instead of write to 
> > >   avoid SIGPIPE? Should that be optional?
> > 
> > I'm not sure what to do with EPIPE. In my previous mail,
> > which may have been somewhat confusing, I was actually
> > thinking of ESPIPE, which I've removed in favor
> > of using g_return_val_with_fail() in the seek functions.
>  
> > > * The set of errors in GIOChannelError seems a bit random - 
> > >   contains obscure errors like EPERM, ENOLCK, EDEADLK, but certainly
> > >   not nearly every error that could occur from open().
> > 
> > Fixed as per our discussion. Looking at the win32 code,
> > I think the error in win32_msg_[read,write] which was
> > previously returned as G_IO_ERROR_INVAL may be closer
> > to EMSGSIZE, but comments from people who actually
> > understand that code are necessary before we do anything
> > about the error message. Any win32 developers care to have
> > an opinion?
>
> Hmmm, no Win32 developers here... ;-) You might want to 
> send that question in a separate mail to gtk-devel-list.
>
> I do think that any GIOChannel needs to look like a
> stream ... if there is a limit on the size of a write(),
> then ->io_write() needs to simply make a short write if
> called with more bytes than allowed.

I'm not completely sure what the error means in this case, though,
since it was originally G_IO_ERROR_INVAL, which could be a number
of things. Comments from win32 people?

This question is now bug #57691.

> > > * Need to make a read-only-returns-whole-characters guarantee
> > >   for reading for UTF-8, and validate incoming data.
> > 
> > Did this, excluding the encodings NULL (guaranteed to be binary safe)
> > and GIOChannelEncodeRaw.
>
> So, we have for special encodings:
>
>  - "UTF-8" works just like it wasn't a special case, but doesn't
>    involve opening a converter from UTF-8 to UTF-8. 
>
>  - NULL. No conversion, byte-by-byte.
>
>  - GIOChannelEncodeRaw. Like NULL but no buffering?
>
> The NULL / GIOChannelEncodeRaw distinction strikes me as something
> that shouldn't be controlled by the encoding - since there
> is no difference in encoding. Rather, there should be a 
> separate g_io_channel_set_unbuffered(), or something.

Mabye we should just rename g_io_channel_set_encoding() to
something more general.

> Also, if this is the setup, I believe the default should probably
> be UTF-8, not NULL.

I'll change it, then.

> > > * I think we need to hide partial chars when writing from
> > >   the user - the total possible partial char size is only
> > >   known and is 5 bytes because it is UTF-8. (4 if we assume
> > >   that it UTF-8 encoding valid unicode characters)
> > 
> > Did this. However, it's not really possible to differentiate
> > partial characters from invalid characters in this case,
> > and failing to do so can get you wedged. Consider the case:
> > 
> > <- valid UTF-8 data->xa<- valid UTF-8 data ->
> > 
> > where 'x' is the first byte of a multibytes UTF-8 character,
> > and 'a' is an ascii character (<= 0x7f). The sequence is
> > clearly invalid, but if the data is sent to write_chars() in
> > two parts,
> > 
> > <-valid UTF-8 data->x
> > a<-valid UTF-8 data->
> > 
> > then 'x' will sit in the partial chars buffer. When the second set
> > of data is written, we will return the error G_CONVERT_ERROR_ILLEGAL_SEQUENCE.
> > On checking the second set of data, however, no illegal sequence will
> > be found.
>
> Doesn't matter. G_CONVERT_ERROR_ILLLEGAL_SEQUENCE isn't recoverable
> in any meaningful way except for displaying an error dialog to
> the user.
>  
> > > * g_io_channel_get_buffer_condition should not return
> > >   G_IO_IN if there isn't at least a whole character.
> > >   G_IO_OUT is completely wrong - should be something
> > >   like channel->write_buf->len < channel->buf_size.
> > 
> > This is somewhat saner now, but I'm still not happy with the
> > result. Since we have a buffered channel, G_IO_OUT should always be
> > set. However, we don't want to do that without polling,
> > since G_IO_IN might never get set.
>
> Well, that's handled right by:
>
>   /* Only return TRUE here if _all_ bits in watch->condition will be set
>    */
>   return ((watch->condition & buffer_condition) == watch->condition);
>
> Right? Actually, I don't think we should set the G_IO_OUT bit
> unless we have space in the buffer ... if I'm trying to write
> a 30 meg file to a socket, and am using G_IO_OUT, and I don't
> want to create a 30 meg local bugger.

So the current approach seems to be correct. The only question I
have left is if we need some sort of automatic flush watch.

> > >   Question - how do you detect EOF on a buffered channel?
> > 
> > You check that the buffer is empty _and_ try a read on the
> > channel. That is why g_io_channel_fill_buffer() sometimes
> > returns G_IO_STATUS_EOF; it indicates nothing was read on
> > the last read, so if the buffer is empty the appropriate
> > read function can return an EOF. If g_io_channel_fill_buffer()
> > returns G_IO_STATUS_EOF and the buffer terminates in a partial
> > character, that's an error.
>
> I think this question was not about the internals, but rather
> about how you would implement what you used to implement
> as:
>
>  gboolean
>  my_watch (GIOChannel *channel, GIOCondition condition, gpointer data)
>  {
>    char buf[1024];
>    size_t count;
>    
>    g_io_channel_read (channel, buf, sizeof (buf), &count);
>    if (count == 0) 
>      {
>        g_io_channel_close (channel);
>        return FALSE;
>      }
>    else
>      {
>        /* Process buf */
>        return TRUE;       
>      }
>  }
>
> I think the answer to that would be:
>  
>  gboolean
>  my_watch (GIOChannel *channel, GIOCondition condition, gpointer data)
>  {
>    char buf[1024];
>    GIOStatus status;
>    GError *err = NULL;
>    
>    g_io_channel_read_chars (channel, buf, sizeof (buf), &count, &err);
>    if (status == G_IO_CHANNEL_EOF)
>      {
>        g_io_channel_close (channel);
>        return FALSE;
>      }
>    else if (status == G_IO_CHANNEL_ERROR)
>      {  
>        g_warning ("Error reading from channel: %s\n", err->message);
>        g_error_free (err;
>        return FALSE;
>      }
>    else
>      {
>        /* Process buf */
>        return TRUE;       
>      }
>  }
>
> > > * g_io_unix_get_flags() doesn't look close to right for
> > >   anything but files.
> > 
> > Why? fcntl() should be fine for sockets
>
> I think my concern was the use of fstat() and checking the
> access bits and trying to use those to guess whether the
> channel was readable or writeable.
>
> Or perhaps it was the older issue of using fstat() on sockets,
> though I think that should be OK, certainyl if we check for errors.
>  
> > > * Should lazily open read and write converters? (Error 
> > >   reporting a bit of a problem - you could just always open
> > >   _one_ (say the read converter)
> > 
> > 1. As long as we've got to open one, why not both?
>
> Efficiency. Opening a converter can be quite expensive for
> some encodings, and there could well be separate forward and 
> reverse mapping tables for encoding <=> unicode.
>
> Perhaps we can use the flags to detect whether the stream is 
> readable / writeable before opening converters. I think
> almost all uses of encoding conversions will be for files,
> and the channel will be unidirectional.
>  
> > 2. Does conversion guarantee reverse conversion for non-GNU iconv?
>
> No, there is no guarantee (even for GNU iconv, in theory). I guess
> it's probably better to open both. On consideration, opening
> just the reader, and then opening the writer later, even if
> we only needed the writer is rather to begin with is dubious.
>
> Opening the converters lazily would be best, but isn't possible
> because of error reporting.
>
> So, I think the best thing to do would be to see if we can
> open just the converters we need from the start.
>  
> > > * There is no reason (IMO) to expand the write buffer arbitrarily
> > >   when writing characters - just flush out each chunk as 
> > >   it converted. 
> > 
> > Writing before conversion is complete would cause problems in dealing
> > with error conditions. Currently, if an error is detected in
> > writing, the write buffer is returned to its previous condition
> > and bytes_written is set to zero. If we start writing before we're
> > sure there aren't any conversion errors, we can't do this.
>
> I think this is fine. We see the same behavior with out of space
> errors - we write the part that fits, we get a short write,
> we try again, we get a ENOSPACE error for the remainder,
> and return that as a GError.
>
> If we wanted to preserve the "error returned only if nothing
> written" aspect of write(), we'd have to make sure we only
> ever did one write() per call of g_io_channel_write_chars()
> which is neither easy to do nor desirable.

So, should I then return the number of characters actually written
in *bytes_written even on error?

> > 
> > We might try setting it on NULL and raw encoding (because they
> > should be binary safe), and clearing it on all others (because
> > they're text). Do some encodings use \r and \n bytes inside
> > other characters?
>
> Not-setting O_BINARY is a disaster for UTF-16 ... \n is representted
> as \n\0, and changing that into \r\n\0 corrupts the entire stream.
>
> Anyways, I think making this depend on the encoding would be a
> mistake. It's confusing. Not to mention the fact, that I don't think
> we can actual _change_ it for a file once we have it open.
>
> > This would also require making the default encoding "UTF-8"
> > (instead of NULL) to preserve backward compatibility with
> > the old functions, which didn't (I believe) set O_BINARY.
>
> Remember, the old functions had no way of opening a file as 
> a channel, so whether this was set was up to the app.
>
> The lazy thing to do would be simply to make the new_file() function
> accept the "b" flag and leave it up to the app. Or perhaps
> we could make the default "b" and have a "t" flag, though that
> would be quit non-standard.
>
> I'd really like to get input from the Win32 people on this,
> but haven't managed to do so yet.

This question is marked as bug #57695

> > > Things I haven't really fully reviewed yet
> > > ==========================================
> > > 
> > > * Interaction of all the code with non-blocking streams
> > > 
> > > * Interaction of the code with IO watches - what would 
> > >   you do for instance, if you wanted to set up 
> > >   a calllback to be called on each incoming line 
> > >   as it arrived.
> > > 
> > > * Handling recovery of some sort on encoding failure
> > >   in the middle of a text file.
> > 
> > Two other things:
> > 
> > When trying to flush a write buffer containing partial chars,
> > the new error condition G_IO_CHANNEL_PCHAR_FLUSH is set.
> > I didn't use G_CONVERT_ERROR_PARTIAL_INPUT for this case,
> > since all the read functions use it for partial chars on a read.
> > The read functions can also return G_IO_CHANNEL_PCHAR_FLUSH
> > when calls to read and write functions are mixed.
>
> Well, the number of error conditions is really not that important
> since I suspect programmers will distinguish between them only
> in rare conditions. The important thing is clear text
> in error->message.
>
> I do consider "PCHAR_FLUSH" to be an dubious abbrevation.
> (The general rule is that abbreviations should be avoided unless
> they are obvious. G_IO_CHANNEL_PARTIAL_CHAR_FLUSH is certainly
> looong, but how often would someone need to type it?
>
> However, do we need this error?
>
> When should the user be informed that a "partial character
> flush occurred". What would you do if that appeared in a log
> file or on the screen?
>
> I think there are three possibilities:
>
>  * It's a symptom of a mis-encoded internal data. This is essentially
>    a programming bug. Internal streams should not be misencoded.
>
>  * It's a programming error. g_io_channel_flush() was called
>    at some point when it "shouldn't" be called.
>
>  * It's not an error at all. The programmer simply wanted
>    to encourage data to the stream but didn't care if
>    it all went or not. (After all, on a non-blocking stream
>    flush() won't flush everything either.)
>
> If there is a possibility that it isn't a bug, I think we
> should just keep quiet.
>
> For mixed reads/writes:
>
>  * If it's a socket, we should probably just keep quiet.
>    The reason for flush() before read() is to make sure
>    that any complete output is sent to the other side
>    before we wait for input.

If it's a socket, we don't call flush() from read()
(since channel->is_seekable == FALSE)

>  * If it's a file, it's most likely a programming error
>    and/or a misencoded internal stream, 
>
> I don't see any disaster in not reporting this error, and
> it seems to me that making this an error could occasionally
> be quite annoying.

Okay. I'll change it to a g_warning(), and clear the
contents of the partial character buffer on flush.

This is now bug #57694.

> > What is the channel_flags field in GIOChannel used for?
> > The only place I see it being use is in g_io_channel_init(),
> > where it's set to zero.
>
> My guess is that it was there originally so things like whether the
> channel was blocking or not could be checked efficiently. But
> yes it looks like dead code now. We can always add it back
> if we need it.

I'll clobber it then.

Ron
Follow-Ups:
- Re: New GIOChannel patch
  - From: Owen Taylor
[Date Prev][Date Next] [Thread Prev][Thread Next] [Thread Index] [Date Index] [Author Index]