Re: Terminology concerning strings



Hi Egmont,

On Mon, 2005-04-04 at 14:36, Koblinger Egmont wrote:
> On Mon, Apr 04, 2005 at 11:35:44AM +0200, Roland Illig wrote:
> 
> > * the _size_ of a string (as well as for other objects) is the number of
> >   bytes that is allocated for it. For arrays, it is the number of
> >   entries of the array. For strings it is at least _length_ + 1.
> > 
> > * the _length_ of a string is the number of characters in it, excluding
> >   the terminating '\0'.

> It seems to me that this terminology is not yet multibyte-aware. Since UTF-8
> becomes an everyday issue and AFAIR is planned for mainstream mc 4.7.0, IMHO
> it is very important to create a clear terminology for this even if it's not
> yet officially implemented now.

It seems you haven't read Roland's post very carefully. He clearly
differentiates between size (the raw number of bytes) and length (the
number of characters represented on the screen). From discussions with
him I know he wrote this post explicitly with multibyte charsets in mind.
"ecs" in ecssup.{c,h} stands for "extended charset".
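
To make the size/length distinction concrete, here is a minimal sketch in
C. It assumes the string is valid UTF-8; utf8_length() is a hypothetical
helper written for this illustration, not a function from mc or ecssup.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of mc): returns the number of characters
 * (code points) in a NUL-terminated string, assuming valid UTF-8 input. */
static size_t utf8_length(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        /* UTF-8 continuation bytes have the form 10xxxxxx. Every other
         * byte starts a new character, so count only those. */
        if (((unsigned char) *s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For a buffer declared as char buf[] = "\xc3\xa9" "t" "\xc3\xa9" (the word
"été" encoded in UTF-8), sizeof(buf) is 6 and strlen(buf) is 5, while
utf8_length(buf) is 3. So the size is the byte count including the
terminating '\0', and the length in Roland's sense is the character count.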

Or am I missing your point?

Leonard.

-- 
mount -t life -o ro /dev/dna /genetic/research




