Re: g_utf8_offset_to_pointer() optimization



On Wed, 2 Nov 2005, Luis Menina wrote:

> Ok, I've checked my code and I think you're wrong:
> As I pre-increment the pointer, the first byte is never checked (I
> assume I'm not in the middle of a character). So I'm waiting in this
> case (offset == 1) for the first byte that doesn't match the "10xx xxxx"
> pattern... which is the case for the NUL byte!
>
> Offset is then decremented, and everything goes smoothly...

You are right, your code is correct, although it requires
NUL-termination.  BTW, your code runs almost three times
slower than the original code for Korean, which makes sense.  I'm
measuring all implementations posted here and on planet.  Will
post soon.
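
A minimal sketch of the "\xC2\xA0" case from this thread, assuming plain
char/long in place of gchar/glong.  The names offset_to_pointer_preinc,
offset_to_pointer_bounded and check_examples are hypothetical, not GLib
API.  The first function is the pre-increment loop as posted; the second
is one possible way to drop the NUL-termination requirement by passing an
explicit end pointer, not something proposed in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* The pre-increment loop from the thread.  Assumes str is NUL-terminated,
   starts on a character boundary, and has at least `offset` characters. */
const char *
offset_to_pointer_preinc (const char *str, long offset)
{
  while (offset)
    {
      /* Step first, then test: any byte that is not a 10xxxxxx
         continuation byte (including the NUL) ends a character. */
      if ((*++str & 0xC0) != 0x80)
        --offset;
    }
  return str;
}

/* Hypothetical bounded variant: never reads at or past `end`, so the
   buffer does not need a terminating NUL. */
const char *
offset_to_pointer_bounded (const char *str, const char *end, long offset)
{
  while (offset > 0 && str < end)
    {
      ++str;  /* skip the lead byte */
      while (str < end && ((unsigned char) *str & 0xC0) == 0x80)
        ++str;  /* skip continuation bytes */
      --offset;
    }
  return str;
}

/* Small self-check for the U+00A0 example. */
void
check_examples (void)
{
  const char nbsp[] = "\xC2\xA0";          /* NUL-terminated */
  const char raw[2] = { '\xC2', '\xA0' };  /* no terminator */

  assert (offset_to_pointer_preinc (nbsp, 1) == nbsp + 2);
  assert (offset_to_pointer_bounded (raw, raw + 2, 1) == raw + 2);
}
```

In the first function the NUL byte is what stops the scan after the last
character, which is why it cannot be used on unterminated buffers.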

behdad


> BTW I've tried to use Federico's Pango benchmark tools (
> http://primates.ximian.com/~federico/news-2005-10.html#25 ), but I'm
> left with an error...
>
> After the "import cairo" error (solved by installing pycairo) I have
> this error that I can't resolve, as I'm no Python guru:
>
> ================
>
> python ./plot-languages.py -o chart.png test1.xml
> Traceback (most recent call last):
>    File "./plot-languages.py", line 373, in ?
>      main ()
>    File "./plot-languages.py", line 367, in main
>      rset = ResultSet (file)
>    File "./plot-languages.py", line 47, in __init__
>      self.parse (filename)
>    File "./plot-languages.py", line 63, in parse
>      self.parse_language_node (l)
>    File "./plot-languages.py", line 78, in parse_language_node
>      time = float_from_node (child)
>    File "./plot-languages.py", line 32, in float_from_node
>      return float (c.nodeValue)
> ValueError: invalid literal for float(): 11,560000
>
> =================
>
> Thanks to anyone that can help me...
>
>
> Behdad Esfahbod wrote:
> > On Wed, 2 Nov 2005, Luis Menina wrote:
> >
> >
> >>Can you give me more info about what is wrong with my function?
> >>I don't understand what you mean by "it doesn't pass over the tail of
> >>the last characters"
> >
> >
> > Your code fails if the last character to be skipped is a multibyte
> > one.  Suppose this is the input:
> >
> >   str = "\xC2\xA0"
> >   offset = 1
> >
> > which is the U+00A0 NO-BREAK SPACE.  The output should be str +
> > 2, but your code returns str + 1.
> >
> > behdad
> >
> >
> >
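> >>>>gchar *
> >>>>g_utf8_offset_to_pointer1 (const gchar *str, glong offset)
> >>>>{
> >>>>	/* Assumes str is NUL-terminated and starts on a character boundary. */
> >>>>	while (offset)
> >>>>	{
> >>>>		/* Advance first; count a character each time we reach a byte
> >>>>		   that is not a 10xxxxxx continuation byte. */
> >>>>		if ((*++str & 0xC0) != 0x80)
> >>>>			--offset;
> >>>>	}
> >>>>
> >>>>	return (gchar *) str;
> >>>>}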
> >
> >
> > --behdad
> > http://behdad.org/
> >
> > "Commandment Three says Do Not Kill, Amendment Two says Blood Will Spill"
> > 	-- Dan Bern, "New American Language"
> >
>
>

--behdad
http://behdad.org/

"Commandment Three says Do Not Kill, Amendment Two says Blood Will Spill"
	-- Dan Bern, "New American Language"


