Re: g_utf8_offset_to_pointer() optimization
- From: Behdad Esfahbod <behdad cs toronto edu>
- To: Luis Menina <liberforce fr st>
- Cc: performance-list gnome org, Federico Mena Quintero <federico novell com>
- Subject: Re: g_utf8_offset_to_pointer() optimization
- Date: Thu, 3 Nov 2005 10:53:29 -0500 (EST)
On Wed, 2 Nov 2005, Luis Menina wrote:
> OK, I've checked my code and I think you're wrong:
> Since I pre-increment the pointer, the first byte is never checked (I
> assume I'm not in the middle of a character). So in this case
> (offset == 1) I'm waiting for the first byte that doesn't match the
> "10xx xxxx" pattern... which is the case for the null byte!
>
> Offset is then decremented, and everything goes smoothly...
You are right, your code is correct, although it requires
NUL-termination. BTW, your code runs almost three times slower
than the original code for Korean, which makes sense: Korean text
is mostly three-byte characters, and your loop looks at every
byte. I'm measuring all the implementations posted here and on
Planet. Will post soon.
behdad
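
For anyone following along, here is a minimal sketch of why the
pre-increment loop gets the U+00A0 case right. The loop is the one quoted
further down, rewritten with plain C types; the name
offset_to_pointer_scan is made up for this illustration and is not a GLib
function. It assumes valid, NUL-terminated UTF-8, a pointer at the start
of a character, and a non-negative offset that does not run past the end.

```c
#include <assert.h>

/* Same loop as the quoted g_utf8_offset_to_pointer1, with plain C types.
 * Hypothetical name, for illustration only.
 * Assumptions: str is valid UTF-8, NUL-terminated, points at the start
 * of a character, and 0 <= offset <= number of characters in str. */
static const char *
offset_to_pointer_scan (const char *str, long offset)
{
  while (offset > 0)
    {
      /* Step to the next byte. Any byte that is not a 10xxxxxx
       * continuation byte starts a new character, or is the NUL. */
      if ((*++str & 0xC0) != 0x80)
        --offset;
    }

  return str;
}
```

With str = "\xC2\xA0" and offset = 1, the loop steps over 0xA0 (a
continuation byte) and only counts the character when it reaches the
terminating NUL, so it returns str + 2. That is also why the terminator
is required: on a length-bounded buffer with no NUL after the last
character, the loop would read past the end.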
> BTW I've tried to use Federico's Pango benchmark tools (
> http://primates.ximian.com/~federico/news-2005-10.html#25 ), but I'm
> left with an error...
>
> After the "import cairo" error (solved by installing pycairo) I have
> this error that I can't resolve, as I'm no Python guru:
>
> ================
>
> python ./plot-languages.py -o chart.png test1.xml
> Traceback (most recent call last):
>   File "./plot-languages.py", line 373, in ?
>     main ()
>   File "./plot-languages.py", line 367, in main
>     rset = ResultSet (file)
>   File "./plot-languages.py", line 47, in __init__
>     self.parse (filename)
>   File "./plot-languages.py", line 63, in parse
>     self.parse_language_node (l)
>   File "./plot-languages.py", line 78, in parse_language_node
>     time = float_from_node (child)
>   File "./plot-languages.py", line 32, in float_from_node
>     return float (c.nodeValue)
> ValueError: invalid literal for float(): 11,560000
>
> =================
>
> Thanks to anyone that can help me...
>
>
> Behdad Esfahbod wrote:
> > On Wed, 2 Nov 2005, Luis Menina wrote:
> >
> >
> >>Can you give me more info about what is wrong with my function?
> >>I don't understand what you mean by "it doesn't pass over the tail of
> >>the last characters"
> >
> >
> > Your code fails if the last character to be skipped is a multibyte
> > one. Suppose this is the input:
> >
> > str = "\xC2\xA0"
> > offset = 1
> >
> > which is U+00A0 NO-BREAK SPACE. The output should be str + 2,
> > but your code returns str + 1.
> >
> > behdad
> >
> >
> >
> >>>>gchar * g_utf8_offset_to_pointer1 ( const gchar *str,
> >>>>                                    glong offset)
> >>>>{
> >>>>    while (offset)
> >>>>    {
> >>>>        if ((*++str & 0xC0) != 0x80)
> >>>>            --offset ;
> >>>>    }
> >>>>
> >>>>    return (gchar *)str;
> >>>>}
> >
> >
> > --behdad
> > http://behdad.org/
> >
> > "Commandment Three says Do Not Kill, Amendment Two says Blood Will Spill"
> > -- Dan Bern, "New American Language"
> >
>
>
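
About the ValueError in the quoted traceback: "11,560000" looks like a
number printed with a decimal comma, which is what printf-style
formatting produces under a locale such as fr_FR, while Python's float()
only accepts a dot. This is a guess at the cause, not something checked
against Federico's tools. If that is it, running the benchmark with
LC_NUMERIC=C in the environment may be enough. On the writing side, a
sketch of locale-independent formatting could look like this (the helper
name format_seconds_c_locale is hypothetical):

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: format a timing with a '.' decimal separator
 * whatever the user's locale is, so that a strict parser accepts it.
 * Returns what snprintf returns (the length of the formatted number). */
static int
format_seconds_c_locale (char *buf, size_t len, double seconds)
{
  char saved[128] = "";
  const char *current = setlocale (LC_NUMERIC, NULL);
  int n;

  /* The string returned by setlocale may be overwritten by the next
   * call, so keep a private copy of the current setting. */
  if (current != NULL)
    snprintf (saved, sizeof saved, "%s", current);

  setlocale (LC_NUMERIC, "C");
  n = snprintf (buf, len, "%f", seconds);

  /* Put the previous numeric locale back. */
  if (saved[0] != '\0')
    setlocale (LC_NUMERIC, saved);

  return n;
}
```

Switching the locale like this is not thread-safe, so it is only a
sketch. In GLib code, g_ascii_formatd() does the same job without
touching the process locale.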
--behdad
http://behdad.org/
"Commandment Three says Do Not Kill, Amendment Two says Blood Will Spill"
-- Dan Bern, "New American Language"