Re: [gedit-list] Plugin development question



2011/7/10 Jon Leighton <j jonathanleighton com>:
> On Sun, 2011-07-10 at 15:20 +0200, Jesse van den Kieboom wrote:
>> Just FYI, you probably need to be a bit careful with this approach and
>> take care of proper UTF-8 handling (i.e. the difference between
>> character offsets and byte offsets etc.).
>
> Yes, good point. I have made a change to make sure Python knows it's
> dealing with a unicode string when matching the regexp:
>
> https://github.com/jonleighton/gedit-trailsave/commit/84b965fd02379a93a68d35cbc20a784ec6fe7e31
>
> This means that the offsets provided by Python are now based on
> characters rather than bytes, which seems to work.
>
> Is the string returned by self.doc.get_text() always going to be utf-8
> though? Or do I need to be able to deal with other encodings too?

It is guaranteed to always be UTF-8, so there is no need to do any conversions.
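
To illustrate the character/byte offset difference mentioned earlier in the
thread, here is a minimal, standalone sketch. It is not taken from the plugin:
the sample string, the regexp and the helper char_to_byte_offset() are made up
for the example, and it is written for Python 3, where str is already unicode.

```python
import re

# Hypothetical sample line: non-ASCII text followed by trailing whitespace.
text = "caf\u00e9  \n"
data = text.encode("utf-8")  # the same content as raw UTF-8 bytes

# Illustrative trailing-whitespace patterns, one per string type.
TRAILING_WS = re.compile(r"[ \t]+$", re.MULTILINE)
TRAILING_WS_BYTES = re.compile(rb"[ \t]+$", re.MULTILINE)

# Matching a unicode string yields character offsets.
char_span = TRAILING_WS.search(text).span()

# Matching the encoded bytes yields byte offsets, which drift past
# every multi-byte character (here the two-byte "\u00e9").
byte_span = TRAILING_WS_BYTES.search(data).span()


def char_to_byte_offset(s, char_offset):
    """Convert a character offset in s to a byte offset in its UTF-8 form."""
    return len(s[:char_offset].encode("utf-8"))


print(char_span)                     # (4, 6)
print(byte_span)                     # (5, 7)
print(char_to_byte_offset(text, 4))  # 5
```

The two spans differ by one because of the single two-byte character before
the whitespace. As far as I know, the text buffer's offset-based iterators
count characters, so the character offsets are the ones to pass along.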

>
> Thanks.
>
> --
> http://jonathanleighton.com/
>

