On Sun, 2011-07-10 at 15:20 +0200, Jesse van den Kieboom wrote:
> Just FYI, you need to probably be a bit careful with this approach and
> take care of proper utf-8 handling (i.e. the difference between
> character offsets and byte offsets etc.).

Yes, good point. I have made a change to make sure Python knows it's
dealing with a Unicode string when matching the regexp:

https://github.com/jonleighton/gedit-trailsave/commit/84b965fd02379a93a68d35cbc20a784ec6fe7e31

This means that the offsets provided by Python are now based on
characters rather than bytes, which seems to work.

Is the string returned by self.doc.get_text() always going to be UTF-8,
though? Or do I need to be able to deal with other encodings too?

Thanks.

-- 
http://jonathanleighton.com/
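
For illustration, here is a minimal sketch of the character-versus-byte
offset difference discussed above. It is not the code from the linked
commit: the pattern, the sample text and the helper name
trailing_whitespace_spans are all hypothetical, and it is written for
Python 3, where the two types are str and bytes (in Python 2 the
corresponding types are unicode and str).

```python
import re

# Hypothetical pattern: runs of spaces/tabs at the end of each line.
TRAILING_WS = re.compile(r"[ \t]+$", re.MULTILINE)
TRAILING_WS_BYTES = re.compile(rb"[ \t]+$", re.MULTILINE)


def trailing_whitespace_spans(text):
    """Return (start, end) offsets of trailing whitespace runs.

    The offsets are in characters when given str and in bytes when
    given bytes, because each pattern matches over its own unit.
    """
    pattern = TRAILING_WS_BYTES if isinstance(text, bytes) else TRAILING_WS
    return [m.span() for m in pattern.finditer(text)]


# Sample text with two non-ASCII characters (each is 2 bytes in UTF-8).
text = "caf\u00e9  \nna\u00efve\t\n"

char_spans = trailing_whitespace_spans(text)
byte_spans = trailing_whitespace_spans(text.encode("utf-8"))

print(char_spans)
print(byte_spans)
```

The two lists differ as soon as a non-ASCII character appears before a
match, so offsets computed on the encoded bytes cannot be used directly
where character offsets are expected.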