Re: [Tracker] Text extraction on text formats
- From: Luca Ferretti <elle uca libero it>
- To: Laurent Aguerreche <laurent aguerreche free fr>
- Cc: Tracker List <tracker-list gnome org>
- Subject: Re: [Tracker] Text extraction on text formats
- Date: Fri, 17 Nov 2006 16:07:15 +0100
On Fri, 17/11/2006 at 15:04 +0100, Laurent Aguerreche wrote:
On Friday 17 November 2006 at 15:00 +0100, Luca Ferretti wrote:
On Thu, 16/11/2006 at 22:53 +0100, Laurent Aguerreche wrote:
-----------------
Questo ?? un semplice esempio delle potenzialit?? di OO.o
       ^^ it was "è"                           ^^ it was "à"
(Italian for "This is a simple example of OO.o's capabilities")
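One plausible way to get exactly these symptoms, sketched in Python as an illustration only (the function names are hypothetical, and this is not a claim about what libGSF or Tracker actually do): "è" and "à" each take two bytes in UTF-8 (C3 A8 and C3 A0). Replacing every non-ASCII *byte* with "?" turns each accented letter into "??", while reading the same bytes as Latin-1 gives "Ã¨" and "Ã" followed by an invisible no-break space.

```python
def utf8_bytes_as_question_marks(text: str) -> str:
    # Encode correctly as UTF-8, then replace every non-ASCII *byte* with "?".
    # A two-byte character such as "è" (C3 A8) therefore becomes "??".
    return "".join(chr(b) if b < 0x80 else "?" for b in text.encode("utf-8"))


def utf8_read_as_latin1(text: str) -> str:
    # Encode as UTF-8 but decode as Latin-1: each byte becomes its own character.
    return text.encode("utf-8").decode("latin-1")


if __name__ == "__main__":
    sample = "Questo è un semplice esempio delle potenzialità di OO.o"
    print(utf8_bytes_as_question_marks(sample))
    # prints: Questo ?? un semplice esempio delle potenzialit?? di OO.o
    print(utf8_read_as_latin1("è"), repr(utf8_read_as_latin1("à")))
```

Two question marks per accented letter are therefore a hint that the underlying text was UTF-8 and was then mangled byte by byte.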
A question: what is the encoding of this string? UTF-8, ISO-something,
Win-something, etc.? Right now, this is what I get with libGSF and an RTF file:
Dunno. I simply typed it in OO.o (Italian locale), so it should be UTF-8.
But are you asking about the original ODT or the exported RTF?
Both... But I have seen the same problem in some of my DOC files, with
different encodings.
ODT:
content.xml -> encoding="UTF-8"
meta.xml -> encoding="UTF-8"
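An ODT file is a ZIP archive, and the encoding="UTF-8" values above come from the XML declarations at the top of its content.xml and meta.xml members. A minimal Python sketch that reads those declarations, assuming each declaration sits at the very start of its member (no byte-order mark); the helper names are hypothetical, and a missing declaration falls back to UTF-8, which is the XML default when there is no BOM.

```python
import re
import sys
import zipfile

# Matches the encoding pseudo-attribute of an XML declaration, e.g.
# <?xml version="1.0" encoding="UTF-8"?>
_XML_DECL = re.compile(rb'<\?xml[^>]*?encoding=["\']([A-Za-z0-9._-]+)["\']')


def declared_encoding(head: bytes, default: str = "UTF-8") -> str:
    """Return the encoding named in the XML declaration at the start of `head`."""
    m = _XML_DECL.match(head)
    return m.group(1).decode("ascii") if m else default


def odt_member_encodings(path, members=("content.xml", "meta.xml")) -> dict:
    """Map each XML member of an ODT (ZIP) file to its declared encoding."""
    result = {}
    with zipfile.ZipFile(path) as zf:
        for name in members:
            with zf.open(name) as f:
                # The declaration, if present, fits in the first few bytes.
                result[name] = declared_encoding(f.read(200))
    return result


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(path, odt_member_encodings(path))
```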
RTF:
gedit says it's UTF-8 (is there a command-line tool to check it? iconv
converts between encodings; it doesn't detect them)
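On the command-line question: on most Linux systems `file --mime-encoding FILE` prints a guess such as us-ascii or utf-8. The same check can be sketched in a few lines of Python 3.7+ (the helper name is hypothetical). It also suggests one likely reason gedit calls the RTF file UTF-8: standard RTF is written as 7-bit ASCII with everything else escaped, and a pure-ASCII file is byte-for-byte valid UTF-8 too.

```python
import sys


def classify_bytes(data: bytes) -> str:
    """Rough encoding check: 'ascii', 'utf-8', or 'not-utf-8'.

    Only UTF-8 validity is tested; a 'not-utf-8' result does not tell
    which 8-bit encoding (ISO-8859-x, Windows-125x, ...) was used.
    """
    if data.isascii():  # pure 7-bit data is also valid UTF-8
        return "ascii"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not-utf-8"
    return "utf-8"


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, classify_bytes(f.read()))
```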
Now, two questions:
1) If the RTF file is encoded as UTF-8, why escape non-ASCII characters
at all? Is it an OO.o bug?
2) Assuming escaping is better for portability, the "è" character is
escaped in three different ways: as "?" (line 11: "altro non ? che"), as
"\u232\'3f" (line 11: "altro non \u232\'3f che") and finally as
"\uc2 \u232\'c3\'a8\uc1" (line 15: "Questo \uc2 \u232\'c3\'a8\uc1 un
semplice") X-|
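The escaped forms quoted above follow the RTF rules for \uN (a Unicode code point in decimal, signed 16-bit), \ucN (how many fallback characters follow each \uN, 1 by default) and \'hh (one byte in the document's ANSI code page). Since 232 = 0xE8 = "è", a minimal Python sketch of a reader for just these three escapes looks like this. The function name is hypothetical, cp1252 is an assumed stand-in for the document's \ansicpgN setting, and groups and all other control words are not handled.

```python
import re

# Tokens this sketch understands: \ucN, \uN, \'hh and single literal characters.
# Groups and every other control word are out of scope and pass through as text.
_TOKEN = re.compile(r"\\uc(\d+) ?|\\u(-?\d+) ?|\\'([0-9a-fA-F]{2})|(.)", re.S)


def decode_rtf_text(s: str, codepage: str = "cp1252") -> str:
    """Decode \\uN / \\ucN / \\'hh escapes in a fragment of RTF body text.

    `codepage` stands in for the document's \\ansicpgN setting and must be a
    single-byte code page for the per-byte decoding below to be valid.
    """
    out = []
    uc = 1    # fallback characters after each \uN (RTF default is 1)
    skip = 0  # fallback characters still to drop after the last \uN
    for m in _TOKEN.finditer(s):
        uc_arg, u_arg, hex_byte, literal = m.groups()
        if uc_arg is not None:
            uc = int(uc_arg)
        elif u_arg is not None:
            code = int(u_arg)
            if code < 0:  # code points above 32767 are written as negative
                code += 65536
            out.append(chr(code))
            skip = uc
        elif skip > 0:
            skip -= 1  # this \'hh or literal is the fallback for the \uN above
        elif hex_byte is not None:
            out.append(bytes([int(hex_byte, 16)]).decode(codepage, errors="replace"))
        else:
            out.append(literal)
    return "".join(out)
```

With this sketch, "altro non \u232\'3f che" decodes to "altro non è che" and "\uc2 \u232\'c3\'a8\uc1" decodes to "è": in both cases the fallback (\'3f, i.e. "?", or the two UTF-8 bytes \'c3\'a8) is skipped. The plain "?" form cannot be recovered, because the character is already gone from the file. A reader that ignores \u232 and decodes \'c3\'a8 as cp1252 gets "Ã¨", the classic UTF-8-read-as-ANSI mojibake. Note also that a single space after a control word such as \uc1 is its delimiter and is consumed rather than printed.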