Re: [Tracker] Text extraction on text formats



On Fri, 17/11/2006 at 15:04 +0100, Laurent Aguerreche wrote:
On Friday 17 November 2006 at 15:00 +0100, Luca Ferretti wrote:
On Thu, 16/11/2006 at 22:53 +0100, Laurent Aguerreche wrote:
-----------------
Questo ?? un semplice esempio delle potenzialit?? di OO.o

       ^^ it was "è"                           ^^ it was "à"


A question: what is the encoding of this string? UTF-8, ISO-something,
Win-something, etc.? Now, I can see that with libGSF and an RTF file:

Dunno. I simply wrote it in OO.o (Italian locale). It should be UTF-8.

But are you asking about the original ODT or the exported RTF?

Both... But I saw the same problem on some of my DOC files with
different encodings.


ODT:
  content.xml -> encoding="UTF-8"
  meta.xml    -> encoding="UTF-8"

RTF:
  gedit says it's UTF-8 (any command line tool to check it? iconv is
  used to change encoding, not to check)
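
On the "how to check it" question, a minimal Python sketch of such a
check follows. The function name guess_encoding is made up for this
example, and the check can only rule encodings out: it cannot tell
ISO-8859-1 from Windows-1252, for instance. It may also explain the
gedit answer: an RTF file whose non-ASCII characters are all escaped is
pure ASCII, and every pure-ASCII file is also valid UTF-8.

```python
def guess_encoding(data: bytes) -> str:
    """Classify raw bytes as ASCII, UTF-8, or some other 8-bit encoding.

    Hypothetical helper for illustration only; it does not identify
    which legacy encoding is in use when the bytes are not valid UTF-8.
    """
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8 (with BOM)"
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown 8-bit (not valid UTF-8)"
```

To check a file, read it in binary mode and pass the bytes in, e.g.
guess_encoding(open(path, "rb").read()).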

Now, two questions:
1) if the RTF file is encoded as UTF-8, why escape non-ASCII characters?
Is it an OO.o bug?

2) Let's assume that escaping is better for portability. Even so, the "è"
character is escaped in three different ways: with "?" (line 11: "altro
non ? che"), with "\u232\'3f" (line 11: "altro non \u232\'3f che") and
finally with "\uc2 \u232\'c3\'a8\uc1" (line 15: "Questo \uc2
\u232\'c3\'a8\uc1  un semplice")   X-|
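
For reference, the last two forms follow the usual RTF convention: \uN
gives the Unicode code point in decimal (232 is "è"), and \ucN says how
many fallback bytes follow each \uN for readers that do not understand
Unicode (the default is 1). So \'3f is a one-byte "?" fallback, and
\'c3\'a8 is a two-byte fallback that happens to be the UTF-8 encoding of
"è". A simplified Python sketch of a reader that honours this is below.
It is not a full RTF parser and not what libGSF or Tracker does: the
name decode_rtf_unicode and the cp1252 default code page are assumptions
made for this example, and groups, other control words and code-page
switches are ignored.

```python
import re

# Matches only the three control sequences discussed in this thread:
#   \ucN  number of fallback bytes that follow each \uN
#   \uN   a Unicode code point as a (possibly negative) 16-bit decimal
#   \'hh  one byte in the document's ANSI code page
# A single space after a control word is a delimiter and is consumed.
TOKEN = re.compile(r"\\uc(\d+) ?|\\u(-?\d+) ?|\\'([0-9a-fA-F]{2})")


def decode_rtf_unicode(text, codepage="cp1252"):
    out = []
    uc = 1    # RTF default: one fallback byte after each \uN
    skip = 0  # fallback bytes still to be ignored
    pos = 0

    def emit_literal(chunk):
        # Plain characters can also serve as fallback, e.g. \u232?
        nonlocal skip
        drop = min(skip, len(chunk))
        skip -= drop
        out.append(chunk[drop:])

    for m in TOKEN.finditer(text):
        emit_literal(text[pos:m.start()])
        pos = m.end()
        if m.group(1) is not None:        # \ucN
            uc = int(m.group(1))
        elif m.group(2) is not None:      # \uN
            n = int(m.group(2))
            if n < 0:                     # values above 32767 are written negative
                n += 65536
            out.append(chr(n))
            skip = uc
        elif skip > 0:                    # \'hh used as fallback: ignore it
            skip -= 1
        else:                             # \'hh as a real code-page byte
            byte = bytes.fromhex(m.group(3))
            out.append(byte.decode(codepage, errors="replace"))
    emit_literal(text[pos:])
    return "".join(out)
```

With this, both "altro non \u232\'3f che" and "Questo \uc2
\u232\'c3\'a8\uc1  un semplice" decode to text containing a single "è".
The bare "?" form on its own carries no \uN at all, so the original
character cannot be recovered from it.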





