Re: [evince] possible wrong error by evince
- From: David Kastrup <dak gnu org>
- To: "jose aliste gmail com" <jose aliste gmail com>
- Cc: evince-list <evince-list gnome org>
- Subject: Re: [evince] possible wrong error by evince
- Date: Thu, 29 Nov 2012 11:00:37 +0100
"jose aliste gmail com" <jose aliste gmail com> writes:
> Hi,
>
> Thanks for reporting this. The error occurs while parsing the metadata.
> I have no time right now to look at it in depth, but will try to do so
> next week. However, the description you give looks wrong to me, so
> something else must be happening. I'll try to explain. The character
> "ä" is the code point U+00E4; how that character is encoded in UTF-8
> is a separate matter: it takes two bytes, c3 a4. So if LilyPond is
> trying to encode "ä" as the single byte e4, that is not valid UTF-8!
Sure, it isn't. But pdfmarks are not encoded in UTF-8. They are
encoded either in PDFDocEncoding (which matches Latin-1 for characters
such as ä and ö) or in UTF-16BE with a byte order mark.

Complain to Adobe about their choice, but as long as that is the way PDF
encodes stuff, Evince can't unilaterally decide on something saner.
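
To make the byte-level difference concrete, here is a small Python sketch (purely illustrative, not anything LilyPond, Ghostscript, or Evince runs; the variable and function names are made up for this example). It shows the bytes for "ä" in each encoding and confirms that the single byte e4 is not valid UTF-8 on its own:

```python
# Illustration only: the bytes for "ä" (U+00E4) in the encodings discussed.
ch = "\u00e4"  # ä

utf8_bytes = ch.encode("utf-8")      # two bytes: c3 a4
latin1_bytes = ch.encode("latin-1")  # one byte: e4 (same value in PDFDocEncoding)
# UTF-16BE preceded by the byte order mark fe ff, as PDF text strings use it.
utf16_bytes = b"\xfe\xff" + ch.encode("utf-16-be")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(utf8_bytes.hex(), latin1_bytes.hex(), utf16_bytes.hex())
print(is_valid_utf8(latin1_bytes))
```

So a strict UTF-8 consumer handed the raw byte e4 will reject it, even though the byte is a perfectly good "ä" in the encoding PDF actually uses.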
> Please note that the code that throws the error is the libxml parser,
> which usually is very strict about encodings and things like that.
The relevant part of the PDF looks like this:
<</Producer(GPL Ghostscript 9.06)
/CreationDate(D:20121128183026+01'00')
/ModDate(D:20121128183026+01'00')
/Creator(LilyPond 2.17.7)
/Author(\344 \366)
/Title(\376\377\003\262)
/Composer(\344 \366)>>endobj
As you can see, there is no XML involved here at all. Note that the PDF
in the original report was generated from an input file accidentally
written in Latin-1 (LilyPond requires UTF-8 input), so all bets are off
with that. However, when correctly encoding the input as UTF-8, at
least the author field will still be cranked out encoded as
Latin-1/PDFDocEncoding, and Evince (in contrast to other viewers and
pdfinfo) will complain with the mentioned XML error. Since it would
appear that Evince generates that XML itself as part of its internal
operations, it seems like it fails to convert PDFDocEncoding to UTF-8 in
the process.
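
The conversion that seems to be missing can be sketched in a few lines of Python. This is only a guess at the shape of the fix, not Evince's actual code: the function names are hypothetical, only octal escapes are handled, and Latin-1 is used as a stand-in for PDFDocEncoding (the two differ for bytes 0x18-0x1f and 0x80-0xa0, which do not occur in the strings above).

```python
import re


def unescape_pdf_literal(raw: str) -> bytes:
    """Turn the inside of a PDF literal string into raw bytes.

    Only \\ddd octal escapes are handled; other escapes (\\n, \\(, ...)
    are ignored in this sketch.
    """
    def repl(match: re.Match) -> str:
        # Octal escape -> one byte value (high-order overflow is dropped).
        return chr(int(match.group(1), 8) & 0xFF)

    text = re.sub(r"\\([0-7]{1,3})", repl, raw)
    return text.encode("latin-1")


def pdf_text_to_unicode(data: bytes) -> str:
    """Decode a PDF text string: UTF-16BE if it starts with the BOM fe ff,
    otherwise PDFDocEncoding (approximated here by Latin-1, an assumption)."""
    if data.startswith(b"\xfe\xff"):
        return data[2:].decode("utf-16-be")
    return data.decode("latin-1")


# The /Author and /Title values from the dictionary quoted above.
author = pdf_text_to_unicode(unescape_pdf_literal(r"\344 \366"))
title = pdf_text_to_unicode(unescape_pdf_literal(r"\376\377\003\262"))

# What a UTF-8 consumer such as an XML parser would need to be given.
author_utf8 = author.encode("utf-8")
print(author, title, author_utf8.hex())
```

Handing `author_utf8` rather than the raw bytes e4 20 f6 to the XML side would avoid the encoding error.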
--
David Kastrup