Re: [evince] possible wrong error by evince

From: Frédéric Bron <frederic bron m4x org>
To: evince-list <evince-list gnome org>
Subject: Re: [evince] possible wrong error by evince
Date: Fri, 30 Nov 2012 21:18:06 +0100

The bug has been fixed in Ghostscript (I have not checked it myself):

http://bugs.ghostscript.com/show_bug.cgi?id=693477 says:

Technically the 'correct' approach is to define a PDFDSCEncoding which maps
the non-ASCII values. However, this is non-trivial, and counter-intuitive.

I've made changes so that in the absence of a PDFDSCEncoding we will assume
that any non-UTF-16BE string is using PDFDocEncoding. We then convert that to
UTF-16BE and on to UTF-8.

This should resolve the problem. See commit:
a3d00daf5f9abb1209cb750a95e23bc6951c1c63
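To make the quoted change concrete, here is a minimal Python sketch of the conversion it describes. A string assumed to be in PDFDocEncoding is mapped to UTF-16BE, and that is then re-encoded as UTF-8. This is not Ghostscript's code. The function and table names are hypothetical, and the byte-to-character table is transcribed from the PDFDocEncoding table in the PDF specification (ISO 32000-1, Annex D), so it is worth double-checking against that table.

```python
# Hypothetical sketch (not Ghostscript's implementation):
# PDFDocEncoding bytes -> UTF-16BE -> UTF-8.

# Bytes where PDFDocEncoding differs from ISO-8859-1 (Latin-1),
# transcribed from the PDF specification's PDFDocEncoding table.
_PDFDOC_OVERRIDES = {
    0x18: "\u02d8", 0x19: "\u02c7", 0x1A: "\u02c6", 0x1B: "\u02d9",
    0x1C: "\u02dd", 0x1D: "\u02db", 0x1E: "\u02da", 0x1F: "\u02dc",
    0x80: "\u2022", 0x81: "\u2020", 0x82: "\u2021", 0x83: "\u2026",
    0x84: "\u2014", 0x85: "\u2013", 0x86: "\u0192", 0x87: "\u2044",
    0x88: "\u2039", 0x89: "\u203a", 0x8A: "\u2212", 0x8B: "\u2030",
    0x8C: "\u201e", 0x8D: "\u201c", 0x8E: "\u201d", 0x8F: "\u2018",
    0x90: "\u2019", 0x91: "\u201a", 0x92: "\u2122", 0x93: "\ufb01",
    0x94: "\ufb02", 0x95: "\u0141", 0x96: "\u0152", 0x97: "\u0160",
    0x98: "\u0178", 0x99: "\u017d", 0x9A: "\u0131", 0x9B: "\u0142",
    0x9C: "\u0153", 0x9D: "\u0161", 0x9E: "\u017e", 0xA0: "\u20ac",
}

# Byte values that PDFDocEncoding leaves undefined.
_PDFDOC_UNDEFINED = {0x7F, 0x9F, 0xAD}


def pdfdoc_to_utf16be(raw: bytes) -> bytes:
    """Step 1: interpret raw bytes as PDFDocEncoding and emit UTF-16BE."""
    chars = []
    for b in raw:
        if b in _PDFDOC_UNDEFINED:
            # Assumption of this sketch: undefined bytes become U+FFFD.
            chars.append("\ufffd")
        else:
            # Bytes not in the override table keep their Latin-1 code point.
            chars.append(_PDFDOC_OVERRIDES.get(b, chr(b)))
    return "".join(chars).encode("utf-16-be")


def utf16be_to_utf8(data: bytes) -> bytes:
    """Step 2: re-encode UTF-16BE text as UTF-8 (as the XML requires)."""
    return data.decode("utf-16-be").encode("utf-8")


def docinfo_pdfdoc_to_utf8(raw: bytes) -> bytes:
    """Hypothetical helper chaining both steps for a non-UTF-16BE string."""
    return utf16be_to_utf8(pdfdoc_to_utf16be(raw))
```

For example, a title stored as the bytes `b"Fr\xe9d\xe9ric"` comes out as `b"Fr\xc3\xa9d\xc3\xa9ric"`, i.e. "Frédéric" in UTF-8. Going through UTF-16BE is redundant in Python, where a `str` already holds the decoded text; it is kept here only to mirror the two steps named in the bug comment.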

Commit log:

pdfwrite - convert non-UTF-16BE doc info to UTF-8 assuming PDFDocEncoding

Bug #693477 "Encoding of pdf metadata do not comply with pdf standard"

When processing Document info there is a pdfwrite parameter
'PDFDSCEncoding' which, if present, is used to process the string into
ASCII. However, if this parameter is not supplied, we don't re-encode
the string at all. Since the XML must be UTF-8, this is potentially a
problem.

Since we cannot know the source of the docinfo string (existing PDF,
DOCINFO pdfmark, or DSC comments in PostScript) we cannot make any
judgement about the encoding of the string data in the absence of
PDFDSCEncoding. So we choose to assume that it's encoded using
PDFDocEncoding if it does not have a UTF-16BE BOM (which is the only
other format permitted).

This should at least mean that the Docinfo and XML match and are legal.
No differences expected; the cluster doesn't check the XML.
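The detection rule in the commit log can be illustrated with a short Python sketch. A docinfo string counts as UTF-16BE only if it starts with the byte-order mark `FE FF`, and everything else is assumed to be PDFDocEncoding. The function names are hypothetical and this is not how Ghostscript structures its code. Only the UTF-16BE branch is converted here. The PDFDocEncoding branch would need the full character table, so it is deliberately left out.

```python
# Hypothetical sketch of the BOM rule from the commit log (not Ghostscript code).

UTF16BE_BOM = b"\xfe\xff"


def assumed_docinfo_encoding(raw: bytes) -> str:
    """Return the encoding the rule would assume for a raw docinfo string."""
    return "UTF-16BE" if raw.startswith(UTF16BE_BOM) else "PDFDocEncoding"


def utf16be_docinfo_to_utf8(raw: bytes) -> bytes:
    """Convert a BOM-prefixed UTF-16BE docinfo string to UTF-8 for the XML."""
    if assumed_docinfo_encoding(raw) != "UTF-16BE":
        raise ValueError("no UTF-16BE BOM; string would be treated as PDFDocEncoding")
    # Drop the two BOM bytes, decode the remainder, re-encode as UTF-8.
    return raw[len(UTF16BE_BOM):].decode("utf-16-be").encode("utf-8")


if __name__ == "__main__":
    title = b"\xfe\xff\x00F\x00r\x00\xe9"
    print(assumed_docinfo_encoding(title))   # UTF-16BE
    print(utf16be_docinfo_to_utf8(title))    # b'Fr\xc3\xa9'
```

Because the BOM is the only marker checked, a PDFDocEncoding string that happened to begin with the bytes `FE FF` would be misread as UTF-16BE. In PDFDocEncoding those two bytes are "þÿ", so the ambiguity is unlikely to matter in practice.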
