Re: G::O::I based bindings and problem passing char array ref



On Thu, 2 Jun 2016 21:09:51 -0500
Jeremy Volkening <jdv base2bio com> wrote:

This problem is a bit hard to grasp. More detailed tests on another
batch of PDFs gave this:

1. Most (126) failed to open with the "PDF document is damaged" error

2. A few (4) opened but failed to load the first page (e.g.
get_page(0) returned undef)

3. The rest (5) seemed fine - opened, read metadata, rendered to png

The only obvious common denominators I could find among the five
files that worked were:

a. all were marked as PDF 1.1 compliant (all others at least PDF 1.4).
b. all were a single page
c. none seem to use any internal compression

(b) is not relevant, since other single-page files failed. Given that
five files worked fine, the problem appears to lie not in the argument
type but in the way Perl is passing the data to the poppler libs. All
135 files passed all tests using the "new_from_file()" constructor
(letting poppler do the reads). (c) might be relevant, since passing
those five files through ghostscript with all default settings
introduces compressed objects, and the files then fail testing. I
strongly suspect that Perl is mangling the binary data blobs somehow
before they arrive at the C libs, but I am clueless as to how to track
this down.
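
One way to start narrowing this down might be to check what the buffer looks like on the Perl side before it is handed to the bindings. The following is only a sketch: buffer_report() and slurp_raw() are hypothetical helper names, and the two in-memory strings are stand-ins for an uncompressed and a compressed PDF, not real file contents. It counts bytes with the high bit set and reports whether Perl has flagged the scalar as UTF-8 internally.

```perl
use strict;
use warnings;

# Hypothetical helper: summarize the properties of a buffer that matter
# if something downstream treats it as text rather than raw bytes.
sub buffer_report {
    my ($buf) = @_;
    my $high = ($buf =~ tr/\x80-\xFF//);    # count bytes >= 0x80
    return {
        length     => length $buf,
        high_bytes => $high,
        utf8_flag  => utf8::is_utf8($buf) ? 1 : 0,
    };
}

# Hypothetical helper: read a file as raw bytes, with no encoding layers.
sub slurp_raw {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;                               # slurp mode
    my $data = <$fh>;
    close $fh;
    return $data;
}

# Stand-in buffers (assumptions, not real PDFs): one pure ASCII,
# one containing the kind of high-bit bytes a compressed stream has.
my $plain  = "%PDF-1.1\n1 0 obj\n<< >>\nendobj\n";
my $binary = "%PDF-1.4\n%\xE2\xE3\xCF\xD3\nstream\n\x78\x9C\xFF\nendstream\n";

for my $case ( [ plain => $plain ], [ binary => $binary ] ) {
    my $r = buffer_report( $case->[1] );
    printf "%-6s length=%d high_bytes=%d utf8_flag=%d\n",
        $case->[0], @{$r}{qw(length high_bytes utf8_flag)};
}
```

Running buffer_report(slurp_raw($path)) over the working and failing files would show whether the five good files are simply the ones with no high-bit bytes, which would be consistent with (c).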

I see that the data argument to "new_from_data()" is specified as a UTF-8 char
array, but the contents of the PDF files (those that use compressed blocks)
are not actually valid UTF-8. Is it possible that G::O::I sees the spec
and is forcing UTF-8 encoding on the data at some point?
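
To illustrate what such a forced encoding would do (this is only a sketch of the hypothesis, not a claim about what the bindings actually do internally), the snippet below uses the core Encode module to UTF-8 encode two stand-in buffers. force_utf8() is a hypothetical name and the buffers are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical stand-in for "something in the call path encodes the
# scalar as UTF-8 before handing it to C".
sub force_utf8 {
    my ($bytes) = @_;
    return encode( 'UTF-8', $bytes );
}

# Assumed sample data: an ASCII-only header, and a header followed by
# four high-bit bytes like those found in binary PDF content.
my $ascii  = "%PDF-1.1\n";
my $binary = "%PDF-1.4\n\xE2\xE3\xCF\xD3\n";

for my $buf ( $ascii, $binary ) {
    my $out = force_utf8($buf);
    printf "in=%d bytes, out=%d bytes, %s\n",
        length $buf, length $out,
        $out eq $buf ? 'unchanged' : 'altered';
}
```

The ASCII-only buffer comes through byte for byte, while each high-bit byte in the second buffer becomes two bytes, so both the content and the length change. That would fit the pattern above, where only the uncompressed files survive.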

Jeremy


