This problem is a bit hard to pin down. More detailed tests on another batch of 135 PDFs gave the following:

1. Most (126) failed to open with the "PDF document is damaged" error.
2. A few (4) opened but failed to load the first page (e.g. get_page(0) returned undef).
3. The rest (5) seemed fine: they opened, metadata could be read, and they rendered to PNG.

The only obvious common denominators I could find among the five files that worked were:

a. All were marked as PDF 1.1 compliant (all the others were at least PDF 1.4).
b. All were a single page.
c. None seems to use any internal compression.

(b) is not relevant, since other single-page files failed.

Given that five files worked fine, the problem appears to lie not in the argument type but in the way Perl passes the data to the poppler libs. All 135 files passed all tests using the "new_from_file()" constructor (letting poppler do the reads).

(c) might be relevant, since passing those files through ghostscript with all default settings introduces compressed objects, and the files then fail testing.

I strongly suspect that Perl is mangling the binary data blobs somehow before they arrive at the C libs, but I am clueless as to how to track this down.

I'm attaching two files in case anyone else is willing to pursue this or confirm my findings: "okay.pdf", which passes all tests, and "nogood.pdf", which is the same file passed through ghostscript with "gs -sDEVICE=pdfwrite -o nogood.pdf okay.pdf" (and fails the tests).

Jeremy
Attachment:
nogood.pdf
Description: Adobe PDF document
Attachment:
okay.pdf
Description: Adobe PDF document
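
One way to probe observation (c) on the two attached files might be to count the bytes outside 7-bit ASCII in each. This is only a sketch, and it rests on an assumption not confirmed above: that the mangling is encoding-related, so data containing high-bit bytes (as compressed streams normally do) would be affected while plain 7-bit data would pass through untouched. The helper name high_bytes is made up for illustration.

```shell
# Count the bytes in a file that have the high bit set (0x80-0xFF).
# high_bytes is a hypothetical helper, not part of poppler, Perl or ghostscript.
high_bytes() {
    # In the C locale tr works on raw bytes; delete everything in 0x00-0x7F
    # and count what is left. The final tr strips padding some wc versions add.
    LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' '
}

# Report on the two attached files if they are in the current directory.
for f in okay.pdf nogood.pdf; do
    if [ -f "$f" ]; then
        printf '%s: %s high-bit bytes out of %s total\n' \
            "$f" "$(high_bytes "$f")" "$(wc -c < "$f" | tr -d ' ')"
    fi
done
```

If okay.pdf reports zero high-bit bytes and nogood.pdf reports many, that would be consistent with the binary data being altered on its way to the C libs, though it would not prove it.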