Searching for text in PDF files is wrong



Hi.

I use poppler_page_find_text() to find text in PDF files. It returns a
GList of pointers to PopplerRectangles. I then pass those rectangles to
poppler_page_render_selection() to mark the found text.

The problem is that the PopplerRectangles returned by
poppler_page_find_text() are incompatible with those that
poppler_page_render_selection() expects, so the wrong text is
selected.

I have found that to make those two compatible, I have to do the
following to PopplerRectangles returned by poppler_page_find_text():
1) SWAP(rectangle.x1, rectangle.x2);
2) SWAP(rectangle.y1, rectangle.y2);
3) rectangle.y1 = page_height - rectangle.y1;
4) rectangle.y2 = page_height - rectangle.y2;
But this does not fully solve the problem: the marked text alternates
between right and wrong as I resize the window.
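
For reference, the four steps above can be written as a small helper.
This is only a sketch: Rect and find_rect_to_selection_rect() are
hypothetical names, and Rect is a plain stand-in with the same four
fields as PopplerRectangle so that the snippet compiles without poppler.
It reproduces the workaround exactly as listed; it is not a claim that
this is the correct conversion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PopplerRectangle (same x1, y1, x2, y2 fields). */
typedef struct {
    double x1, y1, x2, y2;
} Rect;

/* Swap two doubles in place. */
static void swap_double(double *a, double *b)
{
    double tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Apply workaround steps 1-4 to a rectangle, in place.
 * page_height must be in the same units as the rectangle coordinates. */
void find_rect_to_selection_rect(Rect *r, double page_height)
{
    assert(r != NULL);
    swap_double(&r->x1, &r->x2);      /* step 1 */
    swap_double(&r->y1, &r->y2);      /* step 2 */
    r->y1 = page_height - r->y1;      /* step 3 */
    r->y2 = page_height - r->y2;      /* step 4 */
}
```

With a page height of 800, a rectangle with (x1, y1) = (100, 700) and
(x2, y2) = (150, 712) becomes (x1, y1) = (150, 88) and
(x2, y2) = (100, 100).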

I have created a small program that illustrates the problem. Here it
is: https://pastebin.com/h3F56Yv7 (I've also sent an attachment but
last time you didn't get it so this paste is a fallback in case you
don't get it again.)
You need to supply two arguments when running the program: the
absolute path to a PDF file and the text you want to search for, in
that order. Compare the selected text with and without lines 54-57.

How can I get the found text marked properly? This "workaround"
does not work very well, and it is an ugly solution anyway.

