Re: Suggestion for improving search speed?



Carlos Garcia Campos wrote:
>> Is there a specific reason why it can't be launched as an
>> EV_JOB_RUN_THREAD?
> 
> yes, it makes easier (and faster) emit the signal find_update from
> ev_job_find_run(). And it allows running another thread job at the same
> time. Note that the callback for find_update will update the UI, so it
> can't run on a thread other than the main one. 

I see. I've updated UIs from other threads many times, but maybe that's
easier to do in gtkmm than in pure GTK. I don't know enough about the
latter. In gtkmm one commonly uses a dispatcher, which enables
cross-thread signalling, so it is still the main loop doing the updates
but the data comes from a worker thread. In addition, one can use a queue
of threads to avoid having to create a new thread for each part of the
task.
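
To make the dispatcher idea concrete, here is a minimal sketch in plain
standard C++ rather than gtkmm. ResultDispatcher and find_in_pages are
hypothetical names invented for this illustration; they only mimic the
pattern that Glib::Dispatcher provides and are not evince or poppler code.
The worker thread posts hits into a locked queue, and only the calling
("main") thread runs the handler that would touch the UI.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a toolkit dispatcher: any thread may post a
// result, but the handler runs only on the thread that calls drain().
template <typename T>
class ResultDispatcher {
public:
    explicit ResultDispatcher(std::function<void(const T&)> handler)
        : handler_(std::move(handler)) {}

    // Safe to call from any thread.
    void post(T value) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push(std::move(value));
    }

    // Call from the main thread only. Returns how many results were handled.
    std::size_t drain() {
        std::queue<T> batch;
        {
            // Hold the lock just long enough to take the pending results.
            std::lock_guard<std::mutex> lock(mutex_);
            std::swap(batch, pending_);
        }
        const std::size_t count = batch.size();
        while (!batch.empty()) {
            handler_(batch.front());
            batch.pop();
        }
        return count;
    }

private:
    std::function<void(const T&)> handler_;
    std::mutex mutex_;
    std::queue<T> pending_;
};

// Hypothetical search: a worker scans the pages, the caller collects hits.
std::vector<int> find_in_pages(const std::vector<std::string>& pages,
                               const std::string& needle) {
    std::vector<int> hits;  // modified only by the calling thread
    ResultDispatcher<int> dispatcher(
        [&hits](const int& page) { hits.push_back(page); });

    std::thread worker([&] {
        for (std::size_t i = 0; i < pages.size(); ++i) {
            if (pages[i].find(needle) != std::string::npos) {
                dispatcher.post(static_cast<int>(i));
            }
        }
    });

    // Simplification: a real UI would keep its main loop running and drain
    // whenever it is notified, instead of blocking on join().
    worker.join();
    dispatcher.drain();
    return hits;
}
```

The sketch starts one thread per search; the queue of threads mentioned
above would replace that with long-lived workers, which is not shown here.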

> 
> Yes, sure, but I'm quite sure that if poppler side is fast enough,
> current evince code is good enough too. 

Ok. Well I can see a few potential improvements to the search functions
in poppler itself.

1) A new TextOutputDev is created for every page searched. That seems a
bit wasteful on the surface, but profiling evince while searching for
"reference" and "document" and then "format" in the PDF reference
document (980 pages) shows that creation and destruction of
TextOutputDev account for only about 3% of the load...

2) By far the most time (~87%) is spent in page->page->displaySlice,
where most of the time is spent in createGfx (or Gfx::Gfx() [58%]), with
about 20% in Gfx::display() and 5% creating the Annots object.

I hope I'm right in thinking that this is necessary for finding the
words on the page. Is it necessary to create this at 72 DPI? It would be
nifty to be able to know the minimum font size on the page a priori and
then decide on a minimum DPI (if that does in fact have a large impact on
performance).

3) In that step, creating the Annots object takes a lot of time (~5% of
the overall total). Does find_text search in annotations as well? If not,
that's a bit of a waste. (If it were possible to check for the
existence of annotations a priori, this would be a bit of a boost, I
suppose, as you wouldn't have to create the objects in the first place.)
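
As a rough way to read the percentages above, the best-case overall gain
from removing one of these costs entirely can be estimated with Amdahl's
law. This is only a sketch: max_speedup is a hypothetical helper, and it
assumes each percentage is a share of total search time and that the
removed work has no side effects the search depends on.

```cpp
#include <stdexcept>

// Upper bound on the overall speedup when a given fraction of the total
// run time is eliminated completely (Amdahl's law, best case).
double max_speedup(double fraction_removed) {
    if (fraction_removed < 0.0 || fraction_removed >= 1.0) {
        throw std::invalid_argument("fraction_removed must be in [0, 1)");
    }
    return 1.0 / (1.0 - fraction_removed);
}
```

With the figures quoted above, max_speedup(0.03) for TextOutputDev and
max_speedup(0.05) for the Annots object both stay below 1.06, while
max_speedup(0.58) for Gfx::Gfx() is about 2.38, so the Gfx construction
is where a real saving would have to come from.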

Bartek

