Re: Suggestion for improving search speed?



On Wed, 24-06-2009 at 18:56 +0200, Bartosz Kostrzewa wrote:
> Carlos Garcia Campos wrote:
> >> Is there a specific reason why it can't be launched as an
> >> EV_JOB_RUN_THREAD?
> > 
> > Yes, it makes it easier (and faster) to emit the find_update signal from
> > ev_job_find_run(). And it allows running another thread job at the same
> > time. Note that the callback for find_update will update the UI, so it
> > can't run on a thread other than the main one.
> 
> I see. I've updated UIs from other threads many times, but maybe that's
> easier to do in gtkmm than in pure gtk. I don't know enough about the
> latter. In GTKMM one commonly uses a dispatcher which enables
> cross-thread signalling, so it is still the main loop doing the updates
> but the data comes from a worker thread. In addition one can use a queue
> of threads to avoid having to create a new thread for each part of the
> task.
> 
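To make the dispatcher idea above concrete, here is a minimal sketch in
plain Python (standard library only, so not the GTK or gtkmm API). A
worker thread does the searching and posts results to a queue, and only
the main thread runs the UI callback. The names find_in_pages,
run_main_loop and on_update are hypothetical and exist only for this
illustration.

```python
import queue
import threading

def find_in_pages(pages, needle, results):
    """Worker thread: search each page and post (page_index, n_matches)."""
    for index, text in enumerate(pages):
        results.put((index, text.count(needle)))
    results.put(None)  # sentinel: the search is finished

def run_main_loop(results, on_update):
    """Stand-in for the main loop: the only place on_update is called."""
    while True:
        item = results.get()
        if item is None:
            break
        on_update(*item)

pages = ["a reference document", "no match here", "reference to a reference"]
results = queue.Queue()
updates = []           # what the "UI" has been told so far
ui_thread_ids = set()  # threads that executed the UI callback

def on_update(page_index, n_matches):
    # Hypothetical UI callback; records which thread it ran on.
    ui_thread_ids.add(threading.get_ident())
    updates.append((page_index, n_matches))

worker = threading.Thread(target=find_in_pages, args=(pages, "reference", results))
worker.start()
run_main_loop(results, on_update)
worker.join()
```

In a real GTK application the queue-draining step would be driven by the
toolkit's own main loop rather than a blocking while loop; the sketch only
shows the division of labour between the two threads.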
> > 
> > Yes, sure, but I'm quite sure that if poppler side is fast enough,
> > current evince code is good enough too. 
> 
> Ok. Well I can see a few potential improvements to the search functions
> in poppler itself.
> 
> 1) a new textoutputdev is created for every page searched, that seems a
> bit wasteful on the surface, but profiling evince while searching for
> "reference" and "document" and then "format" in the PDF reference
> document (980 pages) shows that creation and destruction of
> TextOutputDev account for only about 3% of the load...

What poppler version are you looking at? We don't use TextOutputDev
anymore; we now use a TextPage directly, which is used by CairoOutputDev
during rendering. This was done to avoid having to create a
TextOutputDev for every page. Now, if the page has already been rendered
(if you are looking for a word in the current page, for instance), the
search is quite fast since all the information is already available.
That works well when the page was rendered for display, but when
the page is rendered just for searching we do work we don't
really want (rendering images, annotations and so on).
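The trade-off described here can be sketched with a toy model in Python.
This is not poppler code: the Page class, its text_cache attribute, and
the extract_text_only() path are all hypothetical, and the text-only path
in particular is an assumption about what a cheaper search-only route
could look like. The sketch only shows a search reusing text collected by
an earlier render, and skipping image work when the page was never shown.

```python
class Page:
    """Toy page: a list of words plus a counter for expensive image work."""

    def __init__(self, words):
        self.words = words
        self.text_cache = None   # filled by render() or extract_text_only()
        self.image_passes = 0    # how often images were (pretend-)drawn

    def render(self):
        """Full render for display: draws images and fills the text cache."""
        self.image_passes += 1
        self.text_cache = list(self.words)

    def extract_text_only(self):
        """Hypothetical cheaper path: collects text without drawing images."""
        self.text_cache = list(self.words)

    def find(self, needle):
        """Return word positions matching needle, reusing cached text."""
        if self.text_cache is None:
            self.extract_text_only()
        return [i for i, word in enumerate(self.text_cache) if word == needle]

# A page already on screen: the search reuses the text from the render.
displayed = Page(["pdf", "reference"])
displayed.render()
hits_displayed = displayed.find("reference")

# A page never shown: only the text-only path runs, no image work.
unseen = Page(["format", "reference", "format"])
hits_unseen = unseen.find("format")
```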

> 2) by far the most time (~87%) is spent in page->page->displaySlice,
> most of it in createGfx (or Gfx::Gfx() [58%]), with
> about 20% in Gfx::display() and 5% creating the Annot objects
> 
> I hope I'm right in thinking that this is necessary for finding the
> words on the page. Is it necessary to create this at 72 DPI? It would be
> nifty to be able to know the minimum font size on the page a priori and
> then decide on a minimum DPI (if that does in fact have a large impact on
> performance)
> 
> 3) in that step, creating the Annots object takes a lot of time (~5% of
> the overall total). Does find_text search in annotations as well?

no

>  If not,
> that's a bit of a waste.  

yes

> (if it were possible to check for the
> existence of annotations a priori, this would be a bit of a boost I
> suppose as you wouldn't have to create the objects in the first place)

Anyway, please use evince and poppler from git master, because we have
improved search performance in recent versions.

> Bartek

Thanks!

-- 
Carlos Garcia Campos
PGP key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x523E6462

Attachment: signature.asc
Description: This part of the message is digitally signed


