Re: Suggestion for improving search speed?



On Mon, 22-06-2009 at 18:11 +0200, Bartosz Kostrzewa wrote:
> Lalith Suresh wrote:
> > I'm not familiar with evince's design, so I'd like some feedback on whether
> > this is a good idea or not? Perhaps then I can start digging through the
> > source (with some help from you all of course!).
> 
> Given that evince is already highly multi-threaded it should not be too
> difficult to do what you want.
> 
> As far as I understand the source, the search is launched in ev-window.c
> in the find_bar_search_changed_cb callback, which calls ev_job_find_new
> and passes the new "job" object to the scheduler (ev-job-scheduler.c).
> The scheduler then starts the worker thread (ev_job_find_run in
> ev-jobs.c), which calls the document text search, which in turn calls
> the backend-specific full-text search (such as
> pdf_document_find_find_text in ev-poppler.c).

EvJobFind doesn't use the worker thread; it's an EV_JOB_RUN_MAIN_LOOP job,
which means that it runs in the main thread.

> When a page has been searched, the GList* of matches (I believe it's a
> bunch of rectangles) is saved for the given page and the find_update
> signal is emitted, which makes evince jump to the result page and
> highlight the matching word. At the same time the current page to be
> searched is incremented and TRUE is returned to the scheduler (unless
> the job is finished, in which case FALSE is returned), which (I believe)
> tells the scheduler that a new ev_job_find_run is to be scheduled
> (ev_job_thread in ev-job-scheduler.c).

Exactly. 
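
To make the one-page-per-call pattern concrete, here is a minimal sketch
in plain C. It is not evince code: FindJob, search_page, find_job_step
and run_to_completion are hypothetical names, the toy search just
pretends every third page has a match, and the loop at the bottom only
stands in for the scheduler re-running the job while the step returns
true.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PAGES 64

/* Hypothetical stand-in for a find job; this is NOT evince's EvJobFind. */
typedef struct {
    int current_page;        /* next page to search */
    int end_page;            /* one past the last page to search */
    int matches[MAX_PAGES];  /* number of matches recorded per page */
    int updates;             /* how often "find_update" would have fired */
} FindJob;

/* Toy per-page search (assumption): every third page holds one match. */
static int search_page(int page)
{
    return (page % 3 == 0) ? 1 : 0;
}

static void find_job_init(FindJob *job, int start_page, int end_page)
{
    assert(0 <= start_page && start_page <= end_page && end_page <= MAX_PAGES);
    job->current_page = start_page;
    job->end_page = end_page;
    job->updates = 0;
    for (int i = 0; i < MAX_PAGES; i++)
        job->matches[i] = 0;
}

/* Search exactly one page per call; return true while pages remain. */
static bool find_job_step(FindJob *job)
{
    if (job->current_page >= job->end_page)
        return false;                   /* nothing left: job is finished */

    job->matches[job->current_page] = search_page(job->current_page);
    job->updates++;                     /* stands in for emitting find_update */
    job->current_page++;

    return job->current_page < job->end_page;
}

/* Stand-in for the scheduler: call the step again for as long as it asks. */
static int run_to_completion(FindJob *job)
{
    int calls = 0;
    bool more;
    do {
        more = find_job_step(job);
        calls++;
    } while (more);
    return calls;
}
```

Because each call handles a single page and then returns, the caller
gets control back between pages, which is the property that matters
when the step runs in the main loop.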

> Since ev_job_find_new accepts a start and an end page, you could probably
> simply schedule C jobs with appropriate ranges. But this would probably
> mess up the behaviour of the callback which manages the view when a
> result is reported. So I suppose this is out.
> 
> Alternatively (and this would probably work better given the current
> design), you could launch C instances of ev_document_find_find_text (for
> current_page, current_page+1, ..., current_page+(C-1)) in ev_job_find_run
> (probably using the scheduler itself), wait there for the results,
> apply them to the relevant pages, and increment current_page by C.
> You'd have to deal with edge cases, for instance where current_page+C
> is outside the document, and handle them when launching the threads
> as well.
> 
> I don't know whether you'd have to implement locking of the
> EvDocumentFind struct, which is passed by pointer to
> ev_document_find_find_text, or whether you could simply make C copies of
> that pointer in ev_job_find_run and not worry about locking (at least
> for the poppler backend, this struct is not touched in
> pdf_document_find_find_text).

ev_job_find_run is run by the main loop, so it has to be fast, since we
don't want to block the UI.
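
For the edge case mentioned in the quoted proposal (a batch of C pages
running past the end of the document), the range arithmetic could look
roughly like the sketch below. batch_len and count_batches are
hypothetical helpers, not evince functions, and this shows only the
clamping. It says nothing about how the pages would actually be
searched, and waiting on a whole batch inside the run function would
still block the main loop.

```c
#include <assert.h>

/* Hypothetical helper: how many pages the next batch may cover when it
 * starts at current_page, wants up to c pages, and the document has
 * n_pages pages (valid indices 0 .. n_pages-1). */
static int batch_len(int current_page, int n_pages, int c)
{
    assert(c > 0 && n_pages >= 0 && current_page >= 0);

    int remaining = n_pages - current_page;
    if (remaining <= 0)
        return 0;                       /* already past the last page */
    return remaining < c ? remaining : c;
}

/* Walk the document in batches and count them; the last batch is
 * simply shorter when n_pages is not a multiple of c. */
static int count_batches(int start_page, int n_pages, int c)
{
    int batches = 0;
    int page = start_page;
    int len;

    while ((len = batch_len(page, n_pages, c)) > 0) {
        /* pages [page, page + len) would be searched here */
        page += len;
        batches++;
    }
    return batches;
}
```

Advancing by the clamped length rather than by c keeps current_page
from ever moving past the end of the document.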

> Please keep in mind that I haven't spent any time actually analyzing
> these parts of the evince codebase; it would be good if a real developer
> could confirm that what I have said is indeed what's going on.

Hmm, I think it'd probably be better to try improving the poppler code
first. If you need help with the poppler code, feel free to ask on the
poppler mailing list (or on IRC, #poppler on freenode).

> -Bartek


-- 
Carlos Garcia Campos
PGP key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x523E6462

Attachment: signature.asc
Description: This part of the message is digitally signed


