Re: Suggestion for improving search speed?

From: Bartosz Kostrzewa <zoombat runbox com>
To: evince-list gnome org
Subject: Re: Suggestion for improving search speed?
Date: Mon, 22 Jun 2009 18:11:27 +0200

Lalith Suresh wrote:
> I'm not familiar with evince's design, so I'd like some feedback on whether
> this is a good idea or not? Perhaps then I can start digging through the
> source (with some help from you all of course!).

Given that evince is already highly multi-threaded it should not be too
difficult to do what you want.

As far as I understand the source, the search is launched in ev-window.c
in the find_bar_search_changed_cb callback, which calls ev_job_find_new
and passes the new "job" object to the scheduler (ev-job-scheduler.c).
The scheduler then starts the worker thread (the ev_job_find_run code in
ev-jobs.c), which calls the document text search, which in turn calls
the backend-specific full-text search (such as
pdf_document_find_find_text in ev-poppler.c).

When a page has been searched, the GList* of matches (I believe it's a
bunch of rectangles) is saved for the given page and the find_update
signal is emitted, which makes evince jump to the result page and
highlight the matching word. At the same time the current page to be
searched is incremented and TRUE is returned to the scheduler (unless
the job is finished, in which case FALSE is returned). I believe this
tells the scheduler that another ev_job_find_run is to be scheduled
(ev_job_thread in ev-job-scheduler.c).
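
To make that run-again protocol concrete, here is a minimal sketch in
plain C. It is not evince code: FindJob and find_job_run_once are
hypothetical stand-ins, and a simple strstr() replaces the backend
search and the list of match rectangles.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the find job; NOT evince's real EvJobFind. */
typedef struct {
    const char **pages;    /* one text string per page */
    int current_page;      /* next page to search */
    int end_page;          /* last page to search, inclusive */
    const char *needle;    /* text being searched for */
    int *hits;             /* hits[i] is 1 if page i matched, else 0 */
} FindJob;

/* Search exactly one page, then report whether the job should be
 * scheduled again (true) or is finished (false). */
static bool find_job_run_once(FindJob *job)
{
    assert(job->current_page <= job->end_page);

    int page = job->current_page;
    job->hits[page] = strstr(job->pages[page], job->needle) != NULL;

    /* This is where the real job would emit its update signal. */

    job->current_page++;
    return job->current_page <= job->end_page;
}
```

A scheduler stand-in would simply call find_job_run_once() in a loop
until it returns false.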

Since ev_job_find_new accepts a start and an end page, you could probably
simply schedule C jobs with appropriate page ranges (C being the number
of parallel searches). But this would probably mess up the behaviour of
the callback which manages the view when a result is reported. So I
suppose this is out.
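
For reference, splitting a page range into C contiguous sub-ranges
could look roughly like the sketch below. PageRange and
split_page_range are names I made up for illustration; nothing here
comes from the evince source.

```c
#include <assert.h>

/* Hypothetical inclusive page range; not an evince type. */
typedef struct {
    int start;
    int end;
} PageRange;

/* Split [first, last] into at most n_jobs contiguous ranges of nearly
 * equal size. Returns the number of ranges written to out, which is
 * smaller than n_jobs when there are fewer pages than jobs. */
static int split_page_range(int first, int last, int n_jobs, PageRange *out)
{
    assert(first <= last && n_jobs > 0);

    int total = last - first + 1;
    if (n_jobs > total)
        n_jobs = total;

    int base = total / n_jobs;    /* minimum pages per range */
    int extra = total % n_jobs;   /* the first 'extra' ranges get one more */
    int start = first;

    for (int i = 0; i < n_jobs; i++) {
        int len = base + (i < extra ? 1 : 0);
        out[i].start = start;
        out[i].end = start + len - 1;
        start += len;
    }
    return n_jobs;
}
```

Each resulting range would be passed as the start and end page of one
job. The problem with the view callback remains, of course.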

Alternatively (and this would probably work better given the current
design) you could launch C instances of ev_document_find_find_text (for
current_page, current_page+1 ... current_page+(C-1)) in ev_job_find_run
(probably using the scheduler itself), wait there for the results,
apply them to the relevant pages, and increment current_page by C.
You would have to deal with edge cases, for instance where
current_page+C is outside the document, and handle these already when
launching the threads.
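
A rough sketch of that batch idea, using plain POSIX threads instead
of the evince scheduler. Everything in it is an assumption made for
illustration: PageTask, search_page, search_batch and MAX_BATCH are
invented names, and strstr() stands in for the backend search. Each
worker writes only to its own task slot, so the results need no lock
in this sketch; it says nothing about whether the real backend call is
safe to run concurrently.

```c
#include <pthread.h>
#include <string.h>

#define MAX_BATCH 16   /* arbitrary upper bound on C for this sketch */

/* Hypothetical per-page work item; each thread owns exactly one. */
typedef struct {
    const char *text;
    const char *needle;
    int matched;
} PageTask;

static void *search_page(void *arg)
{
    PageTask *task = arg;
    task->matched = strstr(task->text, task->needle) != NULL;
    return NULL;
}

/* Search up to c pages in parallel, starting at *current_page, and
 * advance *current_page past the pages actually searched. Returns the
 * number of pages searched; 0 means the end of the document was
 * reached (or no thread could be started). */
static int search_batch(const char **pages, int n_pages, const char *needle,
                        int c, int *current_page, int *hits)
{
    pthread_t threads[MAX_BATCH];
    PageTask tasks[MAX_BATCH];
    int first = *current_page;

    /* Edge case: clamp the batch so it never runs past the last page. */
    int count = n_pages - first;
    if (count > c)
        count = c;
    if (count > MAX_BATCH)
        count = MAX_BATCH;
    if (count <= 0)
        return 0;

    int started = 0;
    for (int i = 0; i < count; i++) {
        tasks[i] = (PageTask){ pages[first + i], needle, 0 };
        if (pthread_create(&threads[i], NULL, search_page, &tasks[i]) != 0)
            break;   /* stop launching; finish what was started */
        started++;
    }

    /* Wait for the results and apply them to the relevant pages. */
    for (int i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
        hits[first + i] = tasks[i].matched;
    }

    *current_page = first + started;
    return started;
}
```

Calling search_batch() repeatedly until it returns 0 walks the whole
document C pages at a time, with the last batch shortened as needed.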

I don't know whether you'd have to implement locking of the
EvDocumentFind struct which is passed by pointer to
ev_document_find_find_text, or whether you could simply make C copies of
that pointer in ev_job_find_run and not worry about locking. (At least
for the poppler backend this struct is not touched in
pdf_document_find_find_text.)

Please keep in mind that I haven't spent any time actually analyzing
these parts of the evince codebase. It would be good if a real developer
could confirm that what I have said is indeed what's going on.

-Bartek

Follow-Ups:
- Re: Suggestion for improving search speed?
  - From: Carlos Garcia Campos

References:
- Suggestion for improving search speed?
  - From: Lalith Suresh
