Re: [Tracker] PATCH: Faster PNG extractor



Hi,

The main reason for not modifying the original extractor is that I want to keep it as a fallback in case the new extractor fails on an unexpected file structure. png-faster tries to skip towards the end of the file by estimating where the trailing metadata is located, using the file size and the IDAT chunk size, so I expect it to fail more often than the original. Since tracker-extract handles these failures gracefully, this should not be a problem.

The best way I can see to get similar functionality into the existing extractor would be to modify libpng to allow skipping to the end of the file (a comment in the existing PNG extractor notes that this functionality is missing from the library). Since reading the PNG format is relatively simple, I opted to put this functionality in the extractor rather than patching libpng first; I am also not sure how much work patching libpng would be.
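
To illustrate why seeking helps, here is a minimal sketch in Python (not the C code from the patch, and not the size-estimation heuristic png-faster uses). It only assumes the standard PNG chunk layout: a 4-byte big-endian length, a 4-byte type, the data, and a 4-byte CRC. The names find_text_chunks and make_chunk are made up for this example. It reads each chunk header and seek()s past every payload except tEXt, so the IDAT data is never read.

```python
import io
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type, data):
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def find_text_chunks(stream):
    """Return (keyword, text) pairs from tEXt chunks, seeking past other payloads."""
    if stream.read(8) != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    found = []
    while True:
        header = stream.read(8)
        if len(header) < 8:
            raise ValueError("truncated PNG: no IEND chunk")
        length, chunk_type = struct.unpack(">I4s", header)
        if chunk_type == b"IEND":
            return found
        if chunk_type == b"tEXt":
            # tEXt data is "keyword NUL text", both Latin-1 encoded.
            keyword, _, text = stream.read(length).partition(b"\x00")
            found.append((keyword.decode("latin-1"), text.decode("latin-1")))
            stream.seek(4, io.SEEK_CUR)  # skip the CRC
        else:
            # Skip data and CRC without reading them (this covers IDAT).
            stream.seek(length + 4, io.SEEK_CUR)


# A tiny 1x1 grayscale PNG with a tEXt chunk placed after the image data.
sample = (
    PNG_SIGNATURE
    + make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
    + make_chunk(b"IDAT", zlib.compress(b"\x00\x00"))
    + make_chunk(b"tEXt", b"Title\x00Example")
    + make_chunk(b"IEND", b"")
)
print(find_text_chunks(io.BytesIO(sample)))  # prints [('Title', 'Example')]
```

This sketch still visits every chunk header, so it is a simpler and more conservative variant than jumping directly to an estimated offset near the end of the file.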

What are your thoughts on keeping png-faster as a separate, optional extractor module which can be enabled when extraction speed is of primary concern?


On 27 June 2013 19:06, Martyn Russell <martyn lanedo com> wrote:
On 27/06/13 16:08, Jonatan Pålsson wrote:
On 27 June 2013 16:48, Aleksander Morgado <aleksander lanedo com> wrote:

    On 27/06/13 16:26, Jonatan Pålsson wrote:
     > To start with, I would like to submit a patch containing a new
     > extractor for PNG files, which is faster than the original.
     >
     > The reason behind the speed increase with this extractor compared
     > to the old extractor is that the new extractor seek()s out the
     > metadata fields in the PNG, rather than traverse the entire file
     > to find them, as the old extractor did (using libpng).

    Could you share some numbers on which is the actual speed improvement?
    E.g. extracting 1000 random PNGs before took Xs, now it takes Ys.

Certainly!

I'm running Tracker on a PandaBoard Rev A4. 1000 replicated PNGs were
used; I could make the replicated file available, but there is nothing
special about it.
I used the following commands to measure the running times:

# For png-faster
tracker-control -r ; echo 3 > /proc/sys/vm/drop_caches ; sync ; sync ;
time /usr/lib/tracker/tracker-miner-fs -v 0 --no-daemon

# For the original PNG extractor
tracker-control -r ; /usr/lib/tracker/tracker-extract -m png
echo 3 > /proc/sys/vm/drop_caches ; sync ; sync ;
time /usr/lib/tracker/tracker-miner-fs -v 0 --no-daemon

And here are the results:
# png-faster
real    0m14.804s
user    0m4.945s
sys     0m1.313s

# original
real    1m33.274s
user    0m5.250s
sys     0m1.820s

That's quite some difference!

Thanks for posting some numbers. Important! :)

My first thought is, why did you create a new extractor instead of improving the original one?

The patch link you gave is good, but I would love to see a diff from our actual extractor right now to see how easily we could merge the changes into that one.

--
Regards,
Martyn

Founder and CEO of Lanedo GmbH.



--
Regards,
Jonatan Pålsson

Pelagicore AB
Ekelundsgatan 4, 6th floor, SE-411 18 Gothenburg, Sweden

