Re: [Tracker] Script for media library generation

From: Jonatan Pålsson <jonatan palsson pelagicore com>
To: Martyn Russell <martyn lanedo com>
Cc: "tracker-list gnome org" <tracker-list gnome org>
Subject: Re: [Tracker] Script for media library generation
Date: Fri, 2 Aug 2013 08:37:18 +0200

On 1 August 2013 17:34, Martyn Russell <martyn lanedo com> wrote:

On 01/08/13 15:23, Jonatan Pålsson wrote:


Hi list,



Hello Jonatan,


Hi Martyn!

OK. So I think I have pitched the problem now. What I have done is to
combine media encoders (LAME and ImageMagick) and metadata tagging
software (id3v2 and exiftool) with the random number generator of
Python. By using the random numbers generated by Python as input to
these tools, random, reproducible (by reusing the seed for the PRNG),
media files can be created.
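
A minimal sketch of the seeding idea, not the actual script: one seeded PRNG drives every tag value, so the same seed always yields the same library. The word pools, file names and helper names below are hypothetical, and the id3v2 argument lists are only built here, never executed.

```python
import random

# Hypothetical word pools; the real script's source data is not shown here.
ARTISTS = ["Alpha", "Bravo", "Charlie", "Delta"]
ALBUMS = ["First", "Second", "Third"]


def random_tags(rng):
    """Draw one set of ID3-style tags from the given PRNG."""
    return {
        "artist": rng.choice(ARTISTS),
        "album": rng.choice(ALBUMS),
        "title": "Track %04d" % rng.randrange(10000),
        "year": str(rng.randint(1950, 2013)),
    }


def id3v2_command(tags, path):
    """Build (but do not run) an id3v2 argument list for one file."""
    return ["id3v2",
            "-a", tags["artist"],
            "-A", tags["album"],
            "-t", tags["title"],
            "-y", tags["year"],
            path]


def generate_library(seed, count):
    """Return the tagging commands for `count` files, reproducible per seed."""
    rng = random.Random(seed)  # private PRNG, so the global state is untouched
    return [id3v2_command(random_tags(rng), "file%05d.mp3" % i)
            for i in range(count)]
```

Calling generate_library(42, 1000) twice gives identical command lists, which is what makes a generated library reproducible across test runs.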

I'm using this to create large numbers of media files to test Tracker
extractor modules on, and it works pretty well. So far I can generate
MP3, PNG, JPG, TIF, and GIF.



Just before you go on, what are you trying to test here? That we
index/extract properly? Or test the data with queries to the database?


The main purpose for generating actual files rather than directly
putting the data in the database is to test the actual extractors.
Specifically I am working on lowering the extraction time as much as
possible. I have previously written to the list asking for tips on how
to improve store insertion performance, and this is also something I
am experimenting with, via the extractors. I am also, as you mention,
looking at accuracy in the extractors, since it is quite simple to
compare for instance ID3 tags of the MP3 files with the extracted
fields.
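
The accuracy check can be sketched roughly as below, assuming the tags written to a file and the fields read back from the store have both been collected into plain dictionaries with the same keys. The function name and the key names are hypothetical; how the extracted values are actually queried from Tracker is left out.

```python
def compare_tags(expected, extracted):
    """Return (field, expected, extracted) for every field that differs.

    `expected` holds the tags written to the file; `extracted` holds what
    the extractor produced. A field missing from `extracted` shows up with
    None as its extracted value.
    """
    mismatches = []
    for field, want in sorted(expected.items()):
        got = extracted.get(field)
        if got != want:
            mismatches.append((field, want, got))
    return mismatches
```

An empty result means the extractor reproduced every generated tag for that file.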

My general idea is that having actual files to index produces a
testing scenario as close as possible to actual use cases of Tracker.


At this point I wanted to ask if you had seen the data generators we have in
the Tracker tree already? NOTE: I say "data" not "file" there.

  utils/data-generators/cc/

You can run

  $ ./generate ./default.cfg

It will create a bunch of ttl files which you can import as you wish using
tracker-import. I think you can even use tracker-import *.ttl.

Anyway this is fake data, not based on files - so it really depends on what
you're testing. You can also tweak where the data draws its random crap from
:)


This is news to me, thanks for pointing it out. These scripts should
be good for measuring optimal insertion speed to the database, not
including time required for actual extraction. I am currently
experimenting with developing much simpler ontologies, holding only
the data we (Pelagicore) are interested in - I will check whether it is
worthwhile to adapt this ttl generator for our ontologies in
order to establish an optimal insertion speed. The source data you are
using (source-data/) looks very interesting. I might have a look at
integrating that into my script.

I think it would be quite useful to include your file generator in the
tracker tree for people to make use of or at least reference it from a
README somewhere.


This sounds like a good plan. I will ping the list when I feel the
script is ready for inclusion (or mentioning in a README); I have some
more features and fixes to add soon.

-- 
Regards,
Jonatan Pålsson

Pelagicore AB
Ekelundsgatan 4, 6th floor, SE-411 18 Gothenburg, Sweden

Follow-Ups:
- Re: [Tracker] Script for media library generation
  - From: Ivan Frade

References:
- [Tracker] Script for media library generation
  - From: Jonatan Pålsson
- Re: [Tracker] Script for media library generation
  - From: Martyn Russell
