Re: Finding and Reminding, tech issues, 3.0 and beyond

On 14/04/10 16:10, Alexander Larsson wrote:
On Fri, 2010-04-09 at 18:09 -0400, Owen Taylor wrote:

"User defined tags"

   A completely flat view of all documents doesn't handle all users
   or use cases. "Frequent filers" will want to be able to identify
   projects and other subsets of files.

   There's not a detailed plan for the user interface right now, but
   technically this could be done a couple of ways.

   We could use the traditional method of grouping by using
   folders; and just make that look somewhat tag-like in the
   UI. (Make selecting a folder show all the files in that folder
   and all sub-folders. Allow creating a folder of files without
   worrying where it was and automatically creating it in

   Or we could use a real tag-based approach with tags stored in
   metadata. (multiple tags per file, tags orthogonal to folders.)
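
As a rough illustration of the two options above, here is a minimal in-memory sketch in Python. It is not how Nautilus, gvfs or Tracker implement anything; the `files_under` helper, the `TagStore` class and the example paths are all hypothetical names made up for this sketch.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def files_under(folder, all_files):
    """Folder-as-tag view: every file in `folder` or any of its sub-folders."""
    root = PurePosixPath(folder)
    return sorted(f for f in all_files if root in PurePosixPath(f).parents)


class TagStore:
    """Real tags: several tags per file, independent of where the file lives."""

    def __init__(self):
        self._files_by_tag = defaultdict(set)

    def tag(self, path, *tags):
        for t in tags:
            self._files_by_tag[t].add(path)

    def files_with(self, tag):
        return sorted(self._files_by_tag.get(tag, ()))


# Hypothetical example data.
files = [
    "/home/u/Projects/report/draft.odt",
    "/home/u/Projects/report/data/q1.csv",
    "/home/u/Pictures/chart.png",
]

store = TagStore()
store.tag("/home/u/Projects/report/draft.odt", "report", "urgent")
store.tag("/home/u/Pictures/chart.png", "report")

# Folder view: limited to one subtree of the file system.
print(files_under("/home/u/Projects/report", files))
# Tag view: can group files that live in unrelated folders.
print(store.files_with("report"))
```

The folder view can only ever group files that share a parent directory, while the tag view can pull in chart.png from an unrelated folder. That is the "tags orthogonal to folders" point, at the cost of needing somewhere to store the tags.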

Does Tracker currently index gvfs/gio metadata?


That's a highly efficient
way to set small "non-extracted" metadata on files that will
automatically be copied/moved/etc when files are managed with nautilus
or other gio apis.

That sounds quite similar to what Tracker does, though Tracker is perhaps not as efficient (due to infrastructure, IPC, etc.).

What about files which are not copied/moved/etc with GIO APIs/commands?


  * Using Tracker to extract and index metadata from files is
    pretty uncontroversial. Using Tracker as the primary store
    of information (such as tags) is more controversial - suddenly
    the user's data is dependent on the use of Tracker.

I'm personally of the opinion that we should use a separate store for
such metadata, and then index this with Tracker, which is why I created
the gvfs metadata storage:

It has certainly been suggested that reproducible metadata (extracted from files) should be kept in a separate database from the one holding tags and other user-specific data. I think we agree that this would be preferable; however, it isn't done that way right now.

Just as an update on the current state of Tracker regarding that post:

- "It uses a database, so each read operation is an IPC call that gets resolved to a database query, causing performance issues", this is true to some extent. We have been looking at improving this. In addition, Bastien has filed a bug requesting a direct access library, and we are considering it¹.

- "I don’t like the mixing of user specified data like custom icons with auto-extracted or generated data...", we agree generally. Tags are a bit of a unique situation at the moment (there might be other cases too).

- "The tracker database is a huge (gigabytes) complex database with information from lots of sources, mostly autogenerated." This might be true for 0.6, but 0.8 is much more efficient in terms of space. For my collection (consisting of 174223 resources, 11450 images, 17750 audio files, 151052 files, 20536 folders) the meta database is 344542208 bytes. That's quite a bit smaller than it used to be.

- "This risks the data not being backed up", we now have a journal which backs up the data quite efficiently too. The journal can replay all transactions and even ontology changes.

- "Also, people having problems with tracker are prone to remove the databases and reindexing just to see if that “fixes it”...", the only people who come to us with those problems have extreme use cases or corner cases, or are testers who have done something which warrants it. Generally we don't get many people needing that with 0.8+.

- "...or due to database format changes on upgrades.", we have limited support for this in 0.8 (depends on what changes in the ontology). For 0.9 we have pretty much completed that now.

- "Also, the generic database model seems like overkill for the simple stuff we want to store, like icon positions and spatial window geometry.", it is no longer generic, which is a major contributing factor to why 0.8 is so fast.

- "For instance, many people report that system performance when using Tracker suffer. I’m sure this is fixable...", it was always fixable with slower indexing, but that's usually not good enough. It is always a trade-off. This is much less of a problem these days. At least on my desktop I don't even notice when it is running. Others have said the same thing.
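
To put the database size quoted above in more familiar units, here is a quick back-of-the-envelope calculation in Python. It uses only the two figures given for my collection (the database size in bytes and the total resource count). The per-resource number is a plain average and says nothing about how Tracker actually lays data out on disk.

```python
# Figures taken from the collection described above.
DB_SIZE_BYTES = 344_542_208
RESOURCES = 174_223

size_mib = DB_SIZE_BYTES / 2**20               # bytes -> MiB
bytes_per_resource = DB_SIZE_BYTES / RESOURCES  # simple average

print(f"{size_mib:.1f} MiB, {bytes_per_resource:.0f} bytes per resource")
# prints "328.6 MiB, 1978 bytes per resource"
```

So that is roughly 330 MiB in total, or a little under 2 KB per resource on average.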


