Re: Finding and Reminding, tech issues, 3.0 and beyond

On Fri, 2010-04-09 at 18:09 -0400, Owen Taylor wrote:
> I've attempted below to extract out some of the technical bits from

This is great stuff.  I feel kind of bad commenting from the sidelines,
given that I have done practically no work on gnome-activity-journal or
the Zeitgeist engine, but anyway, here goes.

Basically, the FindingAndReminding page has come to conclusions that are
very similar to what gnome-activity-journal tries to be.  In this mail I
want to convince you that g-a-j plus Zeitgeist are the right way to go
for GNOME, and that reimplementing them would be a waste of resources.

> "Things can safely fall off the desktop"

The timeline leads to this naturally, so we are on the same page.

The discussion about keeping the Zeitgeist database as part of Tracker's
own database is very interesting.  I don't know enough details about
either, so I will leave that discussion to the Zeitgeist hackers.
However, it is evident that we need a time-based store of stuff that the
user has accessed --- Zeitgeist provides exactly this.

> "User defined tags"
>   A completely flat view of all documents doesn't handle all users
>   or use cases. "Frequent filers" will want to be able to identify
>   projects and other subsets of files.

Yeah, we need to show tags as something more than "this file has such
and such tags" and "this is a search query for tags".

When I was thinking about the Journal two GUADECs ago, I imagined that
you would be able to "circle" a bunch of files and tag them visually.
Imagine taking a red pencil, circling a few events in the journal, and
giving them a purple tag that says "Research Paper on Monkeys".  The
purple tag appears somewhere prominent ("current projects"?  "recent
tags"?) so you can access those items again easily without scrolling
back in the timeline.

Firefox's "recent tags" command is incredibly useful; this is more or
less the same.

> "Timeline view of files"

Clearly we agree on this :)

One thing I like about the mockup in the FindingAndReminding page is
that the items are laid out in a grid.  Imagine this:

  [       ]
  [ thumb ] File with a long descriptive name.odt
  [       ]

  [thumb] [thumb] [thumb] [...]   (imported 234 photos)

  [       ]
  [ thumb ] Another file with a meaningful name.pdf
  [       ]

I.e. an irregular grid.  You don't want to show meaningless names like
"dsc02345.jpg" for photos, but you do want document titles or
descriptive filenames when they are available.  You may want bigger
thumbnails of PDFs than of photos so that they are easier to recognize.

>   (Note that the timeline here only includes each item once,
>   not once for each usage - I use "timeline" somewhat differently
>   below)

From using gnome-activity-journal, it becomes clear that you need both
options - show only the latest version, and show all instances of each
item or file.

Showing only the latest version is for when you are churning through
things during the day, and you don't want 15 copies of the same file in
today's view, every time you saved.

Showing all instances is useful when reviewing items in the past, so
that you are reminded of the context.  You may only want to show a
single instance of a file that was edited more than one time in each
day, though.
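
To make the two modes concrete, here is a rough sketch in Python.  It is
not how gnome-activity-journal or Zeitgeist actually do it; the function
names and the (uri, timestamp) event tuples are made up for illustration.

```python
from datetime import datetime


def latest_only(events):
    """Keep only the most recent event per URI, newest first.

    `events` is an iterable of (uri, datetime) pairs (a hypothetical shape).
    """
    latest = {}
    for uri, ts in events:
        if uri not in latest or ts > latest[uri]:
            latest[uri] = ts
    return sorted(latest.items(), key=lambda item: item[1], reverse=True)


def once_per_day(events):
    """Keep the most recent event per (URI, calendar day), newest first."""
    latest = {}
    for uri, ts in events:
        key = (uri, ts.date())
        if key not in latest or ts > latest[key]:
            latest[key] = ts
    collapsed = [(uri, ts) for (uri, _day), ts in latest.items()]
    return sorted(collapsed, key=lambda item: item[1], reverse=True)
```

latest_only() is the "churning through things today" view, and
once_per_day() is the "reviewing the past" view, with repeated saves on
the same day collapsed into one entry.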

> "Search"
>   We want to be able to search - over the names of all
>   documents, but also over extracted metadata such as
>   document titles, and maybe over full text. This is definitely
>   best supported by something like Tracker.

Yes, you need search.  But even just searching over document titles or
filenames is useful.

The best version of gnome-activity-journal was the one that had
Control-S working like in Emacs (or was it Control-F?).  Searching is
instantaneous and matches get highlighted in real time.
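
Just to illustrate how little is needed for the title-only case, here is
a sketch in Python.  The function name and return shape are invented for
this mail; this is not the journal's actual code.  It returns the span
to highlight in each matching title, and you would re-run it on every
keystroke.

```python
def incremental_matches(titles, query):
    """Return (title, start, end) for each title containing `query`.

    Matching is case-insensitive.  The offsets assume that lower() does
    not change the string length, which holds for typical filenames.
    """
    if not query:
        return []
    needle = query.lower()
    results = []
    for title in titles:
        start = title.lower().find(needle)
        if start != -1:
            results.append((title, start, start + len(needle)))
    return results
```

A linear scan like this is presumably fine for the items visible in the
journal; searching everything is where something like Tracker comes in.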

> "Adding non-files to Desktop"

Good.  Zeitgeist supports this, and is one of the more interesting

I'd add these to the timeline:

- Web pages, or at least those that you bookmark.
- Mails that I sent.
- Mails that I received that have attachments.
- Attachments that I saved.
- IM conversations that I had.
- Handwritten notes ("Tomboy in the journal")
- Files or items that I can drop into future dates.
- Git pushes.
- With help from Greasemonkey or whatever, google docs that I edited.

>  * RDF + SPARQL + a large collection of ontologies does present
>    a significant new barrier to someone coming to the GNOME

Hopefully Tracker will have as many readers for common file formats as
we need.  For more specific things, it shouldn't be a problem to add
little patches to apps so that they store extra metadata.  Evolution may
want to store a has-attachment triplet (or however the Nepomuk ontology
represents that), for example.

>  * There is a large abstraction barrier between the application
>    and the underlying data storage. It's very hard to decipher
>    or influence how storing data in RDF and running SPARQL queries
>    maps into low-level database operations.

That requires profiling with real apps and real data.

One problem that both Zeitgeist and Tracker have run into, through no
fault of their own, is that they have provided very interesting APIs
that have no callers yet.  So we don't know how chunky the results need
to be for optimal performance, and things like that.

Now that the graphical parts are being designed, we can have a better
idea of what APIs should be provided by Zeitgeist and Tracker - and that
should give us an idea of how to profile this stuff.

> * Using Tracker to extract and index metadata from files is
>    pretty uncontroversial. Using Tracker as the primary store
>    of information (such as tags) is more controversial - suddenly
>    the user's data is dependent on the use of Tracker.

Both Tracker and Zeitgeist need to be pretty much 100% reliable.  It's
like f-spot - if it loses your database, you are Royally Screwed(tm).
You may not have lost your pictures, but re-tagging them is a herculean

> My understanding is that the Tracker people have disclaimed
> the log storage problem. The role of log-storage for projects
> like "GNOME Activity Journal" is taken over by the Zeitgeist
> daemon.

Again on the same page :)

>    The only think I can think of in the current mockups
>    that requires a Zeitgeist-like approach is the
>    "Frequent" selector. Without a longitudinal view
>    of usage, it's hard to answer "what are the most frequently 
>    used documents in the last 30 days".

One really interesting aspect of Zeitgeist is the data-mining algorithms
it uses to present "items that are frequently used together" - the
Apriori algorithm and such.  I wouldn't want to lose this.  One thing
that would make the timeline really useful would be to split the screen
in two, with the timeline on one side and the related items in the other

>  * To a much greater extent than tracker, Zeitgeist is
>    is designed to require applications to be modified to
>    push events to it.

Yes and no.  Apps that create files should already be storing that info
in ~/.recently-used, which Zeitgeist feeds from.

The Zeitgeist API is mostly for when you want to log non-file items.
Mails that you got with attachments, web bookmarks, etc.

If we are going to have deep integration throughout the desktop and
apps, we *are* going to have to modify apps.  I don't think that's a
problem, especially now that D-Bus APIs let us have soft dependencies

>  * Zeitgeist is designed to be standalone and independent
>    from Tracker, but also used in conjunction. This, at
>    times, makes things not as good as they could be. For
>    example, Tracker has a pretty sophisticated system to
>    assign a UID to each file and track files as they
>    move around the file system, but Zeitgeist, which
>    identifies file by file paths will lose a file as
>    soon as it is moved - it doesn't piggyback off the
>    work that Tracker is doing.

Hmm, couldn't Zeitgeist be modified to support this?  Although the
Zeitgeist team has worked hard on finalizing the database format and
APIs, nothing has been released "in the wild" yet, so changes are feasible.

> So for explicit file manipulation of files (cleaning up,
> filing, etc) the user would probably still be using Nautilus.

Don't worry about Nautilus too much for now.  I'd rather leave it as it
is and adapt it gradually, and/or move towards the model of Sebastian
Faubel's semantic file manager.

> Not much yet - I think it will definitely be hard to implement
> our ideas without something that looks a lot like Tracker, and 
> since we have Tracker something that looks a lot like Tracker 
> is most likely Tracker :-) Zeitgeist seems less centrally crucial, 
> but there is a role for event logging here. 

I think implementing your own timeline storage is a waste of time.
Zeitgeist needs to be productized, and part of that is to answer the
question of whether it needs to be integrated with Tracker.

Now, about the graphical part...

Gnome-activity-journal has substantial work put into it.  The user
interface is not perfect, of course, but it's there and it works.

Gnome-shell is going to have to accommodate the journal somehow.
Rewriting a timeline interface and all the associated widgetry, by hand,
using gnome-shell's not-really-a-widget-system approach, sounds painful.
However, gnome-shell already includes a window manager - couldn't it run
gnome-activity-journal as a separate process and just manage its window
in a special way?  (This is the kind of extensibility in the window
manager that I was advocating two GUADECs ago - remember the tabs that
pull out of the edges of the screen?)

One idea that floated around was the following.  Gnome-shell has a big
"Activities" button that shows you all the stuff that you have open.  It
could very well have a "Journal" button next to that one, that would
show you the journal in a large part of the screen --- it is All Your
Work, so it deserves sufficient screen real estate.

The summary of all of this is:

- Don't reinvent Zeitgeist.  Make it better; solve the issues with

- Don't reinvent the journal.  Make it work with gnome-shell.

- Thanks for the detailed analysis of the literature.

