Re: Proposal for bookmarks/history database



On Wed, 2005-11-16 at 19:52 -0500, Adam Hooper wrote:
> A general comment: I don't see why the indisputably most confusing part
> of the EphyNode API should get translated to a system which already
> works better without it.

It allows us to have generic, named sets of nodes. The 'set' idea is
imho necessary for nice user interaction, as people tend to think in
sets. Bookmarks arranged by topics, their history arranged by browsing
session, their history arranged by site, etc.


> Why? Why not use a completely different GtkTreeModel for Bookmarks than
> for History? Or for any other GtkTreeView we ever cook up? My assumption
> is that each GtkTreeView will be quite different from others.

At the moment we have one tree model used extensively in Epiphany
(ephy-node-tree-model). The only problem with it is that it can be truly
horrible to use (perhaps I don't quite understand it). Something like
the SQLite backed tree model here seems much cleaner and easier: 
  http://galeon.sourceforge.net/sqlite-store/


> > I also expect other nodes to have children. One thing that makes life
> > easy is having all bookmarks children of a particular node. That way I
> > can listen to the "children of that node" and get changes for all
> > bookmarks. Similar for topics.
> > 
> > I *don't* want to have to implement separate signal mechanisms for
> > different node types. I don't want:
> >   ephy_db_connect_signal_topic_bmks (int topic, callback, userdata);
> >   ephy_db_connect_signal_site_bmks (int site, callback, userdata);
> >   ephy_db_connect_signal_bmks (callback, userdata);
> > 
> > I want:
> >   ephy_db_connect_signal_children (int parent, callback, userdata);
> 
> (I probably misunderstand to some extent, please correct me after
> reading the following:)
> 
> I think in general you are contradicting yourself: you want to treat
> bookmarks and history items the same, but they have different data
> fields (that is, history visits may have "visit time", while bookmarks
> may have "title").
> 
> So you're treating every object as "node", but at the same time you're
> saying: "*This* type of parent has *this* type of children". I interpret
> your explanation to mean that if you send a GCallback which was written
> for Bookmarks to ephy_db_connect_signal_children(), it will not be able
> to understand if History children are passed to it.
> 
> So you end up with a function being called with the same name but doing
> vastly different things, depending on an integer parameter in a
> nondeterministic way.

I'm not so concerned about writing "universal signal handlers that can
handle a node regardless of what information it has associated with it".
I'm more concerned about writing ephy_db_connect_signal (and presumably
ephy_db_emit) and keeping it clean and generic.
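
To make that concrete, here is a rough sketch of what I mean by one generic connect/emit pair. None of it is existing Epiphany code: the EphyDb struct, the fixed-size handler table, the extra 'db' argument and the ephy_db_emit_children name are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; none of this is existing Epiphany API. */
#define MAX_HANDLERS 32

typedef void (*EphyDbChildFunc) (int parent, int child, void *userdata);

typedef struct {
  int parent;                 /* node whose children we listen to */
  EphyDbChildFunc callback;
  void *userdata;
} Handler;

typedef struct {
  Handler handlers[MAX_HANDLERS];
  size_t n_handlers;
} EphyDb;

/* Register a callback for changes to the children of one parent node.
 * Returns 0 on success, -1 if the fixed-size table is full. */
int
ephy_db_connect_signal_children (EphyDb *db, int parent,
                                 EphyDbChildFunc callback, void *userdata)
{
  assert (db != NULL && callback != NULL);
  if (db->n_handlers == MAX_HANDLERS)
    return -1;
  db->handlers[db->n_handlers++] = (Handler) { parent, callback, userdata };
  return 0;
}

/* Notify every handler registered on 'parent' that 'child' changed.
 * Returns the number of handlers that were called. */
int
ephy_db_emit_children (EphyDb *db, int parent, int child)
{
  int called = 0;

  assert (db != NULL);
  for (size_t i = 0; i < db->n_handlers; i++)
    if (db->handlers[i].parent == parent)
      {
        db->handlers[i].callback (parent, child, db->handlers[i].userdata);
        called++;
      }
  return called;
}

/* Example handler: counts notifications in the int that userdata points to. */
void
count_changes (int parent, int child, void *userdata)
{
  (void) parent;
  (void) child;
  (*(int *) userdata)++;
}
```

The point is that the bookmarks, topics and sites cases differ only in which parent number you pass, so there is one registration function instead of three.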


> I don't see why you'd want to implement signals at the EphyDb level,
> anyway. "Because EphyNode did it" doesn't count. If it were me writing
> EphyDb, I'd make full use of the database abstraction layer: I'd just
> make EphyDb a thin wrapper around SQLite. I never liked EphyDb, and I
> think copying its design means going from simple (SQLite) -> complicated
> (EphyDb) -> simple (bookmarks), instead of just skipping the middle step
> entirely.

We need signals because databases don't push data. With the creation of
ephy_db I hope to *delete* ephy_bookmarks and anything else that stores
EphyNode data.


> > Just going to make a random comment here:
> > --------------------------------------------------
> > Note that the only reason I kept 'urls' and 'bookmarks' separate was
> > because one is static while the other is editable. 'urls' is like a
> > cache of existing known urls, and 'bookmarks' is a set of modifiable
> > urls. You never do update on 'urls', but would frequently do it on
> > 'bookmarks'. You would regularly search on 'urls' for something to
> > attach to, but wouldn't do the same in 'bookmarks'. That's the only
> > reason why they're separate. Perhaps I should call one 'static_urls' or
> > similar.
> > --------------------------------------------------
> 
> Okay, but that table isn't necessary. Each url will occur only once, so
> you're adding another table to join to, with no benefit.

The specific problem was this:
1. Add bookmark for 'http://www.gnome.org' with default title, which
   creates an entry in the urls table.
2. Visit http://www.gnome.org, so that a visit record links to your
   newly created entry in the urls table.
3. Edit the bookmark, changing a record the 'visits' table links to.

This will result in 'rewriting history'.
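
A small sketch of the problem, using plain C structs in place of tables. The struct and function names are made up for illustration, and I'm assuming the bookmark keeps its own editable copy of the title in the two-table design.

```c
#include <stdio.h>

/* Hypothetical stand-ins for rows in the 'urls', 'visits' and 'bookmarks'
 * tables; not real Epiphany structures. */
typedef struct { char url[64]; char title[64]; } UrlRecord;
typedef struct { const UrlRecord *url; long visited_at; } Visit;
typedef struct { const UrlRecord *url; char title[64]; } Bookmark;

/* Single-table design: editing the bookmark edits the shared urls record,
 * so every visit pointing at it shows the new title too. */
void
edit_shared_title (UrlRecord *record, const char *title)
{
  snprintf (record->title, sizeof record->title, "%s", title);
}

/* Two-table design: the bookmark carries its own title, so the urls
 * record (and the history that links to it) is left alone. */
void
edit_bookmark_title (Bookmark *bookmark, const char *title)
{
  snprintf (bookmark->title, sizeof bookmark->title, "%s", title);
}
```

With edit_shared_title() a visit recorded before the edit reports the edited title afterwards; with edit_bookmark_title() it still reports the title the page had when it was stored.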


> This is standard aggregation:
> 
> table 'history'
>   int history_id
>   string url
>   string title
>   string icon
> 
> table 'history_visit'
>   int history_id
>   date visit_date
>   int referer_history_id # links to 'history' -- why not?
> 
> CREATE VIEW history_last_visit AS
> SELECT history.url AS url, history.title AS title,
>        history.icon AS icon, MAX(history_visit.visit_date) AS visit_date
> FROM history
> NATURAL JOIN history_visit
> GROUP BY history.history_id
> ORDER BY visit_date DESC

Correct me if I'm wrong, but that seems exactly what I had. :) Except I
had 'urls' instead of 'history' (they're just urls, they're not history
until you visit them).


> My concluding comments:
> 
> You're trying to make things extensible, but every time you do, things
> get more confusing. And the features you end up with are of questionable
> value. For instance:
> 
> - You can find, from any node ID, the type (e.g., 'history', 'bookmark')
> of that node. But it's impossible to actually *retrieve* that node
> without knowing its type in the first place.
> - You can make anything a child of anything else. But-- oh, come on, do
> I even have to suggest how this could lead to problems?
> - You can connect signals to any node's children. But what is the
> signature of the callback supposed to be? The (logical) answer is *not*
> found in the EphyDb API: it's found in the implementation of the caller
> object. Where is the only entry point to code which can emit that
> signal? The caller.
> - You can add new data types from an extension. But you'll have to code
> a whole bunch of stuff to circumvent the EphyDb API, because maybe what
> you really want to do is LEFT JOIN the 'bookmarks' table to your new
> one, and the EphyDb API doesn't support LEFT JOINs.

I should explain my thinking a bit more. My initial vision wasn't to use
SQLite. It was just to replace the custom EphyNode structures with
instances of g_hash_table. In my mind, a bookmark or history record or a
topic is just a hashtable like:

  MY NODE:
    "url" => "http://www.gnome.org/"
    "title" => "GNOME"
    "icon" => NULL

I wanted it to be a generic hashtable because I wanted to be able to
store arbitrary information in these nodes. For example:

  MY NODE:
    "url" => "http://www.abc.net.au/"
    "title" => "ABC Television"
    "icon" => NULL
    "parents" => 1034,4012,3212
    "created" => 17th Nov, 2005
    "thumbnail" => NULL

Then the idea of using SQLite was introduced (others were already working
on it). I just thought "what's the most efficient way to represent these
little bundles of information in a database?". Using the hashtable
concept means:

a. You *don't* know what the type of the node is if just given a number.
   You get given a number for a purpose. You query it for a purpose. If
   it has the information you're looking for, great! If it doesn't, your
   code was probably wrong to begin with.

b. The signature of the "data changed" callback is something like:
      void callback (int node, int data_changed[], int length, userdata)
   We're only listening for changes. Each field is given a number (I
   was going to use GQuarks with g_hash_tables, and probably still can).

c. The signature of a query callback is something like:
      void callback (int node, char *data[], int length, userdata)
   which is basically just SQLite's query callback.

d. An extension would just have to register a new 'set of fields' which
   we would then turn into a table. It may also register a 'global' node
   if it so desires so that sets of nodes it creates can be easily 
   found. This isn't required though. A 'bookmark thumbnail browser'
   extension might just add a 'thumbnail' field and then set that
   field for nodes which are children of the 'bookmarks' node in the
   globals table.
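
Roughly, and only as a sketch: the code below uses a fixed-size array where I'd really use a g_hash_table, plain enum values where I'd use GQuarks, and the Node struct and node_get/node_set names are invented for the example. The callback follows the "data changed" signature from (b).

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FIELDS 8

/* Field numbers; stand-ins for GQuarks. FIELD_THUMBNAIL plays the part of a
 * field registered by an extension. */
enum { FIELD_URL, FIELD_TITLE, FIELD_ICON, FIELD_THUMBNAIL };

typedef void (*NodeChangedFunc) (int node, const int data_changed[],
                                 int length, void *userdata);

/* A node is just a small bag of (field, value) pairs plus one listener. */
typedef struct {
  int id;
  int keys[MAX_FIELDS];
  const char *values[MAX_FIELDS];   /* borrowed strings, not copied */
  int n_fields;
  NodeChangedFunc changed;
  void *userdata;
} Node;

/* Return the value stored for 'field', or NULL if the node has no such field. */
const char *
node_get (const Node *node, int field)
{
  assert (node != NULL);
  for (int i = 0; i < node->n_fields; i++)
    if (node->keys[i] == field)
      return node->values[i];
  return NULL;
}

/* Set 'field' and tell the listener which field changed.
 * Returns 0 on success, -1 if the node has no room for a new field. */
int
node_set (Node *node, int field, const char *value)
{
  int i;

  assert (node != NULL);
  for (i = 0; i < node->n_fields; i++)
    if (node->keys[i] == field)
      break;
  if (i == node->n_fields)
    {
      if (node->n_fields == MAX_FIELDS)
        return -1;
      node->keys[i] = field;
      node->n_fields++;
    }
  node->values[i] = value;

  if (node->changed != NULL)
    {
      int data_changed[1] = { field };
      node->changed (node->id, data_changed, 1, node->userdata);
    }
  return 0;
}

/* Example listener: remembers the last changed field number. */
void
record_last_change (int node, const int data_changed[], int length,
                    void *userdata)
{
  (void) node;
  if (length > 0)
    *(int *) userdata = data_changed[length - 1];
}
```

Asking a node for a field it doesn't have simply returns NULL, which is the behaviour described in (a): if the information isn't there, the caller was probably asking the wrong node.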


> Extensions simply *cannot* use this system. Besides the potential
> problems associated with storing all relations in a single table (one
> line of broken 'bookmarks' code loses all your 'history'), extensions
> would need an API to create tables and an API to run queries. These
> already exist, in the form of SQL: it has been heavily researched, and
> there is no reason to rewrite it. Especially since a lot of potential
> extension writers already know how to use it!

Most extensions would need only basic queries I believe. For those that
require specialist queries we can most likely create a simple interface
for them that hides all the SQL, or just give them the SQLite handle and
let them break every release.


> I suggest you take one of two possible paths:
> 
> 1. Make EphyDb a thin wrapper to SQLite3, and move the signals into
> History and Bookmarks, or

I don't want to do that. I want to delete History and Bookmarks classes
to reduce the amount of code. And I don't want to expose extensions to
SQLite unnecessarily. It is possible that we incorporate some other
database for bookmarks (e.g. linking to Firefox's bookmarks file) and so
an abstract "database of simple little hashtables" works nicely.


> 2. Make EphyDb work great for history and bookmarks *only*, and ignore
> extensions entirely.

I want extensions to add meta-information to history and bookmarks
items. And if I allow that, I might as well allow all sorts of things in
there.


> I hope it doesn't sound like I'm totally blasting your idea. I, for one,
> have tried to create more generic data structures on top of SQL all the
> time. It often works okay; the most generic example I've seen is Storage
> in GNOME CVS. Possibly the concept of writing extensions with a
> beautiful history/bookmarks database to query is just so tempting I
> can't bear to let it go ;).

I appreciate the blasting. :) Just worried that I'm being too stubborn.
I will admit that if we go down the route of completely separate
"topics" tables and "bookmarks" tables and "bookmark_topics" tables and
the like, I *don't* want to be the one to write it. :)

Thanks again,
Peter.




