Re: Proposal for bookmarks/history database

From: Adam Hooper <adamh densi com>
To: Peter Harvey <pah06 uow edu au>
Cc: Epiphany List <epiphany-list gnome org>
Subject: Re: Proposal for bookmarks/history database
Date: Wed, 16 Nov 2005 19:52:36 -0500

On Thu, 2005-11-17 at 08:20 +1100, Peter Harvey wrote:
> On Wed, 2005-11-16 at 14:41 -0500, Adam Hooper wrote:
> > On Wed, 2005-11-16 at 09:43 +1100, Peter Harvey wrote:
> > > Just remember, every 'node' of data is given a unique id. All these
> > > tables do is record different types of information on that node. There
> > > is *no* global list of all nodes.
> > 
> > This makes the database horribly un-relational.
> 
> Actually I was trying to represent 'nodes' in a graph structure, so I'm
> not surprised. :) I'm going to attempt to provide good reasons for the
> design I've given and see if I can win you over. If not, I'll
> redesign. :)

A general comment: I don't see why the indisputably most confusing part
of the EphyNode API should get translated to a system which already
works better without it.

> >  It is standard database
> > practice for the same type of data to get sequential IDs. This leads to
> > two cases:

> > Other cases lead to confusion, so they shouldn't be used unless there is
> > a very good reason to do so.
> 
> The reason was that, internally, we do want to treat all these data
> types the same. We do want, for example, a single GtkTreeView class that
> can be used to view a query of different datatypes, receive signals from
> different data types, etc. It will need a consistent signalling
> mechanism, etc.

Why? Why not use a completely different GtkTreeModel for Bookmarks than
for History? Or for any other GtkTreeView we ever cook up? My assumption
is that each GtkTreeView will be quite different from others.

> I expect to write code which receives signals for changes to any
> bookmarks which are children of a particular topic. This is used for
> efficiently regenerating menus, etc.

I bow to your knowledge of efficiently regenerating menus, of course.

> I also expect other nodes to have children. One thing that makes life
> easy is having all bookmarks children of a particular node. That way I
> can listen to the "children of that node" and get changes for all
> bookmarks. Similar for topics.
> 
> I *don't* want to have to implement separate signal mechanisms for
> different node types. I don't want:
>   ephy_db_connect_signal_topic_bmks (int topic, callback, userdata);
>   ephy_db_connect_signal_site_bmks (int site, callback, userdata);
>   ephy_db_connect_signal_bmks (callback, userdata);
> 
> I want:
>   ephy_db_connect_signal_children (int parent, callback, userdata);

(I probably misunderstand to some extent; please correct me after
reading the following.)

I think in general you are contradicting yourself: you want to treat
bookmarks and history items the same, but they have different data
fields (that is, history visits may have "visit time", while bookmarks
may have "title").

So you're treating every object as "node", but at the same time you're
saying: "*This* type of parent has *this* type of children". I interpret
your explanation to mean that if you send a GCallback which was written
for Bookmarks to ephy_db_connect_signal_children(), it will not be able
to understand if History children are passed to it.

So you end up with a function being called with the same name but doing
vastly different things, depending on an integer parameter in a
nondeterministic way.

I don't see why you'd want to implement signals at the EphyDb level,
anyway. "Because EphyNode did it" doesn't count. If it were me writing
EphyDb, I'd make full use of the database abstraction layer: I'd just
make EphyDb a thin wrapper around SQLite. I never liked EphyNode, and I
think copying its design means going from simple (SQLite) -> complicated
(EphyDb) -> simple (bookmarks), instead of just skipping the middle step
entirely.

> > > # data of high-level nodes that are used to organise the other nodes
> > > table 'globals'
> > >   int id
> > >   string name
> > 
> > This table will never get used, and it's not needed.
> 
> Well, it's used if I want a consistent signal mechanism like above. It's
> also used if extensions want to add their own node types which may make
> use of the same tables.

Extensions filling in core tables? That's a scary thought: with an
abstract database design such as this one, there will be many, many
mistakes.

I don't see how you can keep EphyDb generic if you're putting history
and bookmarks in different tables: how will extensions use it? And I
don't see how you can keep EphyDb generic if you put history and
bookmarks in the *same* table: how will they represent their disparate
data fields?

> Just going to make a random comment here:
> --------------------------------------------------
> Note that the only reason I kept 'urls' and 'bookmarks' separate was
> because one is static while the other is editable. 'urls' is like a
> cache of existing known urls, and 'bookmarks' is a set of modifiable
> urls. You never do update on 'urls', but would frequently do it on
> 'bookmarks'. You would regularly search on 'urls' for something to
> attach to, but wouldn't do the same in 'bookmarks'. That's the only
> reason why they're separate. Perhaps I should call one 'static_urls' or
> similar.
> --------------------------------------------------

Okay, but that table isn't necessary. Each url will occur only once, so
you're adding another table to join to, with no benefit.

> Unfortunately we need to record multiple visits for websites, and chpe
> wants to avoid storing the same information hundreds of times.

This is standard aggregation:

table 'history'
  int history_id
  string url
  string title
  string icon

table 'history_visit'
  int history_id
  date visit_date
  int referer_history_id # links to 'history' -- why not?

CREATE VIEW history_last_visit AS
SELECT history.url AS url, history.title AS title,
       history.icon AS icon, MAX(history_visit.visit_date) AS visit_date
FROM history
NATURAL JOIN history_visit
GROUP BY history_id -- one row per page, not one row in total
ORDER BY visit_date DESC;
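
To make the aggregation concrete, here is a minimal sketch that builds the
two tables and the view in an in-memory SQLite database through Python's
sqlite3 module. The column types, the constraints, the GROUP BY and the
sample rows are my assumptions for illustration, not part of any agreed
schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE history (
    history_id INTEGER PRIMARY KEY,
    url        TEXT NOT NULL UNIQUE,   -- each URL is stored only once
    title      TEXT,
    icon       TEXT
);
CREATE TABLE history_visit (
    history_id         INTEGER NOT NULL REFERENCES history(history_id),
    visit_date         TEXT NOT NULL,  -- ISO 8601 text sorts chronologically
    referer_history_id INTEGER REFERENCES history(history_id)
);
-- One row per page, carrying the date of its most recent visit.
CREATE VIEW history_last_visit AS
SELECT history.url AS url, history.title AS title,
       history.icon AS icon, MAX(history_visit.visit_date) AS visit_date
FROM history
NATURAL JOIN history_visit
GROUP BY history_id
ORDER BY visit_date DESC;
""")

# Sample data (made up): two pages, three visits.
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?, ?)",
    [(1, "http://www.gnome.org/", "GNOME", None),
     (2, "http://www.sqlite.org/", "SQLite", None)],
)
conn.executemany(
    "INSERT INTO history_visit VALUES (?, ?, ?)",
    [(1, "2005-11-14 10:00:00", None),
     (1, "2005-11-16 19:52:00", 2),
     (2, "2005-11-15 08:30:00", None)],
)

# Sort explicitly here rather than relying on the view's own ORDER BY.
rows = conn.execute(
    "SELECT url, visit_date FROM history_last_visit ORDER BY visit_date DESC"
).fetchall()
for url, visit_date in rows:
    print(visit_date, url)
```

The page data is stored once in 'history', however many visits pile up in
'history_visit'.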

> The id column exists only so that we can treat all things as little
> "bundles of information" or "nodes" or "objects". I want to give us
> great flexibility later, for example:
>  - linking to history items by site
>  - associating history items with topic

Yes, but you're taking away flexibility in the process (more on this at
the bottom).

> > I'd rename those "id" columns to "bookmark_id" and "topic_id" to make
> > their relations with "table_bookmark" absolutely clear. SQLite then lets
> > you use "NATURAL JOIN", a cute bit of syntactic sugar.
> 
> Hehe, which is precisely why I don't want it. :D That child relation
> isn't always going to be about bookmarks and topics. :) And having
> multiple child relations around is not pleasant, though could be argued
> for.

Why not use a separate table per relation, like everybody else in the
world? I think database consistency reasons alone provide more than
ample justification. As for "not pleasant": it gives an entirely
self-documenting database, which is perfect for extension writers.
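
As a rough sketch of the consistency argument: with one table per relation,
the database itself can reject a link to a row that does not exist. This
assumes an SQLite recent enough to enforce foreign keys (it must be switched
on per connection), and the table and column names are only my guess at how
the "bookmark_id" / "topic_id" naming would look.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default
conn.executescript("""
CREATE TABLE bookmark (
    bookmark_id INTEGER PRIMARY KEY,
    url         TEXT NOT NULL,
    title       TEXT
);
CREATE TABLE topic (
    topic_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);
-- The relation gets its own table; the column names say what it links.
CREATE TABLE bookmark_topic (
    bookmark_id INTEGER NOT NULL REFERENCES bookmark(bookmark_id),
    topic_id    INTEGER NOT NULL REFERENCES topic(topic_id),
    PRIMARY KEY (bookmark_id, topic_id)
);
""")

conn.execute("INSERT INTO bookmark VALUES (1, 'http://www.gnome.org/', 'GNOME')")
conn.execute("INSERT INTO topic VALUES (1, 'Desktop')")
conn.execute("INSERT INTO bookmark_topic VALUES (1, 1)")

# Linking to a topic that does not exist is refused by the database.
try:
    conn.execute("INSERT INTO bookmark_topic VALUES (1, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Matching column names are what make NATURAL JOIN work here.
pairs = conn.execute(
    "SELECT title, name FROM bookmark NATURAL JOIN bookmark_topic NATURAL JOIN topic"
).fetchall()
print(rejected, pairs)
```

A single generic parent/child table cannot declare constraints like these,
because its columns may point at any kind of row.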

> Some people have previously asked for "topics being related to other
> topics". I'd like to be able to have that.

table 'topic_topic':
  int parent_id
  int child_id
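
A small sketch of how 'topic_topic' could be queried, assuming a 'topic'
table with topic_id and name columns (my invention) and an SQLite new
enough to support recursive common table expressions. The descendants()
helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE topic (
    topic_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);
CREATE TABLE topic_topic (
    parent_id INTEGER NOT NULL REFERENCES topic(topic_id),
    child_id  INTEGER NOT NULL REFERENCES topic(topic_id),
    PRIMARY KEY (parent_id, child_id)
);
INSERT INTO topic VALUES (1, 'Software'), (2, 'GNOME'), (3, 'Epiphany'), (4, 'News');
INSERT INTO topic_topic VALUES (1, 2), (2, 3);
""")


def descendants(conn, topic_id):
    """Return the names of all topics below topic_id, at any depth."""
    rows = conn.execute(
        """
        WITH RECURSIVE sub(topic_id) AS (
            SELECT child_id FROM topic_topic WHERE parent_id = ?
            UNION  -- UNION drops duplicates, so a cycle cannot loop forever
            SELECT tt.child_id
            FROM topic_topic AS tt
            JOIN sub ON tt.parent_id = sub.topic_id
        )
        SELECT topic.name FROM sub JOIN topic USING (topic_id)
        ORDER BY topic.name
        """,
        (topic_id,),
    ).fetchall()
    return [name for (name,) in rows]


print(descendants(conn, 1))
```

Topics related to topics need nothing more than this one extra table.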

> I could also envision a topic being a bookmark (appears on toolbar as a
> bookmark with a drop-down next to it).

I think that's a brilliant idea. But it would be possible either way.

> > To sum up: in my experience, the above database design is slightly
> > easier to understand and to code for. It also removes pretty much every
> > unnecessary concept translated from "EphyNode" (database IDs being the
> > main one) -- and in my opinion, that is a good thing!
> 
> The thing I'm concerned about is that it might be easy to code for, but
> I'm not sure that it's easy to abstract into a small C interface and
> stay extensible.
> 
> I think you can see that I've designed the database around the code
> because that's where I'll be spending most of my time. From that
> perspective, is what I've done OK?
> 
> Disregarding coding reasons, does it still seem a horrible design?
> Remember that we might want to do crazy things in future like "a topic
> which is a bookmark" or similar.

My concluding comments:

You're trying to make things extensible, but every time you do, things
get more confusing. And the features you end up with are of questionable
value. For instance:

- You can find, from any node ID, the type (e.g., 'history', 'bookmark')
of that node. But it's impossible to actually *retrieve* that node
without knowing its type in the first place.
- You can make anything a child of anything else. But-- oh, come on, do
I even have to suggest how this could lead to problems?
- You can connect signals to any node's children. But what is the
signature of the callback supposed to be? The (logical) answer is *not*
found in the EphyDb API: it's found in the implementation of the caller
object. Where is the only entry point to code which can emit that
signal? The caller.
- You can add new data types from an extension. But you'll have to code
a whole bunch of stuff to circumvent the EphyDb API, because maybe what
you really want to do is LEFT JOIN the 'bookmarks' table to your new
one, and the EphyDb API doesn't support LEFT JOINs.
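
To illustrate the last point, this is roughly the query an extension might
want to run if it had plain SQL access. The 'ext_bookmark_notes' table, its
columns and the sample rows are hypothetical, and the 'bookmarks' columns
are only assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bookmarks (
    bookmark_id INTEGER PRIMARY KEY,
    url         TEXT NOT NULL,
    title       TEXT
);
-- Hypothetical table owned by an extension: at most one note per bookmark.
CREATE TABLE ext_bookmark_notes (
    bookmark_id INTEGER PRIMARY KEY REFERENCES bookmarks(bookmark_id),
    note        TEXT NOT NULL
);
INSERT INTO bookmarks VALUES (1, 'http://www.gnome.org/', 'GNOME');
INSERT INTO bookmarks VALUES (2, 'http://www.sqlite.org/', 'SQLite');
INSERT INTO ext_bookmark_notes VALUES (1, 'read the release notes');
""")

# Every bookmark is listed; the note is NULL (None) where the extension
# has stored nothing for that bookmark.
notes = conn.execute("""
    SELECT b.title, n.note
    FROM bookmarks AS b
    LEFT JOIN ext_bookmark_notes AS n ON n.bookmark_id = b.bookmark_id
    ORDER BY b.bookmark_id
""").fetchall()
print(notes)
```

The extension never touches the core table's rows: it only reads them and
keeps its own data in its own table.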

What I see you doing is creating a replacement for EphyNodeDb, and in
that sense it's probably quite a bit better (I wouldn't know for sure).
I wouldn't be surprised if your coding all this would speed up bookmarks
and history searches considerably, while lowering the memory footprint.

You're taking a very flexible system (SQLite) and specializing it to the
point where you've got enough functions to work for history and for
bookmarks. Then you're converting the bookmarks and history code to use
EphyDb. They will both interact with EphyDb in rather different ways,
and so EphyDb is difficult to write. (Admittedly this half of the
conversion shouldn't be hard to write at all, since the EphyDb API is
not unfamiliar.)

Extensions simply *cannot* use this system. Besides the potential
problems associated with storing all relations in a single table (one
line of broken 'bookmarks' code loses all your 'history'), extensions
would need an API to create tables and an API to run queries. These
already exist, in the form of SQL: it has been heavily researched, and
there is no reason to rewrite it. Especially since a lot of potential
extension writers already know how to use it!

I suggest you take one of two possible paths:

1. Make EphyDb a thin wrapper to SQLite3, and move the signals into
History and Bookmarks, or
2. Make EphyDb work great for history and bookmarks *only*, and ignore
extensions entirely.
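
A very rough sketch of what I mean by path 1, written in Python only to
keep it short (the real thing would be C). Every class and method name
below is hypothetical. The wrapper knows nothing about bookmarks, and the
Bookmarks object owns both its table and its signal, so the callback
signature is defined in exactly one place.

```python
import sqlite3


class EphyDb:
    """Hypothetical thin wrapper: it only hands SQL to SQLite."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)

    def execute(self, sql, params=()):
        # The connection as context manager commits on success and
        # rolls back if the statement raises.
        with self._conn:
            return self._conn.execute(sql, params).fetchall()


class Bookmarks:
    """Owns the 'bookmarks' table and the signal about it."""

    def __init__(self, db):
        self._db = db
        self._listeners = []
        db.execute(
            "CREATE TABLE IF NOT EXISTS bookmarks ("
            "bookmark_id INTEGER PRIMARY KEY, url TEXT NOT NULL, title TEXT)"
        )

    def connect_added(self, callback):
        """Register callback(url, title), called after each add()."""
        self._listeners.append(callback)

    def add(self, url, title):
        self._db.execute(
            "INSERT INTO bookmarks (url, title) VALUES (?, ?)", (url, title)
        )
        for callback in self._listeners:
            callback(url, title)


db = EphyDb()
bookmarks = Bookmarks(db)
seen = []
bookmarks.connect_added(lambda url, title: seen.append((url, title)))
bookmarks.add("http://www.gnome.org/", "GNOME")

# An extension can still query the same database with ordinary SQL.
stored = db.execute("SELECT url, title FROM bookmarks")
print(seen, stored)
```

History would get its own class with its own signals and its own callback
signature, and neither would have to know about the other.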

I hope it doesn't sound like I'm totally blasting your idea. I, for one,
try to build more generic data structures on top of SQL all the
time. It often works okay; the most generic example I've seen is Storage
in GNOME CVS. Possibly the concept of writing extensions with a
beautiful history/bookmarks database to query is just so tempting I
can't bear to let it go ;).

-- 
Adam Hooper <adamh densi com>