Re: Proposal for bookmarks/history database



On Wed, 2005-11-16 at 14:41 -0500, Adam Hooper wrote:
> On Wed, 2005-11-16 at 09:43 +1100, Peter Harvey wrote:
> > Just remember, every 'node' of data is given a unique id. All these
> > tables do is record different types of information on that node. There
> > is *no* global list of all nodes.
> 
> This makes the database horrible un-relational.

Actually I was trying to represent 'nodes' in a graph structure, so I'm
not surprised. :) I'm going to attempt to provide good reasons for the
design I've given and see if I can win you over. If not, I'll
redesign. :)

>  It is standard database
> practice for the same type of data to get sequential IDs. This leads to
> two cases:
> 
> a. "bookmark" and "history item" are the same data type, in which case
> they appear in the same table and have the same properties.
> b. "bookmark" and "history item" are different data types, in which case
> they each get their own set of IDs and their own tables.
> 
> Other cases lead to confusion, so they shouldn't be used unless there is
> a very good reason to do so.

The reason was that, internally, we do want to treat all these data
types the same. We do want, for example, a single GtkTreeView class that
can be used to view a query of different datatypes, receive signals from
different data types, etc. It will need a consistent signalling
mechanism, etc.

This probably doesn't directly address your concern above, but it
will. :)

> > # parent-child relationships between nodes
> > table 'children'
> >   int id
> >   int child
> 
> I was about to write a big, long description of how this table could be
> fixed to work work for case "b." above, when I realized that it could
> simply be removed entirely. Its only use is for topics, so it can be
> replaced by:
> 
> table 'topic_bookmarks':
>   int topic_id
>   int bookmark_id

Again, we want to treat all datatypes the same.

I expect to write code which receives signals for changes to any
bookmarks which are children of a particular topic. This is used for
efficiently regenerating menus, etc.

I also expect other nodes to have children. One thing that makes life
easy is having all bookmarks children of a particular node. That way I
can listen to the "children of that node" and get changes for all
bookmarks. Similar for topics.

I *don't* want to have to implement separate signal mechanisms for
different node types. I don't want:
  ephy_db_connect_signal_topic_bmks (int topic, callback, userdata);
  ephy_db_connect_signal_site_bmks (int site, callback, userdata);
  ephy_db_connect_signal_bmks (callback, userdata);

I want:
  ephy_db_connect_signal_children (int parent, callback, userdata);

> > # data of high-level nodes that are used to organise the other nodes
> > table 'globals'
> >   int id
> >   string name
> 
> This table will never get used, and it's not needed.

Well, it's used if I want a consistent signal mechanism like above. It's
also used if extensions want to add their own node types which may make
use of the same tables.

Just going to make a random comment here:
--------------------------------------------------
Note that the only reason I kept 'urls' and 'bookmarks' separate was
because one is static while the other is editable. 'urls' is like a
cache of existing known urls, and 'bookmarks' is a set of modifiable
urls. You never do update on 'urls', but would frequently do it on
'bookmarks'. You would regularly search on 'urls' for something to
attach to, but wouldn't do the same in 'bookmarks'. That's the only
reason why they're separate. Perhaps I should call one 'static_urls' or
similar.
--------------------------------------------------

> I imagine the first table could be removed and the second could be
> renamed to "history"? Also, you could remove the "id" column. Like so:
> 
> table 'history'
>   string url
>   string title
>   string icon
>   date last_visit

Unfortunately we need to record multiple visits for websites, and chpe
wants to avoid storing the same information hundreds of times. So we
have a 'urls' table which is the url,title,icon triple and refer to it
for each visit (so long as the url, title and icon are the same).

The id column exists only so that we can treat all things as little
"bundles of information" or "nodes" or "objects". I want to give us
great flexibility later, for example:
 - linking to history items by site
 - associating history items with topic

These are just some ideas but the extensibility is why I used an OO-like
design rather than a standard relational one.

> > # data for bookmarks
> > table 'bookmarks'
> >   int id
> >   string url
> >   string title
> >   string icon
> 
> > # data related to user-specified topics
> > table 'topics'
> >   int id
> >   string name
> 
> I'd rename those "id" columns to "bookmark_id" and "topic_id" to make
> their relations with "table_bookmark" absolutely clear. SQLite then lets
> you use "NATURAL JOIN", a cute bit of syntactic sugar.

Hehe, which is precisely why I don't want it. :D That child relation
isn't always going to be about bookmarks and topics. :) And having
multiple child relations around is not pleasant, though could be argued
for.

Some people have previously asked for "topics being related to other
topics". I'd like to be able to have that.

I could also envision a topic being a bookmark (appears on toolbar as a
bookmark with a drop-down next to it). For example:
+--------------+
| Bugzilla | v |
+--------------+
 |             +------------+
 | #319040 : blah blah .... |
 | #319041 : blah blah .... |
 | #313040 : blah blah .... |
 | #311040 : blah blah .... |
 | #314040 : blah blah .... |
 | #313040 : blah blah .... |
 +--------------------------+

I'm going to argue about the UI, but want to keep our options open. :)

> > Q. Why not just have one table per attribute? Name, icon, url, etc?
> > A. I'm not a database expert, but I would *assume* that would make 
> >    queries horribly slow as you'd need to correlate data from multiple
> >    tables simultaneously. I've tried to ensure we correlate no more
> >    than three tables at once for most things.
> 
> Correct. That would be impractical design to make tons of tables.
> 
> To sum up: in my experience, the above database design is slightly
> easier to understand and to code for. It also removes pretty much every
> unnecessary concept translated from "EphyNode" (database IDs being the
> main one) -- and in my opinion, that is a good thing!

The thing I'm concerned about is that it might be easy to code for, but
I'm not sure that it's easy to abstract into a small C interface and
stay extensible.

I think you can see that I've designed the database around the code
because that's where I'll be spending most of my time. From that
perspective, is what I've done OK?

Disregarding coding reasons, does it still seem a horrible design?
Remember that we might want to do crazy things in future like "a topic
which is a bookmark" or similar.

Big thanks for this,
Peter.





[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]