Re: Proposal for bookmarks/history database



Adam Hooper wrote:

So using SQLite doesn't make memory usage any worse. In fact, since it's
disk-based, we're essentially determining an upper bound on memory usage
at compile-time (not counting GTK's memory usage). For large databases,
memory usage would be better than with EphyNode. (The break-even point
might be at several thousand nodes, depending on our choice of cache
size, though.)
Yes, my hope was that SQLite would have lower memory footprint than EphyNode. Especially if we just happen to let users have a 3mo browser history or some such crazy thing. :)

You want 'sites' and 'topics' to share a primary key: this means they
could share the same table, which would provide more extensibility.

table 'groupings'
 int id            # primary key
 short int group_type_id # type of grouping: enum for 'sites'/'topics'
 int name          # topic or site name
I'd prefer to keep them separate, else the topic "yahoo.com" and the site "yahoo.com" will become blurred.

If we could make group_type_id more abstract, we might even be able to
let extensions define arbitrary group types. Assigning IDs could be
tricky, though (just as it is now with EphyNode DB IDs). This may not be
necessary in a preliminary API, but it's a problem we've already had
with EphyNode, so it's something to keep in mind.
I would agree to a 'groupings' table, but as follows.

table 'grouping'
 int id
 int name
with example content of
 (1, 'bookmarks')
 (2, 'history')
 (3, 'topics')
 (4, 'searches')
 (5, 'joe bob's extension stuff')

I should just explain a little further:
Something is made into a 'topic' in Epiphany by having a certain node (in the above case, node 3) as a parent.
 Something has 'topic data' by having an entry in the topics table.
Similarly, something is made into a 'bookmark' by having a certain node (node 1) as a parent.
 It has 'bookmark data' by having those fields filled in the pages table.

Note that saying a node is a 'bookmark' doesn't really mean much. It doesn't mean it's restricted or given a type or anything. It just means "it has a certain parent (so we can find all bookmarks quickly) and it has some data associated with it (in a special table for such things)."

Somebody mentioned metadata. Maybe we should take a page out of
Storage's book? There is an effective design pattern in Storage's source
code on GNOME CVS, but I'd definitely leave that out of any initial
implementation. (It would be added in separate tables keyed by the same
ID as in 'pages'.) The ephy_db_query_* API would need to be modified
(maybe using varargs, like g_object_get()) if we want to go down that
route.
Yeah, the idea is "for a single node id, you can have different sets of data for it". One example set of data is the "page" information (ie. url, title, date visited, etc). You can however add your own tables and data for a given node.


I'm not sure I like the "pages.b*" fields (they're mostly NULL, which is
icky), but I can't think of a better method right now.
The better method is to have an explicit 'bookmarks' table, which would be:

table 'bookmarks'
 string url
 string title
 string icon

IMHO this is actually a better route.

1. You're missing ORDER BY -- one of the key advantages of using SQLite
is that it can help us order stuff very quickly.
I'd like to just add custom ephy_db_query_* functions rather than have an ugly general-purpose call. For example, when querying history it makes much more sense to restrict by date, but that's a very specific type of query. What I listed were the basic ones with few fancy features. :)

2. By my understanding, that DISTINCT is superfluous. Am I missing
something?
I thought it wasn't superfluous when I wrote it, but I have no idea why I wrote it now....

SQLite already does caching. We could save a lot of work by taking
advantage of it. All we have to do is specify a cache size. (This would
work extremely well with the structure above, but maybe BLOB metadata
would be a problem....)
All I was thinking of was "cache the data from the most recent node involved in a get/set operation". Consider what happens when you modify a bookmark. Every browser window gets a signal and reacts by querying ephy_db_get (...) for that info. I'd prefer that we didn't need to call upon SQLite's more advanced cache for something as trivial as this. That's just how small a cache I'm suggesting. :)

"SELECT parent_id FROM groupings WHERE child_id = <child>" -- this will
be *blazingly* fast, considering every modification of a child requires
user interaction.
I was concerned that the groupings table could be very large. But, now that I think about it, a table of 10000 nodes isn't *that* big.

Big thanks for the feedback. Will outline a slightly revised table layout shortly.
Peter.




[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]