Re: [Evolution-hackers] EBookBackendSqliteDB comments



On Fri, 2011-05-06 at 10:01 +0200, sean finney wrote:
> Hi Milan,
> 
> On Fri, May 06, 2011 at 08:56:10AM +0200, Milan Crha wrote:
> > > As I already told seanus on IRC, I will be evaluating the performance
> > > of storing vCards as files versus storing them in the db, and then choose
> > > whichever works best. So the code for both will be there and we can
> > > choose between them after testing. I was also thinking of providing
> > > it as an option for the backends to choose once I complete the testing.
> > > So what we discussed stays the same :)
http://git.gnome.org/browse/evolution-ews/tree/src/addressbook/e-book-backend-sqlitedb.h
has the APIs. The metadata APIs are a work in progress.

> > 
> > This is not only about performance, my main concerns are these:
> > a) if something fails with db file, user's data are safe
> 
> > b) users can take their contacts anytime and import them on another
> > machine, in case of hard disk crash, partial backup or anything like
> > that
> 
> I think we should stop and consider two different motivations for this
> API.  (1) Local addressbook (2) Local cache of remote addressbook.  For
> case (1), I agree that having the items split out could be useful and
> a good safeguard against any db corruption (though my experience thus far
> with sqlite is fairly positive).
> 
> For case (2), I would say if there's a problem with the file just nuke
> it and reload it from the remote store.  Since you can guarantee that
> you can get a "working copy" of the info, you can then rely on the existing
> UI (or sqlite, or the remote service, or whatever), for exporting the
> contacts.  It is a *cache* after all :)
> 
> So for something like GAL (or any cached-from-remote addressbooks),
> I think it makes a lot of sense to *not* split out the contacts, at
> least as long as performance doesn't suffer by having more items in
> the sqlitedb file.
I wanted to compare the performance of the two methods on address books
that hold a large amount of data and choose whichever suits best.
If it turns out that there is a big difference between the two, I would
document that and let the backends choose how they
want to store the data.

> 
> > c) folders.db files tend to grow "indefinitely". That's another point
> > why I do not like "one file per account".
> 
> I'd like to clarify a detail of the API from having looked over it wrt
> evo-mapi: it's designed so that it can be used "one file per account", by
> creating a single db file and specifying the "folder" as an API parameter
> in all calls.
> 
> But this means you could always create multiple db instances at different
> file locations, one per folder, and just use a junk "FOLDER" (or similar)
> name for the folder.  Having looked over the current evo-mapi code, I
> think you'd want to do something like that.
> 
> Of course if you think that there should *never* be a case where it's used
> one db per account, then rethinking the API would make sense, but otherwise
> nothing lost by keeping it, it gives you a way to do both.
I have made it configurable, so clients can either save all the
address books in one db or provide different paths so that they are
stored in separate db files.
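
To illustrate the two layouts, here is a minimal sketch in plain C. The
enum, the function name and the "contacts.db" file name are made up for
illustration and are not taken from e-book-backend-sqlitedb.h; it only
shows how a client might pick a db path for each mode.

```c
#include <stdio.h>

/* Hypothetical storage modes; these names are not part of the real API. */
typedef enum {
	STORE_SINGLE_DB,     /* all address books share one db file */
	STORE_DB_PER_FOLDER  /* each address book gets its own db file */
} StoreMode;

/* Writes the db file path for folder_id into buf.
 * Returns 0 on success, -1 if the path did not fit in buf. */
static int
build_db_path (char *buf, size_t len, const char *base_dir,
               const char *folder_id, StoreMode mode)
{
	int n;

	if (mode == STORE_SINGLE_DB)
		n = snprintf (buf, len, "%s/contacts.db", base_dir);
	else
		n = snprintf (buf, len, "%s/%s/contacts.db", base_dir, folder_id);

	return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

In the single-db mode the folder id would still be passed to every API
call; in the per-folder mode it only selects the file.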

> 
> > An example: my evo-mapi account has 4 addressbooks (one is GAL). I would
> > really prefer to have them separated, not in one large file. Not talking
> 
> And that should be possible, see above.
> 
> > about possible (even unlikely) UID clashes between separate
> > addressbooks. Will it also mean that each local addressbook will be
> > stored in one large db? Please do not do that.
> 
> The underlying db should deal with stuff like UID clashes, agreed.  I
> think the current API does so, though I'm not convinced it's the best
> way.  Currently, you have:
> 
> const gchar *stmt = "CREATE TABLE IF NOT EXISTS folders         \
> 					 ( folder_id  TEXT PRIMARY KEY,             \
> 					   folder_name TEXT,                        \
> 					   sync_data TEXT,                          \
> 					   bdata1 TEXT, bdata2 TEXT,                \
> 					   bdata3 TEXT)";
> 
> stmt = sqlite3_mprintf ("CREATE TABLE IF NOT EXISTS %Q                  \
> 		( uid  TEXT PRIMARY KEY,                           \
> 		  nickname TEXT, full_name TEXT,                   \
> 		  given_name TEXT, family_name TEXT,               \
> 		  email_1 TEXT, email_2 TEXT,                      \
> 		  email_3 TEXT, email_4 TEXT,                      \
> 		  vcard TEXT)", folderid);
> 
> which AIUI means a table named after every folder.  Therefore the UIDs
> are already internally partitioned and will not conflict.  WRT normalizing
> the database, I would suggest something more like:
> 
> const gchar *stmt = "CREATE TABLE IF NOT EXISTS folders         \
> 					 ( folder_id  TEXT PRIMARY KEY,             \
> 					   folder_name TEXT,                        \
> 					   sync_data TEXT,                          \
> 					   bdata1 TEXT, bdata2 TEXT,                \
> 					   bdata3 TEXT)";
> 
> stmt = sqlite3_mprintf ("CREATE TABLE IF NOT EXISTS contacts       \
> 		( folder_id TEXT,                                  \
> 		  uid  TEXT,                                       \
> 		  nickname TEXT, full_name TEXT,                   \
> 		  given_name TEXT, family_name TEXT,               \
> 		  email_1 TEXT, email_2 TEXT,                      \
> 		  email_3 TEXT, email_4 TEXT,                      \
> 		  vcard TEXT,                                      \
> 		  PRIMARY KEY (folder_id, uid) )" );
> 
On address-book deletion, dropping a table is far cheaper than querying
for and deleting all the contacts that match a folder id, although
address books are probably not deleted very often.

So I went for a quick search and found this,
http://stackoverflow.com/questions/784173/what-are-the-performance-characteristics-of-sqlite-with-very-large-database-files
which suggests that using multiple tables is better. I have not personally
run any tests on this.
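
For reference, a rough sketch of the two deletion paths, using only
standard C so the statements can be inspected without a db. The helper
name and the constant are illustrative only; the real code would go
through sqlite3_mprintf and bound parameters instead.

```c
#include <stdio.h>
#include <string.h>

/* Single "contacts" table: every row of the folder has to be found and
 * deleted. The folder id would be bound to the "?" placeholder. */
static const char DELETE_FOLDER_SQL[] =
	"DELETE FROM contacts WHERE folder_id = ?";

/* Table per folder: removing an address book is a single DROP TABLE.
 * The table name is double-quoted, with embedded double quotes doubled.
 * Returns 0 on success, -1 if the statement did not fit in buf. */
static int
build_drop_sql (char *buf, size_t len, const char *folder_id)
{
	const char *prefix = "DROP TABLE IF EXISTS \"";
	size_t plen = strlen (prefix);
	size_t pos;
	const char *p;

	if (plen >= len)
		return -1;
	memcpy (buf, prefix, plen);
	pos = plen;

	for (p = folder_id; *p; p++) {
		if (*p == '"') {
			/* escape the quote by writing it twice */
			if (pos + 1 >= len)
				return -1;
			buf[pos++] = '"';
		}
		if (pos + 1 >= len)
			return -1;
		buf[pos++] = *p;
	}

	/* closing quote plus the terminating NUL */
	if (pos + 1 >= len)
		return -1;
	buf[pos++] = '"';
	buf[pos] = '\0';
	return 0;
}
```

Which of the two is actually faster on large address books is exactly
what still needs to be measured.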

> As an extra bonus that means you could do autocomplete type
> queries in a single SQL query.
AFAIK, with the current design of EDS, each address book is queried
separately and so would not benefit from this.

- Chenthill.
> 
> 
> 
> 	sean
> _______________________________________________
> evolution-hackers mailing list
> evolution-hackers gnome org
> To change your list options or unsubscribe, visit ...
> http://mail.gnome.org/mailman/listinfo/evolution-hackers



