[gnome-db] Re: libgda vs. gnucash [was Re: GnuCash page on GO site]



On Thu, Mar 04, 2004 at 04:32:39PM +0100, Rodrigo Moya was heard to remark:
> On Thu, 2004-03-04 at 09:24 -0600, Linas Vepstas wrote:
> 
> > On Wed, Mar 03, 2004 at 05:35:32PM +0100, Rodrigo Moya was heard to remark:
> > > On Wed, 2004-03-03 at 10:24 -0600, Linas Vepstas wrote:
> > > 
> > > >  [gnucash has]
> > > > -- an object query system. libgda does not.
> > > > -- a uuid system. libgda does not.
> > > > -- an object persistence infrastructure. libgda does not. 
> > > > -- a multi-user object caching and cache-coherence system. libgda does not.
> > > > -- a data-set partitioning system. libgda does not.
> > > 
> > > We are making progress though, since now we know what you need from
> > > libgda. 
> > 
> > Are you sure you want to implement these features? It wasn't clear
> > to me if these were within or outside of your project scope.
> > 
> it depends, some might make sense in libgda, others, like all data
> dictionary management stuff might make sense in libmergeant (part of
> mergeant), where a lot of code for that purpose is already available.
> Others might go directly to libgnomedb (as I said, most things in
> libmergeant will probably be moved to libgnomedb sooner or later)

I couldn't find the libmergeant documentation.

> > One that has been dogging me is the correct way to split up 
> > a dataset across multiple files.  GnuCash startup takes
> > a long time if you have a large data file.  So I would like
> > to split it up, while still allowing queries over older data 
> > in older files if the user needs that. 
> > 
> you mean like a file per table?

No.  We don't use the concept of tables in gnucash; we use the 
concept of "objects" which are similar but not the same.  
The objects are organized into 'books'. Multiple books may
be stored in one file.  Each 'book' is a self-contained, 
complete, self-consistent set of data.  So a "book" is a
"dataset". 

When splitting up datasets into several pieces (for example, 
across multiple files, so that file load can be faster), there are
parts that need to be copied, and parts that should not be copied. 

For example: the list of currency types (stocks/bonds are a currency
type in gnucash) needs to occur in each copy of the split dataset, 
whereas transactions (purchase/sale of currency) do not.  Say the 
user has two years' worth of data, and wants to have two files
each containing one year's worth of data.  Each file must have
(identical) copies of the accounts & currency types, but must
have different transactions (those that occurred in 2002 and 2003,
for example).  Any given transaction should occur in one file,
or the other, but not both.
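
To make the copy/don't-copy distinction concrete, here is a rough sketch
in Python (not gnucash's C, and not real gnucash code).  The names
Transaction and split_book_by_year, and the dict layout of a "book", are
made up for illustration only.  Accounts and currency types are copied
into every piece; each transaction lands in exactly one piece.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Transaction:
    """Hypothetical stand-in for a gnucash transaction object."""
    guid: str
    posted: date
    amount: int


def split_book_by_year(accounts, currency_types, transactions):
    """Split one dataset into one book per calendar year.

    Shared objects (accounts, currency types) are copied into every
    book; each transaction is placed in exactly one book.
    """
    by_year = defaultdict(list)
    for txn in transactions:
        by_year[txn.posted.year].append(txn)

    return {
        year: {
            "accounts": list(accounts),              # copied into each book
            "currency_types": list(currency_types),  # copied into each book
            "transactions": txns,                    # partitioned, never copied
        }
        for year, txns in by_year.items()
    }
```

Calling split_book_by_year() on two years of data would give two books
with identical account and currency lists and disjoint transaction lists.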

(Note that the "date sort" is a bad example, since the 2004 file
might need to contain transactions from 1999 to close out lots.
A 'lot' is a collection of named transactions pertaining to a 
particular asset.  So the actual sorting of which transactions 
go into which file can be algorithmically complex).
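
One possible rule, sketched in Python purely as an illustration: keep
every transaction of a lot together, in the file for the year of that
lot's latest transaction.  This rule is an assumption made for the
example, not what gnucash does, and the function name assign_years and
the (guid, date, lot) tuple layout are hypothetical.  The real sorting
is likely more involved.

```python
from datetime import date


def assign_years(transactions):
    """Decide which yearly file each transaction goes into.

    `transactions` is an iterable of (guid, posted_date, lot_id) tuples,
    where lot_id is None for transactions that belong to no lot.
    Returns a dict mapping guid -> year.

    Assumed rule (illustrative only): all transactions in a lot follow
    the lot's latest transaction; everything else goes by its own date.
    """
    transactions = list(transactions)

    # First pass: find the latest year seen for each lot.
    lot_year = {}
    for _guid, posted, lot in transactions:
        if lot is not None:
            lot_year[lot] = max(lot_year.get(lot, posted.year), posted.year)

    # Second pass: lot members follow their lot, the rest follow their date.
    return {
        guid: (lot_year[lot] if lot is not None else posted.year)
        for guid, posted, lot in transactions
    }
```

With this rule a 1999 purchase whose lot is closed by a 2004 sale ends
up in the 2004 file, next to the sale.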

> > Derek has been arguing that we support SQL only, so that we can
> > completely avoid/ignore this issue.  I think I've discovered 
> > a simple/easy answer, but haven't implemented it yet.
> > 
> which answer?

The answer is not to issue unique UUIDs to each copy, and to treat 
the copy as a data backup instead.  We would need to add a datestamp 
to each object so that we could identify the most recent copy.
There are some subtle copy-on-write semantics which I may be
forgetting though.  We use copy-on-write to avoid having
multiple copies of what would otherwise be the same thing.
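
Roughly, in Python (a sketch only; merge_copies and the dict-of-tuples
layout are made up for the example, and the copy-on-write part is left
out entirely): copies of an object share one UUID, and when files are
loaded together the copy with the newest datestamp wins.

```python
from datetime import datetime


def merge_copies(*books):
    """Merge books whose shared objects carry the same UUID.

    Each book is a dict mapping uuid -> (datestamp, payload).
    When the same uuid appears in several books, the copy with the
    most recent datestamp is kept.
    """
    merged = {}
    for book in books:
        for uuid, (stamp, payload) in book.items():
            # Keep this copy only if it is the first or the newest seen.
            if uuid not in merged or stamp > merged[uuid][0]:
                merged[uuid] = (stamp, payload)
    return merged
```

So an account that was renamed in the 2003 file, but not in the 2002
file, would show up once after the merge, with the 2003 name.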

--linas

-- 
pub  1024D/01045933 2001-02-01 Linas Vepstas (Labas!) <linas linas org>
PGP Key fingerprint = 8305 2521 6000 0B5E 8984  3F54 64A9 9A82 0104 5933


