Re: [gnome-db] patch : gda-postgres remove-row.



On Sat, 2005-01-15 at 17:57 +0100, Rodrigo Moya wrote:
> On Thu, 2005-01-06 at 21:44 +1000, Bas Driessen wrote:
> > Hello,
> > 
> > I am (re)visiting the record-set implementation again for the Postgres
> > provider, since I want to start using that now. Let's start with the
> > non-existing "remove-row" method.
> > 
> > Attached a patch for the "remove-row" method for data models for the
> > Postgres provider. It will update both the data model as the
> > underlying database. Some comments:
> > 
> > - Only works if there is a unique key available (similar like the
> > existing "update-row" method). This is probably correct, but I am not
> > sure if Postgres has a concept of a "row-id" that we can use. 
> > 
> > - Postgres provider uses the hash version of a Data Model. This means
> > that only information requested will be in the Data Model. If a row is
> > requested from the data model that has not been loaded yet, the
> > information will come from the Postgres cursor (PGresult). Because of
> > this concept, it is pretty much impossible to physically remove a row
> > from the data model, since the numbering of the data model will then
> > no longer be in sync with the Postgres cursor (data can not be removed
> > from the Postgres cursor unless triggering the SQL again which can be
> > time consuming). Therefore I came up with the concept of placing the
> > text "NULL" in all fields of the row deleted from the data model and
> > setting the row id (gda_row_set_id) to "R" (for Removed).  This way
> > deleted rows can easily be identified without destroying the current
> > structure.
> >
> yes, but accessing rows by number will be wrong, if I understood
> correctly. That is, you remove row 8, and then client requests the new
> row 8 (which was row 9). In the way the provider is done, it seems it's
> going to fail for some cases, unless you add a lot of code to detect the
> correct numbering.
> 

No, unfortunately you did not understand correctly :) The row number
never changes. So if row 8 is removed, then if the client requests row 8
again, the client should first test if the row_id is set to "R" (or
perhaps NULL, depending how we implement it) and in that case return an
error. That way the hash-table numbering stays in sync with the Postgres
cursor. Question is if it is the responsibility of the client to check
for a removed row, or that that check should be done within gda. (Please
read on first before replying as I will explain in more detail below)

> >  Perhaps we need a setting "visible" for rows in a data model? The
> > only thing that is debatable then is the number of rows in a data
> > model.
> >
> >  It should return with the number of rows minus deleted ones, but if
> > we do that then chances are not all rows are processed that should be
> > processed. Better to skip row by testing on the id (gda_row_get_id). A
> > 2nd type of row return perhaps? Suggestions?
> > 
> I am thinking that the provider should probably have rows in a dynamic
> array, and for rows it hasn't retrieved yet, have a flag that indicates
> the row needs to be retrieved. Thus, when you remove a row, you only
> need to remove it from the array and all rows will be renumbered
> correctly. Similarly, when getting the rowcount from PostgreSQL, the
> provider should just add empty records (with the 'needs-retrieveing'
> flag set to on) for all rows.
> 
I have been thinking along these lines as well, but what is then the
point of having a GdaDataModelHash implementation? We should only be
using GdaDataModelArray instead and get rid of GdaDataModelHash.

> Or we could change GdaDataModelArrayHash to be more clever about the row
> numbers. That would probably be much better, thus the postgres provider
> just needs to call a gda_data_model_array_hash_remove_row function.
> 

Agreed that the work should be done on this level. I don't mind making
time available to do that, but let's first make sure we agree on the
implementation. At the moment "remove_row" is not implemented yet for
any provider, so we should set an example.

Let's outline the various "levels" again.

The lowest level is the Postgres cursor (I keep calling it cursor, since
I am not sure the official terminology used by Postgres. Tuples set
perhaps?) Anyway, the cursor holds all records returned from a SQL
query. We can retrieve information from this cursor, but we can not
modify/delete any information here.

The next level is then the hash table. The contents of the hash table
represents the data model. Every entry number in the hash table matches
the entry number of the Postgres Cursor. Only rows retrieved will be put
physically into the hash table. So this pretty much takes care of the
"reserved entry" that Rodrigo mentioned in his comment. At the moment,
data can be modified in the hash table and rows can be appended. Both
those type of actions do not change any numbering. Appended rows just
get a new incremented number. The row number returns the entry number in
the hash table and the row id returns the entry number in the Postgres
cursor. This all is implemented and works fine now (for Postgres).

Knowing this, we look at the remove row method. If we physically want to
remove an entry from the hash table, a lot of code is needed to take
care of the renumbering to stay in sync with the Postgres cursor and a
lot of processing time is needed to renumber all entries in a hash table
(ie remove entry 109 and add it as entry 108 etc) So the wise solution
here is to replace the values of a deleted row with dummy/NULL values.
If we physically do remove an entry from the hash table, it will just
become a "reserved entry" again and the next time the client requests
it, the values of the entry will be retrieved from the Postgres cursor
and the row is restored again. 

If we agree in the fact that the best way is to put dummy/NULL values in
places for a removed row (or at least not to physically remove it), then
we can discuss how to best administer this.

The solution I came up with is to mark a row as being deleted to set the
row id (= not the row number) to "R" and add the text "NULL" to all the
fields. An alternative can be to add a level 3 layer being a dynamic
array as Rodrigo suggested. So instead of querying the hash table
directly, everything will go through that array. The array just contain
2 fields. The entry number and the hash-table entry number. In that case
the number of entries in the array is the number of rows in the data
model and we can easily process them sequential. We do not have to mark
removed rows, but we just remove the entry from the dynamic array and it
will not be processed again.

Example:

After a select query I have 5 rows. The array will then look like this:

0-0
1-1
2-2
3-3
4-4

Then I remove row number 2 and the array will look like this:

0-0
1-1
2-3
3-4

So if I request data model row 2, it will, behind the scenes, go out and
retrieve hash-table entry 3 for me. 

I can put all this logic in the various gda_data_model_hash_* functions,
so clients only have to work with calls to these functions and do not
need to know about the exact implementation of have to test on "R"
marked row ids etc.

Would appreciate any comments/ideas/thoughts on this subject. Let me
know which implementation you think is best, so I can start the work.

Thanks.

> > Also any tips/advise how to deliver patches (ie  which diff command
> > from where) to this list are appreciated.
> > 
> I use this: cvs -z3 diff -up
> 
> which includes the function names in the patch.

Thanks, will use that from now on. 




[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]