Re: GtkTreeView with huge amount of rows; dynamic data source?



On Thu, 2005-01-20 at 10:37, Gus Koppel wrote:
Greg Breland wrote:

I think you misunderstood what I meant by the word "populate". If your
database has 100K rows, then you create a liststore with 100K blank
rows and populate only the first 200 rows with data from the database.
This lets you get around the tough problem of managing your own scroll
bars.

No, it doesn't. If I understand you correctly, you suggest creating
e.g. 100 K rows, of which 99.8 K would never really be used, i.e.
never be filled with data? Only the first 200 rows would be?

That's correct, but the 200 rows were just a guess.  That number should
be fine-tuned as much as possible within the limits of the TreeView
control.
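
Roughly what I have in mind, as an untested PyGTK sketch (db_fetch() is
just a stand-in for however you query the database, e.g. a SELECT with
LIMIT/OFFSET):

    import gtk

    TOTAL_ROWS = 100000   # total row count reported by the database
    PRELOAD    = 200      # my guess; tune to what the TreeView needs

    store = gtk.ListStore(str)

    # Create every row up front, but leave them blank.  This is what
    # costs the preparation time, but it keeps the scrollbar honest.
    for i in xrange(TOTAL_ROWS):
        store.append([""])

    # Fill only the first PRELOAD rows with real data.
    for i, value in enumerate(db_fetch(0, PRELOAD)):
        store[i][0] = value

    view = gtk.TreeView(store)
    view.append_column(
        gtk.TreeViewColumn("Value", gtk.CellRendererText(), text=0))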

I think that's an unacceptable strategy as well. Just consider adding
another "0" to the hypothetical total number of rows, and the resulting
memory and preparation time requirements.

Preparation time would be the big hit...about 8 seconds on my box.
Memory isn't that bad, since there isn't any data, just another node on
the linked list.  Of course, 10M rows would NOT be feasible with the
system I outlined, but would be with yours, since you're just windowing
the data.

Populating only the first 200 rows with real data wouldn't solve the
problem with scrollbar management, either. If the scrollbar gets dragged
to a 50% offset, it's not the first 200 (or 50) rows of a List/TreeStore
that are displayed, but the rows starting at offset 50 K (in this
example). So I'd have to fill those 200 rows with data as well. And if
the user decides to scroll through the entire list, page by page, I
would end up having nearly all records read into memory as well,
consuming lots of RAM and slowing down the treeview a lot. That's
what I originally wanted to avoid with a dynamic solution, which doesn't
store more rows in RAM than are actually being displayed.

I agree that my method uses more RAM and would take longer to build the
initial liststore.

However, I disagree that it will be slower; quite the opposite.  Leaving
aside my "read more rows than you need" optimization, every time the
user scrolls, the worst case for both our methods is that we have to
populate the visible records.  However, your best case is the same as
your worst case.

Assume a grid displaying 50 records out of 100K total. If the user
starts at the top, then jumps to the middle and then jumps back to the
top, he has pulled 150 records from the database.  In my model, since
any rows pulled are cached, he only pulls 100 rows total.  You're
trading memory usage for speed.
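
In other words, something like this (sketch only; fetch_rows() is a
placeholder for the real query, and I'm ignoring error handling):

    # Cache of rows already pulled from the database, keyed by row index.
    row_cache = {}

    def ensure_rows(store, first, count):
        # Make sure rows [first, first + count) are filled, pulling only
        # the block that contains rows we have not seen before.
        missing = [i for i in xrange(first, first + count)
                   if i not in row_cache]
        if not missing:
            return                      # everything already cached
        # Pull one contiguous block covering the gaps (simplest case).
        lo, hi = min(missing), max(missing)
        for offset, value in enumerate(fetch_rows(lo, hi - lo + 1)):
            row_cache[lo + offset] = value
            store[lo + offset][0] = value

So jumping top -> middle -> top only hits the database twice: the second
visit to the top finds everything already in row_cache.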

Of course, I also advocate pulling more rows than needed to display just
the currently visible bit of the liststore, so in the above simple
example I would probably pull more rows than you; but since I would only
make two trips back to the server instead of your three, mine should
still be faster.
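
The readahead part is then just a matter of rounding the request out to
a block, building on the ensure_rows() sketch above (READAHEAD is a
made-up tuning knob, and TOTAL_ROWS is the same constant as before):

    READAHEAD = 200    # rows per trip to the server; tune as needed

    def ensure_visible(store, first_visible, n_visible):
        # Round the request out to READAHEAD boundaries so that one
        # SELECT covers several screenfuls instead of one row at a time.
        first = (first_visible // READAHEAD) * READAHEAD
        last  = first_visible + n_visible
        count = ((last - first + READAHEAD - 1) // READAHEAD) * READAHEAD
        ensure_rows(store, first, min(count, TOTAL_ROWS - first))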


I disagree here.  You should read more rows than you need because of
the latency involved when pulling information from the database.
Usually it takes 1% more time to pull 10 rows than to pull 1 row.
What is going to happen when the user holds the scroll-down arrow?
Your app is going to send thousands of SELECTs to pull one row at a
time as fast as it can.

Well, that's just an optimization detail, which is to be addressed after
the main problem has been solved. 

Agreed

Your suggestion of readahead is based on the assumption that users would
mainly scroll down the list in sequential order. However, users could
also decide to mainly scroll up, or just use the scrollbar to make
rather random movements in the list.

Here is the bad news...dragging the scroll bar isn't random, because the
grid actually scrolls as you drag, which could really mess with both our
systems since it would result in too many requests back to the
database.  I don't think either of our models could handle showing data
as the user drag-scrolls.  My model would at least be able to do it when
scrolling back up toward the top, which is better than nothing.
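
One way to keep drag-scrolling from hammering the database would be to
let the scroll events collapse and only fetch once things settle, e.g.
via a short timeout (sketch only; ROW_HEIGHT is an assumed fixed pixel
row height, "pending" is module-level state for brevity, and
ensure_visible() is the readahead sketch from above):

    import gobject

    ROW_HEIGHT = 20        # assumed fixed pixel height of one row
    pending = [None]       # id of the queued fetch, if any

    def on_adjustment_changed(adj, store):
        # Re-arm a short timer on every scroll event; the fetch only
        # fires once the user pauses for ~150 ms.
        if pending[0] is not None:
            gobject.source_remove(pending[0])
        pending[0] = gobject.timeout_add(150, do_fetch, adj, store)

    def do_fetch(adj, store):
        pending[0] = None
        first = int(adj.value / ROW_HEIGHT)
        count = int(adj.page_size / ROW_HEIGHT) + 1
        ensure_visible(store, first, count)
        return False       # one-shot timeout

    # Hooked up via the scrolled window's vertical adjustment:
    # sw.get_vadjustment().connect("value-changed",
    #                              on_adjustment_changed, store)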

Readahead would then result in a slight performance penalty and, in case
the data source is non-local, significantly increased network traffic
(by 300%, if 50 rows were visible, 200 rows were read every time and the
user didn't scroll down sequentially).

Again, the 200 was a guess at a number that would always work no matter
the size of the control.  I would think scrolling sequentially is the
most common case.  Given a long list of data, I seek roughly to where I
think I need to be...maybe it takes two or three tries...then I scroll
sequentially to the row I'm looking for.

The larger the dataset, the more sequential scrolling I do, because the
granularity of the scrollbar seek becomes too rough.  Sure, your method
pulls much less data on those random seek jumps, but if the user
scrolls sequentially at all, my method would win.




