Sorting



Sorting is still a major problem in tinymail:

o. First of all, for getting sorting right you have to read all the
message headers at some point. The results of such a sort should
therefore at least be stored on disk, and recalculated when a new
message arrives.

o. Some people want to have support for message threading in their
summary view. I'm not planning to focus on such a feature, but would
welcome work that goes in that direction.

Some information can be found here:
http://www.jwz.org/doc/threading.html

It would of course be nice if we can store the sorting result and reuse
the stored information.

o Storing the information

I was thinking about a mmapable format for storing sorting results.

Something like this:

1) [1 byte depth as char] [ 1 byte UID LENGTH ] [UID as a string]

or

2) [1 byte depth as char] [ 4 bytes UID as unsigned integer ]

Let's illustrate point 1) first:

0x01 0x04 0x30 0x30 0x30 0x31 0x02 0x03 0x30 0x30 0x32 0x01 
0x04 0x30 0x30 0x31 0x32 0x01 0x05 0x30 0x30 0x30 0x30 0x34  
  
+-----------+-------------------------------------------------+------------------+
| 0000:0000 | 01 04 30 30 30 31 02 03 30 30 32 01 04 30 30 31 | ..0001..002..001 |
| 0000:0010 | 32 01 05 30 30 30 30 34                         | 2..00004         |
+-----------+-------------------------------------------------+------------------+

Where
uid=0001  Subject=First header
uid=002   Subject=Second header
uid=0012  Subject=Third header
uid=00004 Subject=Fourth header

For a tree like this:

-+-First header
--+-Second header
-+-Third header
-+-Fourth header

Such a binary file could be mmaped on platforms that support mmap (or
read as it is without needing any extra parsing, which is what is
important about this format) and used by the custom GtkTreeModel to
instantly how to show the instances.

Perhaps somebody else already made such a tree-index format? It would
certainly be very nice if tinymail would use something like this. So
it's at least on my todo list.

Important is that UID's have a variable size and that most are shorter
than four bytes. Storing 16.000 such uids with threading information
might be wasting a little bit memory if we'd for example always use a 32
bit field (an unsigned normal integer). But it might be a little bit
more faster?

So let's illustrate 2) :

0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x00 0x00 0x01

The same sample would look like this:

0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x00 0x00 0x01 0x02 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x00 0x00 0x00 0x00 0x00 0x02 0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x0c 0x01 0x00 0x00 0x00 0x00
0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x04

+-----------+-------------------------------------------------+
| 0000:0000 | 01 00 00 00 00 00 00 00 01 02 00 00 00 00 00 00 | 
| 0000:0010 | 00 02 01 00 00 00 00 00 00 00 0c 01 00 00 00 00 |
| 0000:0020 | 00 00 00 04                                     |
+-----------+-------------------------------------------------+

I might have made a mistake but you probably get the idea?


About 2), what I don't know is whether or not message UID's are ALWAYS
integer fields. Using 1) the UID's of the messages can also be character
data of lengths 1 to 255. This is the reason why I'm more pro the first
idea. If longer UID's are needed, the format can be adapted to have two
or more bytes for the uid-length information.

Anyway, 1) is significantly shorter in memory usage and supports more
possibilities (uid's not being integers).


-- 
Philip Van Hoof, software developer at x-tend 
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
work: vanhoof at x-tend dot be 
http://www.pvanhoof.be - http://www.x-tend.be




[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]