This latest version of the future summary store uses a hashtable to identify the duplicates. As expected makes it the writing of the summary a lot faster (hashtables make a hash of each added key-string, which makes searching-for-dups a lot faster then my former quickly made doubly linked list code for this). I'm still hoping to somehow avoid the g_strdup in the code that dumps (writes) the summaryblock. Making copies of strings is not really fast. I added the necessary locking. The reference counting of the summaryblocks is not 100% correct, yet. Although you should get the idea: o. Not individual summaryitems but its container summaryblock gets a reference added. o. Initial load of a summary, which is a container for summaryblocks, creates all summaryitems which will each add one reference to their container summaryblock o. Unload of a summary unloads each summaryblock, which merely means taking the one reference of each of its contained summaryitems. o. Requesting a summaryitem from the summary adds a reference to the container summaryblock of the item. o. Stopping usage of that summaryitem takes away the reference on the container summaryblock o. A summaryblock can live without its container summary *** o. A summaryitem can't live without its container summaryblock ***** o. A summaryblock contains on average 1000 items o. A summary contains (Item-Amount / 1000) summaryblocks I'll repeat the core ideas of this summary store implementation: o. Localisation of mapped memory: often needed strings are placed in the beginning of the mmap()ed file (they are sorted on usage-count by the summaryitems) -> Current summary storage scatters strings around, no matter how often they are used (they are simply duplicated each time in stead resulting in a larger VmSize and in more mapping/swapping operations that have to be done by the kernel) o. Duplicate strings are made unique in the mmap -> Current summary storage scatters duplicate strings around in the mapping. Making the mapping more random (less localised memory access), more redundant (larger VmSize). o. Blocks of 1000 items that can function individually (per block) o. Quick at appending new material: a new writable block is created and the current block is written (if 1000 is reached) -> Current summary storage needs to rewrite the entire summary.mmap file each 5000th item that gets added. This gets hideously slow starting at 15,000 items. Especially on mobile ARM devices. o. Quick at changing the flags of items: a separate file per block is utilised to store the item's flags. Only this sequential file needs to be rewritten when flags of the item(s) changed. -> This is equally slow once you are at 15,000 items. o. Fast searching: because of ideal localisation will often requested strings be mapped in real memory modules (by the kernel). o. Fast but low-memory browsing: currently visible items will be mapped in real memory modules (by the kernel), items that didn't get requested aren't mapped in. Not even touched at initial scan stage. -- Philip Van Hoof, freelance software developer home: me at pvanhoof dot be gnome: pvanhoof at gnome dot org http://pvanhoof.be/blog http://codeminded.be
Attachment:
mytest5.tar.gz
Description: application/compressed-tar