[Fwd: Re: [evolution-patches] Initial download of a folder]



-- 
Philip Van Hoof, software developer
home: me at pvanhoof dot be
gnome: pvanhoof at gnome dot org
work: vanhoof at x-tend dot be
blog: http://pvanhoof.be/blog
--- Begin Message ---
On Mon, 2006-10-23 at 11:09 +0200, Philip Van Hoof wrote:
> Hi there,
> 
> This patch is useless without the mmap one. Nevertheless, it might be
> interesting for Evolution as an experiment. It's not yet finished, and
> support for it in other providers is needed (which I will work on).
> 
> A side note for, for example, Brutus and other third-party developers
> who would like support for their service in tinymail (in the case of
> Brutus, I also want Exchange support, so chances are high that I
> will integrate it at some point anyway).

Highly appreciated. I'm looking forward to seeing what you will do with
the e-b code :-)


> ps. I'll ask again: if any Evolution developer feels that this type of
> message shouldn't be posted here: tell me. If nobody does, it's my
> opinion that it's related to Evolution and that maybe Evolution could
> benefit from experiments like this one.

These posts definitely belong on evo-hackers, IMHO.

Best regards,
  jules


_______________________________________________
Evolution-patches mailing list
Evolution-patches gnome org
http://mail.gnome.org/mailman/listinfo/evolution-patches

--- End Message ---
--- Begin Message ---
On Mon, 2006-10-23 at 11:09 +0200, Philip Van Hoof wrote:

> A side note for, for example, Brutus and other third-party developers
> who would like support for their service in tinymail (in the case of
> Brutus, I also want Exchange support, so chances are high that I
> will integrate it at some point anyway).
> 
> Every nth (in my experiment, every 1000th) header, the method that

Because I noticed that 1:1001 is slower than 1:1 on some IMAP servers,
this version of the patch determines the next n using the series 1, 2,
4, 8, ... with a limit of 1000 (after which it stays at 1000).

Maybe I will test 1, 3, 7, 15, 31 or an equivalent series instead. I
don't think it's going to make a drastic difference in speed.

For small folders, it does seem to make a difference in speed (1:1
instead of 1:1001).

Of course I'm on a *! *#*&# Courier IMAPd, but this should work well
there too.


-- 
Philip Van Hoof, software developer
home: me at pvanhoof dot be
gnome: pvanhoof at gnome dot org
work: vanhoof at x-tend dot be
blog: http://pvanhoof.be/blog
Index: camel/providers/imap/camel-imap-folder.c
===================================================================
--- camel/providers/imap/camel-imap-folder.c	(revision 1046)
+++ camel/providers/imap/camel-imap-folder.c	(working copy)
@@ -2348,7 +2348,9 @@
 	CamelStream *stream;
 	char *uid, *resp;
 	GData *data;
-	
+	gboolean more = TRUE;
+	unsigned int nextn = 1;
+
 	if (store->server_level >= IMAP_LEVEL_IMAP4REV1)
 		header_spec = "HEADER.FIELDS (" CAMEL_MESSAGE_INFO_HEADERS MAILING_LIST_HEADERS ")";
 	else
@@ -2360,7 +2362,12 @@
 	
 	if( g_getenv ("EVO_IMAP_FETCH_ALL_HEADERS") )
 		header_spec = "HEADER";
-	
+
+while (more)
+{
+	gint count = 0;
+	more = FALSE;
+
 	/* Figure out if any of the new messages are already cached (which
 	 * may be the case if we're re-syncing after disconnected operation).
 	 * If so, get their UIDs, FLAGS, and SIZEs. If not, get all that
@@ -2374,25 +2381,38 @@
 		camel_message_info_free(&mi->info);
 	} else
 		uidval = 0;
-	
+
 	size = (exists - seq) * (IMAP_PRETEND_SIZEOF_FLAGS + IMAP_PRETEND_SIZEOF_SIZE + IMAP_PRETEND_SIZEOF_HEADERS);
 	got = 0;
 	if (!camel_imap_command_start (store, folder, ex,
-				       "UID FETCH %d:* (FLAGS RFC822.SIZE INTERNALDATE BODY.PEEK[%s])",
-				       uidval + 1, header_spec))
+				       "UID FETCH %d:%d (FLAGS RFC822.SIZE INTERNALDATE BODY.PEEK[%s])",
+				       uidval + 1, uidval + 1 + nextn, header_spec))
 		return;
+
+	/* 1, 2, 4, 8, .. 1000 etc etc */
+
+	if (nextn < 1000)
+		nextn += nextn;
+	else
+		nextn = 1000;
+
 	camel_operation_start (NULL, _("Fetching summary information for new messages in %s"), folder->name);
 	
 	/* Parse the responses. We can't add a message to the summary
 	 * until we've gotten its headers, and there's no guarantee
 	 * the server will send the responses in a useful order...
 	 */
+
 	fetch_data = g_ptr_array_new ();
 	messages = g_ptr_array_new ();
+
 	while ((type = camel_imap_command_response (store, &resp, ex)) ==
-	       CAMEL_IMAP_RESPONSE_UNTAGGED) {
+	       CAMEL_IMAP_RESPONSE_UNTAGGED) 
+	{
+		more = TRUE;
 		data = parse_fetch_response (imap_folder, resp);
 		g_free (resp);
+
 		if (!data)
 			continue;
 		
@@ -2401,7 +2421,7 @@
 			g_datalist_clear (&data);
 			continue;
 		}
-		
+
 		if (g_datalist_get_data (&data, "FLAGS"))
 			got += IMAP_PRETEND_SIZEOF_FLAGS;
 		if (g_datalist_get_data (&data, "RFC822.SIZE"))
@@ -2416,43 +2436,46 @@
 			add_message_from_data (folder, messages, first, data);
 			g_datalist_set_data (&data, "BODY_PART_STREAM", NULL);
 		}
-		
-		camel_operation_progress (NULL, got * 100 / size);
+
+		if (size > 0)
+			camel_operation_progress (NULL, got * 100 / size);
 		g_ptr_array_add (fetch_data, data);
 	}
 	camel_operation_end (NULL);
-	
+
 	if (type == CAMEL_IMAP_RESPONSE_ERROR)
 		goto lose;
-	
+
 	/* Free the final tagged response */
 	g_free (resp);
-	
+
 	/* Figure out which headers we still need to fetch. */
 	needheaders = g_ptr_array_new ();
 	size = got = 0;
-	for (i = 0; i < fetch_data->len; i++) {
+
+	for (i = 0; i < fetch_data->len; i++) 
+	{
 		data = fetch_data->pdata[i];
 		if (g_datalist_get_data (&data, "BODY_PART_LEN"))
 			continue;
-		
+
 		uid = g_datalist_get_data (&data, "UID");
 		if (uid) {
 			g_ptr_array_add (needheaders, uid);
 			size += IMAP_PRETEND_SIZEOF_HEADERS;
 		}
 	}
-	
+
 	/* And fetch them */
-	if (needheaders->len) {
+	if (needheaders->len) 
+	{
 		char *uidset;
 		int uid = 0;
-		
+
 		qsort (needheaders->pdata, needheaders->len,
 		       sizeof (void *), uid_compar);
-		
+
 		camel_operation_start (NULL, _("Fetching summary information for new messages in %s"), folder->name);
-		
 		while (uid < needheaders->len) {
 			uidset = imap_uid_array_to_set (folder->summary, needheaders, uid, UID_SET_LIMIT, &uid);
 			if (!camel_imap_command_start (store, folder, ex,
@@ -2464,7 +2487,7 @@
 				goto lose;
 			}
 			g_free (uidset);
-			
+
 			while ((type = camel_imap_command_response (store, &resp, ex))
 			       == CAMEL_IMAP_RESPONSE_UNTAGGED) {
 				data = parse_fetch_response (imap_folder, resp);
@@ -2476,7 +2499,8 @@
 				if (stream) {
 					add_message_from_data (folder, messages, first, data);
 					got += IMAP_PRETEND_SIZEOF_HEADERS;
-					camel_operation_progress (NULL, got * 100 / size);
+					if (size > 0)
+						camel_operation_progress (NULL, got * 100 / size);
 				}
 				g_datalist_clear (&data);
 			}
@@ -2491,7 +2515,7 @@
 		g_ptr_array_free (needheaders, TRUE);
 		camel_operation_end (NULL);
 	}
-	
+
 	/* Now finish up summary entries (fix UIDs, set flags and size) */
 	for (i = 0; i < fetch_data->len; i++) {
 		data = fetch_data->pdata[i];
@@ -2501,7 +2525,7 @@
 			g_datalist_clear (&data);
 			continue;
 		}
-		
+
 		mi = messages->pdata[seq - first];
 		if (mi == NULL) {
 			CamelMessageInfo *pmi = NULL;
@@ -2535,9 +2559,10 @@
 		
 		uid = g_datalist_get_data (&data, "UID");
 
-		/* This strdup() might not get freed since the mmap() patch! */
-		if (uid)
+		if (uid) {
 			mi->info.uid = g_strdup (uid);
+			mi->info.uid_needs_free = TRUE;
+		}
 
 		flags = GPOINTER_TO_INT (g_datalist_get_data (&data, "FLAGS"));
 		if (flags) {
@@ -2548,26 +2573,29 @@
 			mi->info.flags |= flags;
 			flags_to_label(folder, mi);
 		}
-		
+
 #ifdef NON_TINYMAIL_FEATURES
 		size = GPOINTER_TO_INT (g_datalist_get_data (&data, "RFC822.SIZE"));
 		if (size)
 			mi->info.size = size;
 #endif
+
 		g_datalist_clear (&data);
 	}
 	g_ptr_array_free (fetch_data, TRUE);
-	
+
 	/* And add the entries to the summary, etc. */
 	for (i = 0; i < messages->len; i++) {
 		mi = messages->pdata[i];
+
 		if (!mi) {
-			g_warning ("No information for message %d", i + first);
+			/* g_warning ("No information for message %d", i + first);
 			camel_exception_setv (ex, CAMEL_EXCEPTION_SYSTEM,
 					      _("Incomplete server response: no information provided for message %d"),
-					      i + first);
-			break;
+					      i + first); */
+			continue;
 		}
+
 		uid = (char *)camel_message_info_uid(mi);
 		if (uid[0] == 0) {
 			g_warning("Server provided no uid: message %d", i + first);
@@ -2599,15 +2627,17 @@
 			camel_folder_change_info_recent_uid(changes, camel_message_info_uid (mi));
 	}
 
+	camel_folder_summary_dump_mmap (folder->summary);
+
 	for ( ; i < messages->len; i++) {
 		if ((mi = messages->pdata[i]))
 			camel_message_info_free(&mi->info);
 	}
-	
 	g_ptr_array_free (messages, TRUE);
-	
-	return;
-	
+
+
+	goto endbmore;
+
  lose:
 	if (fetch_data) {
 		for (i = 0; i < fetch_data->len; i++) {
@@ -2623,6 +2653,12 @@
 		}
 		g_ptr_array_free (messages, TRUE);
 	}
+
+ endbmore:
+ count = 0;
+
+ } /* more */
+
 }
 
 /* Called with the store's connect_lock locked */
Index: camel/providers/Makefile.am
===================================================================
--- camel/providers/Makefile.am	(revision 1046)
+++ camel/providers/Makefile.am	(working copy)
@@ -1,18 +1,20 @@
 ## Process this file with automake to produce Makefile.in
 
+
+SUBDIRS = pop3 smtp imap local  
+
 if ENABLE_NNTP
-NNTP_DIR=nntp
+SUBDIRS += nntp 
 endif
 
 if ENABLE_IMAPP
-IMAPP_DIR=imapp
+SUBDIRS += imapp 
 endif
 
+
 if OS_WIN32
 else
-SENDMAIL_DIR=sendmail
+SUBDIRS += sendmail 
 endif
 
-SUBDIRS = pop3 $(SENDMAIL_DIR) smtp imap $(NNTP_DIR) local $(IMAPP_DIR) 
 
-
Index: camel/camel-folder.c
===================================================================
--- camel/camel-folder.c	(revision 1046)
+++ camel/camel-folder.c	(working copy)
@@ -2116,7 +2116,7 @@
 	struct _CamelFolderChangeInfoPrivate *p;
 	GPtrArray *olduids;
 	char *olduid;
-	
+
 	g_assert(info != NULL);
 	
 	p = info->priv;
Index: camel/camel-folder-summary.c
===================================================================
--- camel/camel-folder-summary.c	(revision 1046)
+++ camel/camel-folder-summary.c	(working copy)
@@ -113,6 +113,10 @@
 static void camel_folder_summary_init       (CamelFolderSummary *obj);
 static void camel_folder_summary_finalize   (CamelObject *obj);
 
+
+static void camel_folder_summary_mmap_add(CamelFolderSummary *s, CamelMessageInfo *info);
+static void camel_folder_summary_unload_mmap (CamelFolderSummary *s);
+
 static CamelObjectClass *camel_folder_summary_parent;
 
 static void
@@ -161,6 +165,26 @@
 	g_slice_free1 ((gint)user_data, data);
 }
 
+static gboolean always_true (gpointer key, gpointer value, gpointer gp)
+{
+	return TRUE;
+}
+
+static void 
+camel_folder_summary_unload_mmap (CamelFolderSummary *s)
+{
+	camel_folder_summary_clear(s);
+	if (s->file)
+		g_mapped_file_free (s->file);
+	s->file = NULL;
+
+	g_ptr_array_foreach (s->messages, foreach_msginfo, (gpointer)s->message_info_size);
+	if (s->messages->len > 0)
+		g_ptr_array_remove_range (s->messages, 0, s->messages->len-1);
+	g_hash_table_foreach_remove (s->messages_uid, always_true, NULL);
+
+}
+
 static void
 camel_folder_summary_finalize (CamelObject *obj)
 {
@@ -177,6 +201,7 @@
 	g_ptr_array_foreach (s->messages, foreach_msginfo, (gpointer)s->message_info_size);
 	g_ptr_array_free(s->messages, TRUE);
 	g_hash_table_destroy(s->messages_uid);
+	s->messages_uid = NULL;
 
 	g_hash_table_foreach(p->filter_charset, free_o_name, 0);
 	g_hash_table_destroy(p->filter_charset);
@@ -591,7 +616,7 @@
 			}
 		}
 
-		camel_folder_summary_add(s, mi);
+		camel_folder_summary_mmap_add(s, mi);
 	}
 
 
@@ -807,6 +832,17 @@
 }
 
 
+
+void 
+camel_folder_summary_dump_mmap (CamelFolderSummary *s)
+{
+	camel_folder_summary_save (s);
+	camel_folder_summary_unload_mmap (s);
+	camel_folder_summary_load(s);
+
+	return;
+}
+
 /**
  * camel_folder_summary_add:
  * @summary: a #CamelFolderSummary object
@@ -845,7 +881,25 @@
 	CAMEL_SUMMARY_UNLOCK(s, summary_lock);
 }
 
+static void
+camel_folder_summary_mmap_add(CamelFolderSummary *s, CamelMessageInfo *info)
+{
+	CAMEL_SUMMARY_LOCK(s, summary_lock);
 
+/* unnecessary for pooled vectors */
+#ifdef DOESTRV
+	/* this is vitally important, and also if this is ever modified, then
+	   the hash table needs to be resynced */
+	info->strings = e_strv_pack(info->strings);
+#endif
+
+	g_ptr_array_add(s->messages, info);
+	g_hash_table_insert(s->messages_uid, (char *)camel_message_info_uid(info), info);
+	s->flags |= CAMEL_SUMMARY_DIRTY;
+
+	CAMEL_SUMMARY_UNLOCK(s, summary_lock);
+}
+
 /**
  * camel_folder_summary_add_from_header:
  * @summary: a #CamelFolderSummary object
@@ -1090,9 +1144,9 @@
 void
 camel_folder_summary_touch(CamelFolderSummary *s)
 {
-	CAMEL_SUMMARY_LOCK(s, summary_lock);
+	//CAMEL_SUMMARY_LOCK(s, summary_lock);
 	s->flags |= CAMEL_SUMMARY_DIRTY;
-	CAMEL_SUMMARY_UNLOCK(s, summary_lock);
+	//CAMEL_SUMMARY_UNLOCK(s, summary_lock);
 }
 
 
Index: camel/camel-folder-summary.h
===================================================================
--- camel/camel-folder-summary.h	(revision 1046)
+++ camel/camel-folder-summary.h	(working copy)
@@ -308,6 +308,8 @@
 CamelType			 camel_folder_summary_get_type	(void);
 CamelFolderSummary      *camel_folder_summary_new	(struct _CamelFolder *folder);
 
+void camel_folder_summary_dump_mmap (CamelFolderSummary *s);
+
 unsigned char*
 decode_uint32 (unsigned char *start, guint32 *dest, gboolean is_string);
 

--- End Message ---
--- Begin Message ---
Hi there,

This patch is useless without the mmap one. Nevertheless it might be
interesting, for Evolution, as an experiment. It's not yet finished and
support for it in other providers is needed (which I will work on).

A side note for, for example, Brutus and other third-party developers
who would like support for their service in tinymail (in the case of
Brutus, I also want Exchange support, so chances are high that I
will integrate it at some point anyway).

Every nth (in my experiment, every 1000th) header, the method that
receives message headers (and uses camel_folder_summary_add or an
equivalent) should call a new API, "camel_folder_summary_dump_mmap", if
it wants to support this hack.

The query also shouldn't ask "for all message headers" but rather "for
the next n message headers", with the entire thing wrapped in a "while
there are still headers to fetch" loop. I haven't looked at Brutus's
GPL Camel provider, so at this moment I don't know what it does or
whether it looks at all like Evolution's default imap one.

The memory consumption of the default camel-imap-folder.c, 35M for
getting 30,000 headers, is obviously unacceptable for tinymail, because
it basically means that such large folders can't be initially
downloaded on a typical mobile device (not even in steps, since a
cancellation means that no summary info is written in the current
implementation). Most such devices simply can't handle 35M, not even if
you stand on your head and start dancing for it.

The attached patch makes it possible to do an initial download of a
large IMAP folder, both in steps and in one go, on a device with
limited memory.


More information here

http://pvanhoof.be/blog/index.php/2006/10/22/initial-download-of-a-large-imap-folder-on-tinymail

and here http://tinymail.org/trac/tinymail/wiki/MemoryStats


ps. I'll ask again: if any Evolution developer feels that this type of
message shouldn't be posted here: tell me. If nobody does, it's my
opinion that it's related to Evolution and that maybe Evolution could
benefit from experiments like this one.

-- 
Philip Van Hoof, software developer
home: me at pvanhoof dot be
gnome: pvanhoof at gnome dot org
work: vanhoof at x-tend dot be
blog: http://pvanhoof.be/blog
Index: camel/providers/imap/camel-imap-folder.c
===================================================================
--- camel/providers/imap/camel-imap-folder.c	(revision 1046)
+++ camel/providers/imap/camel-imap-folder.c	(working copy)
@@ -2348,7 +2348,8 @@
 	CamelStream *stream;
 	char *uid, *resp;
 	GData *data;
-	
+	gboolean more = TRUE;
+
 	if (store->server_level >= IMAP_LEVEL_IMAP4REV1)
 		header_spec = "HEADER.FIELDS (" CAMEL_MESSAGE_INFO_HEADERS MAILING_LIST_HEADERS ")";
 	else
@@ -2360,7 +2361,12 @@
 	
 	if( g_getenv ("EVO_IMAP_FETCH_ALL_HEADERS") )
 		header_spec = "HEADER";
-	
+
+while (more)
+{
+	gint count = 0;
+	more = FALSE;
+
 	/* Figure out if any of the new messages are already cached (which
 	 * may be the case if we're re-syncing after disconnected operation).
 	 * If so, get their UIDs, FLAGS, and SIZEs. If not, get all that
@@ -2374,25 +2380,34 @@
 		camel_message_info_free(&mi->info);
 	} else
 		uidval = 0;
-	
+
 	size = (exists - seq) * (IMAP_PRETEND_SIZEOF_FLAGS + IMAP_PRETEND_SIZEOF_SIZE + IMAP_PRETEND_SIZEOF_HEADERS);
 	got = 0;
 	if (!camel_imap_command_start (store, folder, ex,
-				       "UID FETCH %d:* (FLAGS RFC822.SIZE INTERNALDATE BODY.PEEK[%s])",
-				       uidval + 1, header_spec))
+				       "UID FETCH %d:%d (FLAGS RFC822.SIZE INTERNALDATE BODY.PEEK[%s])",
+				       uidval + 1, uidval + 1 + 1000, header_spec))
 		return;
+		
+
+	printf ("Getting more %d:%d\n", uidval + 1, uidval + 1 + 1000);
+
 	camel_operation_start (NULL, _("Fetching summary information for new messages in %s"), folder->name);
 	
 	/* Parse the responses. We can't add a message to the summary
 	 * until we've gotten its headers, and there's no guarantee
 	 * the server will send the responses in a useful order...
 	 */
+
 	fetch_data = g_ptr_array_new ();
 	messages = g_ptr_array_new ();
+
 	while ((type = camel_imap_command_response (store, &resp, ex)) ==
-	       CAMEL_IMAP_RESPONSE_UNTAGGED) {
+	       CAMEL_IMAP_RESPONSE_UNTAGGED) 
+	{
+		more = TRUE;
 		data = parse_fetch_response (imap_folder, resp);
 		g_free (resp);
+
 		if (!data)
 			continue;
 		
@@ -2401,7 +2416,7 @@
 			g_datalist_clear (&data);
 			continue;
 		}
-		
+
 		if (g_datalist_get_data (&data, "FLAGS"))
 			got += IMAP_PRETEND_SIZEOF_FLAGS;
 		if (g_datalist_get_data (&data, "RFC822.SIZE"))
@@ -2416,43 +2431,46 @@
 			add_message_from_data (folder, messages, first, data);
 			g_datalist_set_data (&data, "BODY_PART_STREAM", NULL);
 		}
-		
-		camel_operation_progress (NULL, got * 100 / size);
+
+		if (size > 0)
+			camel_operation_progress (NULL, got * 100 / size);
 		g_ptr_array_add (fetch_data, data);
 	}
 	camel_operation_end (NULL);
-	
+
 	if (type == CAMEL_IMAP_RESPONSE_ERROR)
 		goto lose;
-	
+
 	/* Free the final tagged response */
 	g_free (resp);
-	
+
 	/* Figure out which headers we still need to fetch. */
 	needheaders = g_ptr_array_new ();
 	size = got = 0;
-	for (i = 0; i < fetch_data->len; i++) {
+
+	for (i = 0; i < fetch_data->len; i++) 
+	{
 		data = fetch_data->pdata[i];
 		if (g_datalist_get_data (&data, "BODY_PART_LEN"))
 			continue;
-		
+
 		uid = g_datalist_get_data (&data, "UID");
 		if (uid) {
 			g_ptr_array_add (needheaders, uid);
 			size += IMAP_PRETEND_SIZEOF_HEADERS;
 		}
 	}
-	
+
 	/* And fetch them */
-	if (needheaders->len) {
+	if (needheaders->len) 
+	{
 		char *uidset;
 		int uid = 0;
-		
+
 		qsort (needheaders->pdata, needheaders->len,
 		       sizeof (void *), uid_compar);
-		
+
 		camel_operation_start (NULL, _("Fetching summary information for new messages in %s"), folder->name);
-		
 		while (uid < needheaders->len) {
 			uidset = imap_uid_array_to_set (folder->summary, needheaders, uid, UID_SET_LIMIT, &uid);
 			if (!camel_imap_command_start (store, folder, ex,
@@ -2464,7 +2482,7 @@
 				goto lose;
 			}
 			g_free (uidset);
-			
+
 			while ((type = camel_imap_command_response (store, &resp, ex))
 			       == CAMEL_IMAP_RESPONSE_UNTAGGED) {
 				data = parse_fetch_response (imap_folder, resp);
@@ -2476,7 +2494,8 @@
 				if (stream) {
 					add_message_from_data (folder, messages, first, data);
 					got += IMAP_PRETEND_SIZEOF_HEADERS;
-					camel_operation_progress (NULL, got * 100 / size);
+					if (size > 0)
+						camel_operation_progress (NULL, got * 100 / size);
 				}
 				g_datalist_clear (&data);
 			}
@@ -2491,7 +2510,7 @@
 		g_ptr_array_free (needheaders, TRUE);
 		camel_operation_end (NULL);
 	}
-	
+
 	/* Now finish up summary entries (fix UIDs, set flags and size) */
 	for (i = 0; i < fetch_data->len; i++) {
 		data = fetch_data->pdata[i];
@@ -2501,7 +2520,7 @@
 			g_datalist_clear (&data);
 			continue;
 		}
-		
+
 		mi = messages->pdata[seq - first];
 		if (mi == NULL) {
 			CamelMessageInfo *pmi = NULL;
@@ -2535,9 +2554,10 @@
 		
 		uid = g_datalist_get_data (&data, "UID");
 
-		/* This strdup() might not get freed since the mmap() patch! */
-		if (uid)
+		if (uid) {
 			mi->info.uid = g_strdup (uid);
+			mi->info.uid_needs_free = TRUE;
+		}
 
 		flags = GPOINTER_TO_INT (g_datalist_get_data (&data, "FLAGS"));
 		if (flags) {
@@ -2548,26 +2568,29 @@
 			mi->info.flags |= flags;
 			flags_to_label(folder, mi);
 		}
-		
+
 #ifdef NON_TINYMAIL_FEATURES
 		size = GPOINTER_TO_INT (g_datalist_get_data (&data, "RFC822.SIZE"));
 		if (size)
 			mi->info.size = size;
 #endif
+
 		g_datalist_clear (&data);
 	}
 	g_ptr_array_free (fetch_data, TRUE);
-	
+
 	/* And add the entries to the summary, etc. */
 	for (i = 0; i < messages->len; i++) {
 		mi = messages->pdata[i];
+
 		if (!mi) {
-			g_warning ("No information for message %d", i + first);
+			/* g_warning ("No information for message %d", i + first);
 			camel_exception_setv (ex, CAMEL_EXCEPTION_SYSTEM,
 					      _("Incomplete server response: no information provided for message %d"),
-					      i + first);
-			break;
+					      i + first); */
+			continue;
 		}
+
 		uid = (char *)camel_message_info_uid(mi);
 		if (uid[0] == 0) {
 			g_warning("Server provided no uid: message %d", i + first);
@@ -2599,15 +2622,17 @@
 			camel_folder_change_info_recent_uid(changes, camel_message_info_uid (mi));
 	}
 
+	camel_folder_summary_dump_mmap (folder->summary);
+
 	for ( ; i < messages->len; i++) {
 		if ((mi = messages->pdata[i]))
 			camel_message_info_free(&mi->info);
 	}
-	
 	g_ptr_array_free (messages, TRUE);
-	
-	return;
-	
+
+
+	goto endbmore;
+
  lose:
 	if (fetch_data) {
 		for (i = 0; i < fetch_data->len; i++) {
@@ -2623,6 +2648,12 @@
 		}
 		g_ptr_array_free (messages, TRUE);
 	}
+
+ endbmore:
+ count = 0;
+
+ } /* more */
+
 }
 
 /* Called with the store's connect_lock locked */
Index: camel/providers/Makefile.am
===================================================================
--- camel/providers/Makefile.am	(revision 1046)
+++ camel/providers/Makefile.am	(working copy)
@@ -1,18 +1,20 @@
 ## Process this file with automake to produce Makefile.in
 
+
+SUBDIRS = pop3 smtp imap local  
+
 if ENABLE_NNTP
-NNTP_DIR=nntp
+SUBDIRS += nntp 
 endif
 
 if ENABLE_IMAPP
-IMAPP_DIR=imapp
+SUBDIRS += imapp 
 endif
 
+
 if OS_WIN32
 else
-SENDMAIL_DIR=sendmail
+SUBDIRS += sendmail 
 endif
 
-SUBDIRS = pop3 $(SENDMAIL_DIR) smtp imap $(NNTP_DIR) local $(IMAPP_DIR) 
 
-
Index: camel/camel-folder.c
===================================================================
--- camel/camel-folder.c	(revision 1046)
+++ camel/camel-folder.c	(working copy)
@@ -2116,7 +2116,7 @@
 	struct _CamelFolderChangeInfoPrivate *p;
 	GPtrArray *olduids;
 	char *olduid;
-	
+
 	g_assert(info != NULL);
 	
 	p = info->priv;
Index: camel/camel-folder-summary.c
===================================================================
--- camel/camel-folder-summary.c	(revision 1046)
+++ camel/camel-folder-summary.c	(working copy)
@@ -113,6 +113,10 @@
 static void camel_folder_summary_init       (CamelFolderSummary *obj);
 static void camel_folder_summary_finalize   (CamelObject *obj);
 
+
+static void camel_folder_summary_mmap_add(CamelFolderSummary *s, CamelMessageInfo *info);
+static void camel_folder_summary_unload_mmap (CamelFolderSummary *s);
+
 static CamelObjectClass *camel_folder_summary_parent;
 
 static void
@@ -161,6 +165,26 @@
 	g_slice_free1 ((gint)user_data, data);
 }
 
+static gboolean always_true (gpointer key, gpointer value, gpointer gp)
+{
+	return TRUE;
+}
+
+static void 
+camel_folder_summary_unload_mmap (CamelFolderSummary *s)
+{
+	camel_folder_summary_clear(s);
+	if (s->file)
+		g_mapped_file_free (s->file);
+	s->file = NULL;
+
+	g_ptr_array_foreach (s->messages, foreach_msginfo, (gpointer)s->message_info_size);
+	if (s->messages->len > 0)
+		g_ptr_array_remove_range (s->messages, 0, s->messages->len-1);
+	g_hash_table_foreach_remove (s->messages_uid, always_true, NULL);
+
+}
+
 static void
 camel_folder_summary_finalize (CamelObject *obj)
 {
@@ -177,6 +201,7 @@
 	g_ptr_array_foreach (s->messages, foreach_msginfo, (gpointer)s->message_info_size);
 	g_ptr_array_free(s->messages, TRUE);
 	g_hash_table_destroy(s->messages_uid);
+	s->messages_uid = NULL;
 
 	g_hash_table_foreach(p->filter_charset, free_o_name, 0);
 	g_hash_table_destroy(p->filter_charset);
@@ -591,7 +616,7 @@
 			}
 		}
 
-		camel_folder_summary_add(s, mi);
+		camel_folder_summary_mmap_add(s, mi);
 	}
 
 
@@ -807,6 +832,17 @@
 }
 
 
+
+void 
+camel_folder_summary_dump_mmap (CamelFolderSummary *s)
+{
+	camel_folder_summary_save (s);
+	camel_folder_summary_unload_mmap (s);
+	camel_folder_summary_load(s);
+
+	return;
+}
+
 /**
  * camel_folder_summary_add:
  * @summary: a #CamelFolderSummary object
@@ -845,7 +881,25 @@
 	CAMEL_SUMMARY_UNLOCK(s, summary_lock);
 }
 
+static void
+camel_folder_summary_mmap_add(CamelFolderSummary *s, CamelMessageInfo *info)
+{
+	CAMEL_SUMMARY_LOCK(s, summary_lock);
 
+/* unnecessary for pooled vectors */
+#ifdef DOESTRV
+	/* this is vitally important, and also if this is ever modified, then
+	   the hash table needs to be resynced */
+	info->strings = e_strv_pack(info->strings);
+#endif
+
+	g_ptr_array_add(s->messages, info);
+	g_hash_table_insert(s->messages_uid, (char *)camel_message_info_uid(info), info);
+	s->flags |= CAMEL_SUMMARY_DIRTY;
+
+	CAMEL_SUMMARY_UNLOCK(s, summary_lock);
+}
+
 /**
  * camel_folder_summary_add_from_header:
  * @summary: a #CamelFolderSummary object
@@ -1090,9 +1144,9 @@
 void
 camel_folder_summary_touch(CamelFolderSummary *s)
 {
-	CAMEL_SUMMARY_LOCK(s, summary_lock);
+	//CAMEL_SUMMARY_LOCK(s, summary_lock);
 	s->flags |= CAMEL_SUMMARY_DIRTY;
-	CAMEL_SUMMARY_UNLOCK(s, summary_lock);
+	//CAMEL_SUMMARY_UNLOCK(s, summary_lock);
 }
 
 
Index: camel/camel-folder-summary.h
===================================================================
--- camel/camel-folder-summary.h	(revision 1046)
+++ camel/camel-folder-summary.h	(working copy)
@@ -308,6 +308,8 @@
 CamelType			 camel_folder_summary_get_type	(void);
 CamelFolderSummary      *camel_folder_summary_new	(struct _CamelFolder *folder);
 
+void camel_folder_summary_dump_mmap (CamelFolderSummary *s);
+
 unsigned char*
 decode_uint32 (unsigned char *start, guint32 *dest, gboolean is_string);
 

--- End Message ---
--- Begin Message ---
On Sun, 2006-10-22 at 15:41 +0200, Philip Van Hoof wrote:
> Greetings,
> 
> imap_update_summary is implemented in three or four steps:
> 
>   o. Getting (all) the headers/uids
>   o. Finding the ones that we must still fetch
>   o. Fetching those (x)
>   o. Writing out the summary
> 
> The steps each consume memory and reuse some of the memory of the
> previous step. Pointers to that memory are stored in GPtrArrays like
> fetch_data and messages.
> 
> In the code I have found no real reason why this was done in separate
> loops (steps) rather than one step that frees the data at the end of
> each iteration. Especially the third step (x) seems to consume the
> most memory while it's happening.

I rewrote this behavior in camel-GroupWise to fetch data in iterations,
so that the memory requirement remains a constant k instead of O(n), n
being the number of messages. I expected it to work better. (GW and
IMAP code are similar in this respect.)

However, when I tested it, as expected, the memory requirement came down
but the number of disk-access increased and hence it became slow. So I
reverted it to the old behavior. 

You can try rewriting this in IMAP, and you will find that the time
taken to complete the sync increases for large folders.


> 
> The current implementation requires the data received from the
> IMAP service to be kept in memory until the entire folder has been
> received and all steps are done. This consumes more than a kilobyte
> per message. Multiply that by, for example, 5,000 headers and you get
> 5 MB of memory consumption for fetching the new messages of a very
> small IMAP folder (in case no other messages had been received before
> you first started the procedure).
> 
> Multiply that by 50,000 headers and you get 50-60 MB of memory
> consumption for a relatively big (though not extreme) IMAP folder.
> 
> All of which will be freed, yes, but nevertheless it's a slowly
> growing peak (its speed depends on the connection to your IMAP
> server) that only gets deallocated when you press cancel or when all
> messages are received (which can take a significant amount of time).

I tested the changes I made (in camel-GW) and found that in actual
deployment scenarios, people prefer an occasional memory shoot-up
(which comes down as soon as the mail fetch is complete) to so many
disk accesses that operations eventually take longer to complete.


> 
> The strange part is that if I measure the number of bytes I receive
> from the IMAP service, I measure far fewer bytes being transferred
> than bytes being consumed in memory. It not only stores all the
> received data, it also stores a lot more in memory (probably mostly
> 4-byte pointers and GData overhead).
> 
> I wonder whether there was a reason it was implemented this way. If
> not, I'm planning to rewrite imap_update_summary differently, for
> example by immediately creating a CamelMessageInfo struct, burning it
> to the summary file instantly, and freeing the GData created from the
> stream in-loop.

If you want the memory usage to be constant, the best solution is to
make the folder_sync function fetch summaries iteratively from a
database back-end (not from flat files or mmap). Perhaps this should be
done at a higher (Camel) level so that all the providers can make use
of it; that means rewriting parts of the Camel folder, summary, etc.

Anyway, all this is already covered under "On-Disk-Summaries". If you
are so obsessed with memory usage, go ahead and give it a try.

Sometimes the customer's needs differ from what the hacker may perceive
to be the most important thing. Evolution's periodic memory shoot-up
(and subsequent coming down) is considered more normal by customers
than things like the proxy-authentication failure (libsoup), EDS
crashes, etc., that we have been working on.

It is interesting work, but we (the Evolution team) have other
priorities, driven by actual customer deployment needs. Those are the
most important things that Evolution (and indirectly Camel too) should
address to become a stable enterprise-level GNOME app.

Sankar


_______________________________________________
Evolution-hackers mailing list
Evolution-hackers gnome org
http://mail.gnome.org/mailman/listinfo/evolution-hackers

--- End Message ---
--- Begin Message ---
Greetings,

imap_update_summary is implemented in three or four steps:

  o. Getting (all) the headers/uids
  o. Finding the ones that we must still fetch
  o. Fetching those (x)
  o. Writing out the summary

The steps each consume memory and reuse some of the memory of the
previous step. Pointers to that memory are stored in GPtrArrays like
fetch_data and messages.

In the code I have found no real reason to why this was done in
separated loops (steps) rather than one step and at the end of the loop,
free the data already. Especially for the third step (x), which seem to
consume most memory while it's happening.

The current implementation requires the data received from the IMAP
service to be kept in memory until the entire folder has been received
and all steps are done. This consumes more than a kilobyte per message.
Multiply that by, for example, 5,000 headers and you get 5 MB of memory
consumption for fetching the new messages of a fairly small IMAP folder
(assuming no other messages had been received before you first started
the procedure).

Multiply it by 50,000 headers and you get 50-60 MB of memory
consumption for a relatively (though not extremely) big IMAP folder.

That memory will be freed eventually, yes, but in the meantime it is a
slowly growing peak (the speed depends on the connection to your IMAP
server) that only gets deallocated when you press cancel or when all
messages have been received (which can take a significant amount of
time).


The strange part is that if I measure the number of bytes I receive
from the IMAP service, I measure far fewer bytes being transferred than
bytes being consumed in memory. Not only does it store all the received
data, it also stores a lot more in memory (probably mostly 4-byte
pointers and GData overhead).

I wonder whether there was a reason to implement it this way. If not,
I'm planning to rewrite imap_update_summary differently: for example,
by immediately creating a CamelMessageInfo struct (or writing it out to
the summary file right away) and freeing the GData created from the
stream in-loop.
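The restructuring proposed above can be sketched in plain C. This is not Camel code: `header_data`, `fetch_next`, and both loop functions are illustrative stand-ins that only model the allocation pattern (roughly 1 KB of parsed data per message), so the peak-memory difference between the two structures is visible:

```c
#include <stdlib.h>

/* Illustrative stand-in for the per-message data parsed from the stream. */
struct header_data {
	char *raw;  /* ~1 KB of parsed response data per message */
};

static struct header_data *fetch_next (int i, int total)
{
	if (i >= total)
		return NULL;
	struct header_data *d = malloc (sizeof *d);
	d->raw = calloc (1, 1024);
	return d;
}

/* Current structure: every header stays allocated until all steps are
 * done, so peak memory grows linearly with the folder size. */
static size_t collect_then_process (int total)
{
	struct header_data **all = malloc (total * sizeof *all);
	size_t peak = 0, live = 0;
	for (int i = 0; i < total; i++) {
		all[i] = fetch_next (i, total);
		live += 1024;
		if (live > peak)
			peak = live;
	}
	/* ... summary would be written out here, from `all` ... */
	for (int i = 0; i < total; i++) {
		free (all[i]->raw);
		free (all[i]);
	}
	free (all);
	return peak;
}

/* Proposed structure: convert each header to its summary form and free
 * the parsed data inside the loop, keeping live memory roughly constant. */
static size_t process_in_loop (int total)
{
	size_t peak = 0;
	for (int i = 0; i < total; i++) {
		struct header_data *d = fetch_next (i, total);
		/* ... create the summary entry for `d` here ... */
		free (d->raw);
		free (d);
		if (peak < 1024)
			peak = 1024;  /* only one header live at a time */
	}
	return peak;
}
```

With 5,000 messages the first variant peaks at roughly 5 MB of parsed data while the second stays at about 1 KB, which matches the numbers discussed above.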



-- 
Philip Van Hoof, software developer at x-tend 
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
work: vanhoof at x-tend dot be 
http://www.pvanhoof.be - http://www.x-tend.be


--- End Message ---
--- Begin Message ---
On Sun, 2006-10-22 at 15:41 +0200, Philip Van Hoof wrote:

> In the code I have found no real reason why this was done in separate
> loops (steps) rather than in one loop that frees the data at the end of
> each iteration. This applies especially to the third step (x), which
> seems to consume the most memory while it's happening.

After measuring the memory usage of the implementation, I saw that it's
not the third step but the first that allocates 90% of the memory.
That's this loop:

	fetch_data = g_ptr_array_new ();
	messages = g_ptr_array_new ();
	while ((type = camel_imap_command_response (store, &resp, ex))
	       == CAMEL_IMAP_RESPONSE_UNTAGGED) {
		data = parse_fetch_response (imap_folder, resp);
		g_free (resp);
		...
	}

I measured this by simply interrupting the function (returning early)
and running it under valgrind. The first loop, at the very start,
allocates everything (while receiving the result).


-- 
Philip Van Hoof, software developer at x-tend 
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
work: vanhoof at x-tend dot be 
http://www.pvanhoof.be - http://www.x-tend.be


--- End Message ---
--- Begin Message ---
On Mon, 2006-10-23 at 04:02 -0600, Sankar P wrote:

Hey Sankar,

Thanks for the response.

> I rewrote this behavior in camel-GroupWise to fetch data in iterations,
> so that the memory requirement remains a constant k instead of O(n), n
> being the number of messages. I expected it to work better. (GW and IMAP
> code are similar in this aspect)
> 
> However, when I tested it, as expected, the memory requirement came down
> but the number of disk accesses increased and hence it became slow. So I
> reverted it to the old behavior. 

I acknowledge that the time increases. For me it's not significant, but
it does indeed take longer. Doing nothing is of course faster than
dumping something to disk.

> You can try rewriting this in IMAP and you will find out that the time
> taken to complete the sync will increase in case of large folders.

This is correct, and it is indeed the case. On a mobile device, however,
the time it takes is (imo) less important than the possibility of doing
it at all. Maybe I shouldn't use the "1, 2, 4, 8, etc." series but
something like "1, 10, 100, 1000" to decrease the number of to-disk
dumps? What do you think?
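The two candidate series can be sketched as tiny step functions. This is a hedged illustration (function names are mine, not from the patch) of the doubling series mentioned in the thread, capped at 1000, next to the base-10 alternative proposed above:

```c
/* Next gap in the doubling series 1, 2, 4, 8, ..., capped at 1000
 * (then it stays 1000), as described for the current patch. */
static int next_gap_doubling (int gap)
{
	int next = gap * 2;
	return next > 1000 ? 1000 : next;
}

/* Next gap in the alternative series 1, 10, 100, 1000 (then stays
 * 1000), which reaches the cap in far fewer to-disk dumps. */
static int next_gap_decimal (int gap)
{
	int next = gap * 10;
	return next > 1000 ? 1000 : next;
}
```

Starting from 1, the doubling series needs about ten dumps before it settles at 1000, while the base-10 series needs only three, which is the whole point of the suggestion.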

At this moment it's the provider code itself that determines this nth.

When talking about 'design', it might indeed be better if
camel-folder-summary.c implemented that decision. For now, the
experiment itself is implemented as a hack (I agree).

I would be interested in trying to improve it in a design sense if
there's interest (and discussion) from the Evolution team in something
like this.

Hack or not, once it is supported by all providers I am probably going
to use it in tinymail. I would of course prefer that Camel itself would
someday enjoy the same improvements too.

But it's not something I require, nor will I push for it. I "touch
evolution"; if you want it, tell me :)

> I tested the changes that I made (in camel-GW) and found that in an
> actual deployment scenario, people prefer having an occasional
> memory-shootup (which will come down as soon as the mail-fetch is
> complete) rather than having so many disk accesses, which will
> eventually make operations take longer to complete.

For a desktop user I can imagine that, yes.

For a mobile device, the memory shoot-up means that it's impossible to
support such large folders, and that during the download the device
becomes unusable for almost-large folders (which I definitely do want
to support), sized around 20,000 headers.

This is perhaps why a different design would be a good idea: delegate
the decision to a more specific plugin, or to a piece of code that is
global to all providers (rather than a patch per provider for each
target).
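One way the "delegate the decision" idea could look, sketched with a plain function pointer. Everything here is hypothetical (the `SummaryFlushPolicy` type and both policies are mine, not Camel's); it only shows the shape of a per-platform flush policy that providers would call instead of hard-coding an nth:

```c
#include <stddef.h>

/* Hypothetical policy hook: given how many headers have been processed
 * since the last summary dump, decide whether to dump now. Each target
 * (mobile, desktop) installs its own policy; providers just invoke it. */
typedef int (*SummaryFlushPolicy) (size_t since_last_dump);

/* Mobile: dump often to keep the memory peak small. */
static int mobile_policy (size_t n)  { return n >= 1000; }

/* Desktop: dump rarely to keep the number of disk accesses low. */
static int desktop_policy (size_t n) { return n >= 100000; }

/* Driver loop: processes `total` headers and counts partial-summary
 * dumps, standing in for the provider's fetch loop. */
static size_t count_dumps (SummaryFlushPolicy policy, size_t total)
{
	size_t dumps = 0, since = 0;
	for (size_t i = 0; i < total; i++) {
		if (policy (++since)) {
			/* ... write partial summary to disk here ... */
			dumps++;
			since = 0;
		}
	}
	return dumps;
}
```

With a 20,000-header folder the mobile policy dumps 20 times while the desktop policy never dumps mid-fetch, so one mechanism serves both deployment profiles.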
 
> If you want the memory usage to be a constant value, the best solution
> is to make the folder_sync function fetch summaries iteratively from a
> database back-end (not from flat files or mmap). Perhaps this should be
> done at a higher (camel) level so that all the providers can make use of
> it; that means rewriting parts of the camel folder, summary, etc.

I agree.

> Anyway, all this is already covered under "On-Disk-Summaries". If you
> are so obsessed with memory usage, go ahead and give it a try.

I should. But I need something that works today ;).

I have already, however, repeatedly stated that I'm very interested in
the on-disk-summary concepts, and that if a team were to start on this
work, I'd definitely join it.

I'm not naive enough to think that I could do it on my own. I
acknowledge that implementing the concepts takes a team of smart
people. I think that libspruce and GMime would be a nice (fresh)
starting point for such an implementation.

If that happens, tinymail would probably become one of the first e-mail
frameworks (/clients) to actively use the disk-summary concepts and
ideas.

Absolutely :)

> Sometimes, the customer's needs are different from what the hacker may
> perceive to be the most important thing. Evolution's periodic memory
> shoot-up (and subsequent coming down) is considered more acceptable by
> customers than things like the proxy-authentication failure (libsoup),
> EDS crashes, etc., that we have been working on.

> It is interesting work, but we (the Evolution team) have other
> priorities, driven by actual customer deployment needs. These are the
> most important things that Evolution (and indirectly Camel) should
> address to become a stable enterprise-level GNOME app.

No problem. Yet I hope my experiments might someday be useful for the
Evolution project. Until then, I'll use tinymail as the host for them.


-- 
Philip Van Hoof, software developer
home: me at pvanhoof dot be
gnome: pvanhoof at gnome dot org
work: vanhoof at x-tend dot be
blog: http://pvanhoof.be/blog


--- End Message ---

