Re: finally caught an error on crash



On 2020.05.05 16:01, Albrecht Dreß wrote:
On 04.05.20 23:01, Jack via balsa-list wrote:
This continues to happen at a low but still annoying frequency. The line number in the error is now 247, having been 249 for a while, presumably just due to other edits to that file. As far as I can remember, it happens after deleting message(s) from the inbox and/or moving message(s) to different mailboxes, in case that points to any possibilities.

Hmmm, the error message (failed assertion msgno <= mbox->msgno_2_msg_info->len) “smells” like a corrupted GPtrArray; maybe caused by a race between two threads, when messages are moved and/or deleted. I'm not really familiar with the mailbox code (Peter? Pawel? Are you?), so this is only a wild guess. The failing function seems to be message_info_from_msgno(), which is called in several places; do you have a backtrace from a crash?
I agree on where it happens, but no, I have not yet caught the crash when running under a debugger.
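
To illustrate the kind of race guessed at above, here is a minimal sketch in plain C with pthreads. It is not Balsa's code: the struct msg_table type and the table_* functions are hypothetical stand-ins for a mailbox's msgno-to-info table. The point it tries to show is that the bounds check and the array read have to happen under the same lock as any removal; otherwise another thread can shrink the array between the check and the read.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a msgno -> info table; not Balsa's types. */
#define TABLE_CAPACITY 16

struct msg_table {
    pthread_mutex_t lock;
    size_t len;                        /* number of valid entries */
    const char *info[TABLE_CAPACITY];  /* entry for msgno n is info[n - 1] */
};

static void table_init(struct msg_table *t)
{
    pthread_mutex_init(&t->lock, NULL);
    t->len = 0;
}

/* Append an entry; returns its 1-based msgno, or 0 if the table is full. */
static size_t table_append(struct msg_table *t, const char *info)
{
    size_t msgno = 0;

    pthread_mutex_lock(&t->lock);
    if (t->len < TABLE_CAPACITY) {
        t->info[t->len++] = info;
        msgno = t->len;
    }
    pthread_mutex_unlock(&t->lock);
    return msgno;
}

/* Remove entry msgno (1-based), shifting later entries down.
 * Returns 1 on success, 0 if msgno is out of range. */
static int table_remove(struct msg_table *t, size_t msgno)
{
    int ok = 0;

    pthread_mutex_lock(&t->lock);
    if (msgno >= 1 && msgno <= t->len) {
        for (size_t i = msgno; i < t->len; i++)
            t->info[i - 1] = t->info[i];
        t->len--;
        ok = 1;
    }
    pthread_mutex_unlock(&t->lock);
    return ok;
}

/* Look up msgno (1-based). The range check and the read share one critical
 * section, so a concurrent table_remove() cannot shrink the table in between.
 * Returns NULL for a stale or invalid msgno instead of asserting. */
static const char *table_lookup(struct msg_table *t, size_t msgno)
{
    const char *result = NULL;

    pthread_mutex_lock(&t->lock);
    if (msgno >= 1 && msgno <= t->len)
        result = t->info[msgno - 1];
    pthread_mutex_unlock(&t->lock);
    return result;
}
```

A caller that still holds a msgno from before a delete or move would get NULL here rather than tripping an assertion. Whether Balsa's real lookup is missing such protection is exactly what is not yet known.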

Given the recent changes (in the cleanup-logging branch) I'm wondering if I might just for my own use add some additional debug statements, either in the function where the g-assert fails, or else before the calls to that function.

As running in gdb is not always feasible, this may actually be helpful. Just add g_debug() statements…
I finally did this yesterday - in fact putting a call right before every call to message_info_from_msgno(). There are a LOT of those calls, sometimes running through all messages in INBOX in succession, and sometimes even the same call two or three times in a row. I can easily send a sample log. But no crash yet since doing that.

Any quick pointers (or even just a pointer to an example line or two) would be helpful, and also advice on which debug domain to use (or even to set for current testing) or how to go about adding a new domain.

If you use the new branch, the log domain is set (line 50) to

#define G_LOG_DOMAIN "mbox-mbox"

so running balsa by calling

G_MESSAGES_DEBUG=mbox-mbox /path/to/balsa

would print /only/ messages from this domain.
I actually realized that that domain seems to cover all the code in that file (and likewise for some of the others), so I didn't need to define the domain, just set the env var.

It might also be helpful (assuming my guess of a race condition is correct) to add debug messages to libbalsa/mailbox.c, namely in libbalsa_lock_mailbox() and libbalsa_unlock_mailbox() (and probably more), like

g_debug("%s: mbox %s", __func__, libbalsa_mailbox_get_name(mailbox));

That file didn't have its own log domain, which is bad. I just pushed a tiny change to the branch, setting it to

#define G_LOG_DOMAIN "mailbox"

The above call should now read

G_MESSAGES_DEBUG="mailbox mbox-mbox" /path/to/balsa
I just pulled your changes, and merged into my debugging branch. I'll try to add that later today.

Finding the source of a race (which, again, is just a guess) is always /very/ difficult, and adding debug messages may actually change the internal timing of the application, effectively hiding the bug…
Heisenbug, huh?

Hope this helps,
I think this confirms my thinking, and pushes me along the right path.
Albrecht.
Jack




