Re: Balsa can't download a mail with a long line, gets stuck in endless dup-creating loop.



On 10/31/2012 09:15:38 AM, Jack Ostroff wrote:
On 2012.10.30 19:54, Rob Landley wrote:
On 10/29/2012 05:11:47 PM, Jack wrote:
On 2012.10.29 17:46, Rob Landley wrote:
.....
...I haven't exited balsa to run my filters yet

You can run filters under menu "Mailbox/Select Filters"

If I could get Balsa to actually match a list-id tag, I would.

  http://landley.net/notes-2012.html#15-10-2012
I haven't read all of this yet, so I don't know what you've tried, or why it isn't working, but would there be any point in starting a new thread about this problem with filters?

If I can get it built from source, sure. If I can't, not much point.

Separately, although I can imagine you're already down a particular path to working through that huge inbox, is it worth considering a slightly different approach? What about writing a script to break that huge mbox into smaller chunks, with maybe up to a few thousand messages in each? Whether you process with Balsa or your scripts, it might be easier (and safer?) to work incrementally. Sorry if you've already thought of this and have a good reason not to.
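
For illustration, the chunking idea above could be sketched with Python's standard mailbox module. This is a minimal sketch, not something from the thread: the function name split_mbox, the chunk-NNNN.mbox file naming, and the default of 2000 messages per chunk are all assumptions.

```python
import mailbox
import os

def split_mbox(src_path, out_dir, chunk_size=2000):
    """Copy messages from src_path into numbered mbox files in out_dir,
    each holding at most chunk_size messages. Returns the chunk paths.
    (Hypothetical helper; names and defaults are illustrative only.)"""
    os.makedirs(out_dir, exist_ok=True)
    src = mailbox.mbox(src_path, create=False)  # raises if the file is missing
    paths = []
    dest = None
    try:
        for n, msg in enumerate(src):
            if n % chunk_size == 0:
                # Finish the previous chunk before starting a new one
                if dest is not None:
                    dest.flush()
                    dest.close()
                path = os.path.join(out_dir, "chunk-%04d.mbox" % (n // chunk_size))
                dest = mailbox.mbox(path)  # created if absent, appended to otherwise
                paths.append(path)
            dest.add(msg)
    finally:
        if dest is not None:
            dest.flush()
            dest.close()
        src.close()
    return paths
```

The source file is only read, never modified, so the original inbox stays intact while the chunks are processed one at a time. The mailbox module indexes the whole source file before iterating, so on a multi-gigabyte mbox the first pass is likely to take at least as long as reading the file once.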

I already dealt with the mbox backlog, the python script that sorts it into mboxes is attached if you're bored.

I'm happy to debug balsa's built-in filtering more, but I couldn't get it to work and have bypassed it for the moment.

I spent a couple hours trying to figure out how to phrase it and never got a single message moved out of the inbox (imap or local), so I gave up and wrote a python program.
I've sometimes had luck doing bulk filtering by using the filter dropdown at the top of the main window. For example, if I filter the inbox by "Subject or Sender contains:" on the address that sends a particular list, I may still have to Ctrl-click to select specific messages, but it's easier than looking at the complete list. Once I have a bunch selected, I can right-click and "move-to" another mailbox.

I could never get a rule to trigger on a list-id: header in a message. (With or without colon, exact case matching, and so on.) Most of my filtering is done on those.

This was with the exact string match; the regex match says it's not implemented in the version Ubuntu ships.
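
To make concrete what matching on a list-id header involves outside of Balsa, here is a minimal sketch using Python's standard email package. It says nothing about how Balsa's filters work internally; the function name list_id_of is hypothetical, and the sketch assumes the usual List-Id layout of an optional description followed by the identifier in angle brackets.

```python
import re
from email.parser import HeaderParser

def list_id_of(raw_headers):
    """Return the bare list identifier from a block of raw headers,
    lowercased, or None when there is no List-Id header.
    (Hypothetical helper for illustration.)"""
    msg = HeaderParser().parsestr(raw_headers)
    value = msg.get("List-Id")  # header name lookup ignores case
    if value is None:
        return None
    # Prefer the part inside <...>; fall back to the whole value
    m = re.search(r"<([^>]+)>", value)
    return (m.group(1) if m else value).strip().lower()
```

Looking the header up through the parser sidesteps two things a plain string comparison can trip over: the case of the header name, and a value that is folded across a continuation line.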

It would be nice if I could get balsa to check how many new messages are in each folder at startup time, without me having to click on each folder before it notices "hey, my cached metadata is stale!" and reparses the mbox file. But so far, I haven't found a way to get it to do that.
I have a combination of mbox and Maildir, and having Balsa check mail again (even if it doesn't actually find and download anything) seems to get it to recognize which mailboxes have new messages.

$ ls -l linux-kernel
-rw------- 1 landley landley 2229420857 Nov  1 11:39 linux-kernel
$ time cat linux-kernel > /dev/null
real	0m42.225s
user	0m0.080s
sys	0m5.292s

A full rescan of my mbox files understandably takes a little while. I don't mind so much (my setup is abusive to mail readers, yes), but I think I'd notice if it was doing it. :)

I need to move to the source version before bugging the list too much more about these. Alas, I haven't sat down and figured out why the build system wants a spellcheck library it shouldn't need (and which I already installed), or how to chop that out.

A ./configure --without-spellcheck would be nice...

Rob
#!/usr/bin/python

import sys

def readinbox(inbox, handle):
  headers=None
  body=None
  msgs={}
  msglist=[]
  count=0

  for i in inbox:
    # Start of new message
    if i.startswith("From "):
      # "is not None" so a message with an empty body is still handled
      if body is not None: handle(headers, body, count)
      headers=[i]
      body=None
      count=count+1
    elif not headers: continue

    # Accumulate headers and body lines
    elif body!=None: body.append(i)
    elif i!='\n':
      if headers and i[:1].isspace(): headers[-1]=headers[-1]+i
      else: headers.append(i)

    # Switch from headers to body
    else:
      hdrstr="".join(headers)
      # Discard duplicates (even List-id: must be the same)
      if hdrstr in msgs:
        headers=None
        print("dup %s@%s" % (msgs[hdrstr],count))
      else:
        msgs[hdrstr]=count
        body=[]
  # Flush the last message ("is not None" so an empty body still counts)
  if body is not None: handle(headers, body, count)
  print("")

  return msgs,msglist

rules=[
  ("linux-kernel.vger.kernel.org", "linux/linux-kernel"),
  ("blfs-dev.linuxfromscratch.org", "lfs/blfs"),
  ("clfs-dev-cross-lfs.org", "lfs/clfs-dev"),
  ("clfs-support-cross-lfs.org", "lfs/clfs-support"),
  ("celinux-dev.lists.celinuxforum.org", "linux/celinux-dev"),
  ("toybox-landley.net", "mine/toybox"),
  ("users.linux.kernel.org", "linux/users"),
  ("crossgcc.sourceware.org", "lfs/crossgcc"),
  ("devicetree-discuss.lists.ozlabs.org", "linux/devicetree"), 
  ("buildroot.busybox.net", "package/buildroot"),
  ("busybox.busybox.net", "package/busybox"),
  ("uclibc.uclibc.org", "package/uclibc"),
  ("aboriginal-landley.net", "mine/aboriginal"),
  ("gentoo-embedded.gentoo.org", "lfs/gentoo-embedded"),
  ("lfs-dev.linuxfromscratch.org", "lfs/lfs-dev"),
  ("lfs-support.linuxfromscratch.org", "lfs/lfs-support"),
  ("containers.lists.linux-foundation.org", "linux/containers"),
  ("linux-doc.vger.kernel.org", "linux/linux-doc"),
  ("linux-embedded.vger.kernel.org", "linux/linux-embedded"),
  ("staff.texas.lonestarcon3.org", "zzz/lonestarcon"),
  ("lxc-devel.lists.sourceforge.net", "linux/lxc-devel"),
  ("lxc-users.lists.sourceforge.net", "linux/lxc-users"),
  ("neuros.googlegroups.com", "zzz/neuros"),
  ("qemu-devel.nongnu.org", "package/qemu-devel"),
  ("dropbear.ucc.asn.au", "package/dropbear"),
  ("user-mode-linux-devel.lists.sourceforge.net", "zzz/uml"),
  ("tinycc-devel.nongnu.org", "package/tcc"),
  ("tsgeeks.list.gerf.org", "zzz/tsgeeks"),
  ("v9fs-developer.lists.sourceforge.net", "linux/v9fs-developer"),
  ("v9fs-users.lists.sourceforge.net", "linux/v9fs-users"),
  ("mercurial.selenic.com", "package/mercurial"),

  ("owner-pcc-list ludd ltu se", "sender", "package/pcc")
]

def write_outbox(headers, body, count):
  output=None
  for i in headers:
    i=i.split(":", 1)
    if len(i)==2:
      low=i[0].lower()
      for j in rules:
        if len(j)==2:
          if low != "list-id": continue
        elif low != j[1]: continue
        if i[1].find(j[0])!=-1:
          output="mail/"+j[-1]
          break
    if output: break

  if not output: output="mail/filtered"

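  print("write %s to %s" % (count, output))
  # Append with plain "a" (Python 3 rejects the mode string "aw") and close promptly
  with open(output, "a") as out:
    out.write("%s\n%s" % ("".join(headers), "".join(body)))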

# Dry-run handler: shows progress without writing any mailbox
def stub(one, two, three):
  sys.stdout.write("pass %s\r" % three)
  sys.stdout.flush()

if __name__ == "__main__":
  msgs, msglist=readinbox(open("mail/inbox", "r"), write_outbox)
  #msgs, msglist=readinbox(open("mail/inbox", "r"), stub)

