Re: simple questions

> >>   4) This python/perl is needed because
> >> some of my e-mails are html-entity, quoted-printable, 8-bit,
> >> iso-8859-2, utf-8  encoded and so on.
> >> I guess this is not a problem for Beagle at all, ie it can search
> >> in any e-mail no matter how it is encoded.
> >> What about the attached files?
> >
> > If you throw email files (containing single emails, maildir style) to
> > beagle, it knows how to index emails. Also beagle will take care of
> > the attachments itself. Sometimes there is a problem in determining
> > the mimetype of email files, instead of message/rfc822 they are
> > recognized as text files by our mime type sniffer. So, if you can
> > somehow ensure that the files that are sent to beagle have the
> > mimetype explicitly set to message/rfc822, beagle will correctly index
> > them for you.
> Interesting: does this happen implicitly within the file crawler
> (i.e. not the other backends), or explicitly via the other backends (say
> the KMailQueryable)? Because if it's implicit, might as well back out
> whatever I'm doing for the Gnus backend and work on MIME type
> detection. :)

[Long email warning]

(There is a wiki page which might be helpful.)

The tedious work of "extracting data" from physical files (or from files
embedded in other files, such as attachments) is done by the drones, aka
Filters.
The work of "finding data" to index from who_knows_which_weird_location and
of maintaining the state of some application's data (e.g. which mails are
deleted in mbox-based mail apps, where deleted emails are not immediately
removed from the disk) is done by the smart agents, aka Backends.

Though it is possible, a Backend rarely extracts indexable data from a file
itself. Backends merely set up a request asking the filters to index a
physical file (as you have done in the Gnus backend). So, there is a generic
Mail filter which can index all message/rfc822 emails. It is used by the
Files backend to index any email messages it finds on the disk, and by all
the mail backends. Specialized mail backends exist because they want to
maintain/index some additional information not handled by the generic files
backend. Typically the mail backends use some app-specific information.

So, the files backend can perfectly index any email files. There are three 
general problems with this:
1) Sometimes mimetype detection misses a valid mail file and marks it as
text. We use xdgmime for mimetype detection, and the issue arises because
some mail clients put an unexpected first line in the mail message. xdgmime
checks the first line to detect a mail message.
2) Once an email is found as a search result, what should happen when the
user clicks on it? If the email comes from the kmail, evolution or t-bird
backends, beagle-search knows which application to open. For emails found
on the disk, I am not sure what to do.
3) Sometimes additional email-client specific information could be useful.
E.g. in the gnus backend you are writing, the folder name is something that
the files backend will never be able to give you. Also, if you are able to
parse the .overview files (or whatever other gnus specific state files),
they generally contain information that is useful to report to the
end-user. The files backend will never be able to report such information.
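
To make problem 1 a bit more concrete, here is a small Python sketch of
first-line sniffing. It is only an illustration: the prefix list and the
X-Example-Client header are made up for this example, and the real xdgmime
magic rules may well be different.

```python
# Toy model of "look at the first line to decide whether this is a mail".
# The prefixes below are hypothetical; they are NOT xdgmime's real rules.
KNOWN_FIRST_LINE_PREFIXES = (
    "From ",
    "From:",
    "Received:",
    "Return-Path:",
    "Message-ID:",
    "Date:",
    "Subject:",
)


def sniff_by_first_line(text):
    """Guess a mimetype from the first line only."""
    first_line = text.split("\n", 1)[0]
    if first_line.startswith(KNOWN_FIRST_LINE_PREFIXES):
        return "message/rfc822"
    return "text/plain"


# A message that starts with a commonly seen header.
SAMPLE_STANDARD = (
    "Return-Path: <a@example.org>\n"
    "From: a@example.org\n"
    "Subject: hi\n"
    "\n"
    "body\n"
)

# The same message, but the client wrote its own header first.
# X-Example-Client is an invented header name.
SAMPLE_UNUSUAL = (
    "X-Example-Client: demo\n"
    "From: a@example.org\n"
    "Subject: hi\n"
    "\n"
    "body\n"
)

print(sniff_by_first_line(SAMPLE_STANDARD))
print(sniff_by_first_line(SAMPLE_UNUSUAL))
```

Both samples are valid mails, but the second one is classified as plain
text simply because its first line is not one the sniffer expects.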

In a sense the mail backend is a specialized files backend. I would still 
encourage you to work on a gnus backend. But if you want a quick working 
solution, you can just use the files backend to index those email files. In 
case beagle is not able to detect the mimetypes properly, you can get away 
with writing an empty FilterGnus, subclassed from FilterMail, with
AddSupportedFlavor (new FilterFlavor ("file:///path/to/~/Mail/*", null, 

This will force all files in ~/Mail/* to be indexed using FilterGnus and thus
use FilterMail. Warning: if there is any non-mail file in ~/Mail, the Mail
filter might crash! You have been warned.
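
If you go this way, it might be worth checking ~/Mail for stray non-mail
files first. Below is a rough Python sketch of such a check, completely
outside beagle: it parses each file with the standard email module and
flags files that have none of a few typical headers. The header list and
the function names are my own assumptions for this example, and the check
is a heuristic, not a guarantee that FilterMail will accept the file.

```python
import email
from pathlib import Path

# Headers that a real mail is very likely to have at least one of.
# This list is an assumption made for this sketch.
TYPICAL_HEADERS = ("From", "Date", "Message-ID", "Received")


def looks_like_mail(raw):
    """Return True if the bytes parse to a message with a typical header."""
    msg = email.message_from_bytes(raw)
    # Header lookup on Message objects is case-insensitive.
    return any(header in msg for header in TYPICAL_HEADERS)


def non_mail_files(root):
    """List the files under root that do not look like mail messages."""
    suspects = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and not looks_like_mail(path.read_bytes()):
            suspects.append(path)
    return suspects


if __name__ == "__main__":
    # Adjust the directory to wherever the mail files actually live.
    for path in non_mail_files(Path.home() / "Mail"):
        print("probably not a mail:", path)
```

Anything it prints is a candidate to move out of ~/Mail before forcing the
Mail filter onto that directory.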

- dBera

Debajyoti Bera
beagle / KDE fan
Mandriva / Inspiron-1100 user
