Re: simple questions
- From: Debajyoti Bera <dbera web gmail com>
- To: JM Ibanez <jm orangeandbronze com>
- Cc: dashboard-hackers <dashboard-hackers gnome org>
- Subject: Re: simple questions
- Date: Mon, 23 Apr 2007 09:46:32 -0400
> >> 4) This python/perl is needed because
> >> some of my e-mails are html-entity, quoted-printable, 8-bit,
> >> iso-8859-2, utf-8 encoded and so on.
> >> I guess this is not a problem for Beagle at all, ie it can search
> >> in any e-mail no matter how it is encoded.
> >> What about the attached files?
> >
> > If you throw email files (containing single emails, maildir style) to
> > beagle, it knows how to index emails. Also beagle will take care of
> > the attachments itself. Sometimes there is a problem in determining
> > the mimetype of email files, instead of message/rfc822 they are
> > recognized as text files by our mime type sniffer. So, if you can
> > somehow ensure that the files that are sent to beagle have the
> > mimetype explicitly set to message/rfc822, beagle will correctly index
> > them for you.
>
> Interesting: does this happen implicitly within the file crawler
> (i.e. not the other backends), or explicitly via the other backends (say
> the KMailQueryable)? Because if it's implicit, might as well back out
> whatever I'm doing for the Gnus backend and work on MIME type
> detection. :)
[Long email warning]
(There is a wiki page which might be helpful
http://beagle-project.org/Architecture_Overview)
The tedious work of "extracting data" from the physical files (or embedded
files in files like attachments) is done by the drones aka Filters.
The work of "finding data" to index from who_knows_which_weird_location and
maintaining the state of some application's data (e.g. which mails are
deleted; in mbox-based mail apps, deleted emails are not immediately removed
from the disk) is done by smart agents aka Backends.
Though it is possible, a Backend rarely extracts indexable data from a file itself.
They merely set up a request to the filters to index a physical file (as you
have done in the Gnus backend). So, there is a generic Mail filter which can
index all message/rfc822 emails. This is used by the Files backend to index
any email messages it finds on the disk and by all the mail backends. The
reason specialized mail backends exist is that they want to maintain/index
some additional information not handled by the generic files backend.
Typically the mail backends use some app-specific information.
So, the files backend can perfectly index any email files. There are three
general problems with this:
1) Sometimes mimetype detection misses a valid mail file and marks it as text.
We use xdgmime mimetype detection, and the issue arises because some mail
clients add an unexpected first line to the mail message; xdgmime
checks the first line to detect a mail message.
2) Once an email is found as a search result, what should happen when a user
clicks on it? If the emails come from the kmail, evolution or t-bird backends,
beagle-search knows which application to open. For emails on the disk, I am
not sure what to do.
3) Sometimes additional email-client specific information could be useful.
E.g. in the gnus backend you are writing, the folder name is something that
the files backend will never be able to give you. Also, if you are able to
parse the .overview files (or whatever other gnus specific state files), they
generally contain information that is useful to report to the end-user. The
files backend will never be able to report such information.
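To illustrate problem 1, here is a small Python sketch of first-line sniffing.
It is not xdgmime's actual rule set: the prefix list and the function name are
assumptions made up for this example. It only shows why an unexpected first
line can get a valid mail file marked as text.

```python
# Hypothetical prefixes for illustration only; the real xdgmime /
# shared-mime-info rules may differ.
ASSUMED_MAIL_PREFIXES = (b"From ", b"From:", b"Received:", b"Return-Path:", b"Subject:")


def sniff_first_line(data: bytes) -> str:
    """Guess a mimetype by looking only at the first line of the file."""
    first_line = data.split(b"\n", 1)[0]
    if first_line.startswith(ASSUMED_MAIL_PREFIXES):
        return "message/rfc822"
    return "text/plain"


# A message that starts with a header the sniffer expects.
ordinary = b"Return-Path: <a@example.org>\nSubject: hi\n\nbody\n"
# The same message with a made-up client-specific header placed first.
unusual = b"X-Example-Client: 1\n" + ordinary

print(sniff_first_line(ordinary))
print(sniff_first_line(unusual))
```

Both inputs are valid mail messages, but the second one is reported as plain
text because only its first line is inspected.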
In a sense the mail backend is a specialized files backend. I would still
encourage you to work on a gnus backend. But if you want a quick working
solution, you can just use the files backend to index those email files. In
case beagle is not able to detect the mimetypes properly, you can get away
with writing an empty FilterGnus, subclassed from FilterMail, with

    AddSupportedFlavor (new FilterFlavor ("file:///path/to/~/Mail/*", null, null, 1));

This will force all files in ~/Mail/* to be indexed using FilterGnus and thus
use FilterMail. Warning: if there is any non-mail file in ~/Mail, the Mail
filter might crash! You have been warned.
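Given that warning, it may be worth checking the directory for non-mail files
before forcing everything through the mail filter. The following Python sketch
is one way to do that outside of Beagle. The function names and the header
list are assumptions chosen for this example, and the check is only a
heuristic: it flags files in which Python's email parser finds none of a few
common headers.

```python
from email.parser import BytesParser
from pathlib import Path

# Assumed heuristic: a mail file should carry at least one of these headers.
EXPECTED_HEADERS = ("From", "Date", "Message-ID", "Received", "Subject")


def looks_like_mail(path: Path) -> bool:
    """Return True if the header block contains a typical mail header."""
    with path.open("rb") as fp:
        msg = BytesParser().parse(fp, headersonly=True)
    # Header lookup on a Message object is case-insensitive.
    return any(name in msg for name in EXPECTED_HEADERS)


def non_mail_files(root: Path) -> list:
    """List regular files under root that do not look like mail messages."""
    return sorted(
        p for p in root.rglob("*") if p.is_file() and not looks_like_mail(p)
    )
```

Calling non_mail_files(Path.home() / "Mail") would list candidates such as
state files that you may want to move out of the way or exclude first.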
- dBera
--
-----------------------------------------------------
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user