Re: what files does beagle index?

On Wed, 2006-07-12 at 16:19 -0700, D Bera wrote:
> By default, beagle tries to index every file under your home directory,
> _except_ dot-dirs. Files and subdirs under a .dir won't be indexed.

Ahhh.  Perhaps some of these files to which I am referring are under
dot-dirs.  How about a flag to toggle that?
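To make the idea concrete, here is a rough Python sketch of the kind of toggle I mean. It is not Beagle's code; the function name iter_indexable and the include_hidden flag are made up for illustration. It only shows a crawl that skips dot-dirs by default and descends into them when asked.

```python
import os

def iter_indexable(root, include_hidden=False):
    """Yield file paths under root, skipping dot-dirs unless include_hidden.

    Hypothetical helper for illustration only; not part of Beagle.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        if not include_hidden:
            # Prune in place so os.walk never descends into dot-dirs.
            dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            yield os.path.join(dirpath, name)
```

With the flag left at its default, nothing under .icq.old would be visited; with include_hidden=True, it would.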

> It's hard to extract data from binary files.

How about "strings" them first?

> I am not even sure how to
> extract all words from a db file.

Strings does a pretty decent job.

> But anyway, beagle relies on a huge
> collection of filters to extract data from various types of files.

Right.  I had gathered that.

> The
> filters in beagle cover nearly all the possible formats from which
> data extraction is possible e.g. html, doc, comments from jpeg. There
> is no filter for 'binary db' files as of now; hence beagle would
> ignore them.

How about a "default" filter for binaries that simply runs "strings"?
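As a rough sketch of what such a fallback could do, here is a small Python approximation of strings(1): it pulls out runs of printable ASCII of at least four characters. The name extract_strings and the min_len parameter are my own invention, not anything in Beagle, and the real strings tool has more options than this.

```python
import re

def extract_strings(data, min_len=4):
    """Return runs of printable ASCII (plus tab) at least min_len bytes long.

    Hypothetical fallback filter for illustration; roughly what strings(1)
    reports by default, not a reimplementation of it.
    """
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]
```

The resulting text could then be handed to the indexer like the output of any other filter.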

> Similar to the way you examine Google's index and see what webpages
> are in the index :)

Maybe Beagle does something similar, but if not: touché.

:-)  But that is not really apples to apples.  If I had Google's
database, like I have Beagle's, I could probably do exactly what I mean.

> Jokes aside, the recommended way to examine if a file is indexed is to
> query for the filename. Put the whole name in quotes and you should
> get it in the results.

Yeah, cool!  A manual search does indicate that a given file I am
thinking of is indeed in a dot-dir.  :-(  And it is a .db file, which
"file" identifies as:

$ file .icq.old/history/6000006.db
.icq.old/history/6000006.db: GNU dbm 1.x or ndbm database, little endian

$ strings .icq.old/history/6000006.db
b ssage throug
Hi! I found them!

Apart from that one being spam, there is obviously useful information in
those files, even when filtered through "strings".  Leaving aside the
dot-dir issue, it would be nice if Beagle could give me this.


My other computer is your Microsoft Windows server.

Brian J. Murrell
