Re: InfoQ article about beagle



Hey,

On 7/24/07, Pierre Östlund <pierre ostlund gmail com> wrote:
It seems like we have an article at infoq.com just around the corner. James
Vastbinder (.NET editor over at InfoQ as well as ISV Platform Strategy
Advisor over at Microsoft) made a comment on my blog about this. I contacted
him to validate this. He really wanted to get the hold of the project leader
and since I'm not that person I recommended him to put this out on this
list. He asked me to do this on behalf of him, so that's what I'm doing
right now.

Ah, thanks for forwarding this along.

These are basically the questions that he wants answers to:

I figure I'll answer these on the list for the information of the
other subscribers. :)

* What prompted the creation of Beagle?

Beagle really grew out of a project called Dashboard
(http://nat.org/dashboard) that I worked on with Nat Friedman, Alex
Graveley, Jim Krehl and others primarily in the summer of 2003.  The
idea behind Dashboard (which predated Apple's own very different
Dashboard) was that the computer knows what you're doing at any given
time -- reading emails, IMing with a friend, working on a document --
so it should be able to show information relevant to what you're doing
right then.  A lot of the work we were doing was people-based: if I
was IMing with Nat, it would show me Nat's latest blog entries, or the
last few emails I received from him.  It'd show me his email address
and IM nickname and phone number.  And if he typed someone else's
phone number in the window, it would look it up and provide me info
there too.

In doing Dashboard we found two real deficiencies of our platform: (1)
data (and metadata) was difficult to access for a variety of reasons
and (2) we were losing vast amounts of metadata which established
relationships between various pieces of data.

Beagle was largely created to address problem number 1.  It would
index textual content and metadata so that it could be efficiently
searched and applications only had to go to a single location to find
it and quickly retrieve it.
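
To make the idea concrete, here is a toy sketch of what "index textual
content and metadata so it can be searched from one place" could look
like.  This is not Beagle's actual implementation; the TinyIndex class,
its method names and the example URIs are all made up for illustration.

```python
import re
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: maps each term to the set of item URIs containing it."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> {uri, ...}
        self.metadata = {}                # uri -> {key: value}

    def add(self, uri, text, **metadata):
        """Index an item's body text together with its metadata values."""
        self.metadata[uri] = metadata
        fields = [text] + [str(value) for value in metadata.values()]
        for field in fields:
            for term in re.findall(r"\w+", field.lower()):
                self.postings[term].add(uri)

    def search(self, query):
        """Return the sorted URIs of items that contain every query term."""
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)
```

An application would add items from any source (files, mail, IM logs)
through add() and then only ever call search(), regardless of where the
data originally lived.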

* How is development of Beagle funded?

From the beginning Novell has funded at least one person to work on it
full-time.  Initially it was Jon Trowbridge.  A little bit later
Dave Camp and I were part-time contributors to it.  After Dave went to
work on Hula I became a full-time contributor with Jon, and when
Jon left last January it was just me.  For a period there Dan
Winship was also working on it part-time with me.

In addition to that, Google has indirectly funded development this
summer and last summer through its Summer of Code program.

But of course, Beagle's real strength is that it's an open source
project.  Large amounts of effort have been provided by individual
contributors, and Beagle would not be possible without them.  A little
over a year ago I listed all of the contributors up to that point, and
it numbered over one hundred:

   http://joeshaw.org/2006/04/29/400

* What is the current status of Beagle?

At this point Beagle is by far the most featureful, usable desktop
search system on Linux today.  We support over 20 data sources (file
system, email, IM logs, etc.) and over 60 data formats (MS Office,
ODF, PDF, MP3, etc.) which I think is the most of any desktop search
system on any operating system.

We're shipped on most Linux distributions and some of them integrate
Beagle pretty deeply in the desktop experience.

As for the project itself, we're working toward a 0.3.0 release -- a
major upgrade from our 0.2.x series -- which will feature faster
indexing, more complete indexing of archive contents, better support
for externally stored metadata like tags and annotations, etc.

* What is it like competing with Google Desktop and MS Desktop Search?

Well, MS Desktop Search doesn't run on Linux and Beagle doesn't run on
Windows (yet), so I don't even see them as competitors.

Google Desktop just came out for Linux and although it indexes Gmail
(and we don't... yet) it lacks the wide coverage Beagle has.  It
doesn't index IM conversations or integrate well with mail clients
other than Thunderbird.  It taxes the system while it indexes and has
no integration with existing desktop applications.  Because it isn't
open source, its ability to be extended to support new and existing
data types is fundamentally limited, and it'll never achieve tight
integration in the Linux desktop.  Beagle's permissive open source
license is a strength in this area.

GDL has some nice features: it seems to do some sort of version
control and storing of cached data; it handles plain mailbox files on
disk more gracefully; and it supports indexing of Gmail.  But none of
these are radical features that Beagle can't implement.

* Who is the target user of Beagle?

Beagle targets both users and developers.  For developers, we provide
some really nice APIs for extending the types of data Beagle can index
and then searching those indexes.  This means that developers can
integrate index and search into their applications, or build entirely
new user interfaces around search.

For users, the goal is simply to make it easier to find your data.
The file system is a fairly arcane metaphor that users have to deal
with, and in many cases people simply ignore it.  They just dump all
their files into their Documents folder.  I do this to an extent
myself; everything I download goes into a special folder, things pile
up over time, and then it's impossible to extract a needle from the
haystack.  Then you have things like email that abstract away the
storage (either on the file system or a server) but only allow you to
access the mail through the email program.  Ditto for addressbook
contacts or calendar events.  Until recently on Linux, there was no
user accessible (non-command line) way to access IM chat logs at all.
Web pages are cached by your browser but essentially inaccessible to
you.

Beagle solves these problems by making all of this data readily and
easily accessible through a graphical search interface.  You don't need to
navigate a folder hierarchy anymore.  You don't need to go one-by-one
through a list of files in a directory trying to remember what you
named that document.  Your emails and IM logs and RSS feeds and web
history and addressbook contacts are right there alongside files.

Of course, that's the idealistic view.  Some people, like me, are just
disorganized and a tool like this helps me.  Some people are highly
organized, love folders, and desktop search might be completely
superfluous to them.  That's fine, it's not for everybody.  In the
future, however, we might see some really innovative applications
built on top of desktop search that can benefit even these organized
individuals, like the Dashboard project I mentioned earlier.

* What are the futures for Beagle?

There is always more data to index, performance optimizations to make,
etc.  That's the boring future. :)

Beyond that, we're looking at adding networked searches, so that
you'll be able to run searches against several machines.  We'd like to
use Zeroconf here with multicast DNS and service discovery to be able
to search machines on your local network without needing any
configuration.  Another potential feature is automatically determining
what language a document is written in by doing some statistical
analysis on it... we have patches floating around for that.
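
For the curious, one common way to do this kind of statistical language
guessing is to compare character trigram frequencies against a profile
per language.  The sketch below is only an illustration of that general
technique, not what the patches mentioned above do; the function names
are hypothetical and the two sample sentences are far too small for
real use.

```python
import math
from collections import Counter


def trigram_profile(text):
    """Count overlapping character trigrams in whitespace-normalized text."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Tiny illustrative training samples (assumption: real profiles would be
# built from much larger corpora).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog "
          "and then the dog runs away from the fox",
    "sv": "den snabba bruna räven hoppar över den lata hunden "
          "och sedan springer hunden iväg från räven",
}
PROFILES = {lang: trigram_profile(sample) for lang, sample in SAMPLES.items()}


def guess_language(text):
    """Return the language code whose profile is most similar to the text."""
    profile = trigram_profile(text)
    return max(PROFILES, key=lambda lang: cosine(profile, PROFILES[lang]))
```

With profiles like these, an indexer could tag each document with a
guessed language code as one more piece of searchable metadata.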

I'd like to see the platform on the Linux desktop expand to do
del.icio.us-style tagging of any piece of data -- files, emails, web
pages -- and make that data available for Beagle to index.  I'd like
to see applications evolve so that they stop siloing their data and
make it more available to other applications, including Beagle.  I'd
like to see applications storing implicit relationships between data
-- when I save an email attachment, store the relationship of that
file on disk to the person who sent it to me -- and make that
available to Beagle for indexing.  I'd like to see more apps use
Beagle internally as their search mechanism.  None of these are
necessarily changes to Beagle itself, but rather ways we can broadly
improve the user experience.

Hope this is good. :)

Joe

