Beagle 0.1.0



I'm pleased to announce the release of Beagle 0.1.0.

When you've been working on a project for a while, it is easy to lose
perspective.  And I recognize that some might accuse me of being biased.  But
still, I am confident that this Beagle release is one of the most important
events in all of human history.  I'm happy to let future historians bicker
about the precise details of whether it is more or less important than, say,
the discovery of penicillin or the Napoleonic Wars... but let there be no
doubt: the age of the Beagle has arrived.

The list of enhancements in this version of Beagle is daunting --- the only
way to truly appreciate the scope of this release is to read the detailed
notes below.  Highlights include:
  * Faster indexing
  * Substantially improved querying
  * A smarter file system backend
  * More and better filters
  * UI improvements
  * Better APIs for client applications
  * A huge number of bug fixes
  * An adorable cartoon dog who fights crime
Well, there was a small problem with the adorable cartoon dog, so he didn't
quite make it into this release.  Maybe next version.  But you are going to
love the little guy --- he's cute as a button!

The only way to fully appreciate it is to run it.  Use it.  Live it.  And yes,
there are bugs --- and who among us can claim that our software is perfect and
hassle-free?  Even this milestone in the human struggle for enduring happiness
and prosperity has some glitches.  The nastier ones are described at the end
of these notes, but there are certainly all sorts of little annoyances lurking
just below the surface.  So please don't integrate Beagle into the space
shuttle's navigation system quite yet.  But if you find a bug, file it at
bugzilla.gnome.org.  We'll fix it.  Honest.


OUR MANY URLS
-------------

To download the 0.1.0 tarball or learn more, visit the Beagle wiki at:
http://www.beagle-project.org

Joe Gasiorek writes a Beagle newsletter.  You can read it at:
http://www.beagle-project.org/Newsletter

The latest gossip is available at:
http://www.planetbeagle.org

Nat Friedman made some cool movies that demonstrate Beagle in action:
http://nat.org/demos

We still talk about Beagle on the dashboard-hackers mailing list:
http://mail.gnome.org/mailman/listinfo/dashboard-hackers

Anthony Trollope, the great (and alarmingly prolific) Victorian novelist,
was also the inventor of the free-standing mailbox:
http://en.wikipedia.org/wiki/Anthony_Trollope


WHAT IS BEAGLE?
---------------
 
Beagle is a tool for indexing and searching your data.  Beagle is improving
rapidly on many fronts, and should work well enough for everyday use.
 
The Beagle daemon transparently monitors your data and updates the index
to reflect any changes.  On an inotify-enabled system, these updates happen
more-or-less in real time.  So for example,
 
* Files are immediately indexed when they are created, are re-indexed
  when they are modified, and are dropped from the index upon
  deletion.
* E-mails are indexed upon arrival.
* IM conversations are indexed as you chat, a line at a time.
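
To make the create/modify/delete behavior concrete, here is a toy sketch in
Python.  It is not Beagle's code (Beagle is built on Mono and Lucene), and the
ToyIndex class, its method names, and the event names are all hypothetical.
It only illustrates the idea that each file event maps to one index update.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """Hypothetical in-memory stand-in for a real index."""
    docs: dict = field(default_factory=dict)  # path -> set of lowercase words

    def handle_event(self, kind: str, path: str, text: str = "") -> None:
        # Created and modified files are (re-)indexed; deleted files are dropped.
        if kind in ("create", "modify"):
            self.docs[path] = set(text.lower().split())
        elif kind == "delete":
            self.docs.pop(path, None)
        else:
            raise ValueError(f"unknown event kind: {kind}")

    def search(self, term: str) -> list:
        # Return the paths whose current contents contain the term.
        return sorted(p for p, words in self.docs.items() if term.lower() in words)


index = ToyIndex()
index.handle_event("create", "/home/me/notes.txt", "buy dog food")
index.handle_event("modify", "/home/me/notes.txt", "buy cat food")
print(index.search("dog"))   # the old contents no longer match
index.handle_event("delete", "/home/me/notes.txt")
print(index.search("cat"))   # deleted files drop out of the index
```

In the real daemon the events would come from inotify rather than from
hand-written calls.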
 
Beagle uses the Lucene indexing system from the prodigious Doug
Cutting.

Best is a graphical tool for searching the index that the daemon creates.
Best doesn't query the index directly; it passes the search terms to the
daemon and the daemon sends any matches back to Best.  Best then renders the
results and allows you to perform useful actions on the matching objects.

Indexing your data requires a fair amount of computing power, but the Beagle
daemon tries to be as unobtrusive as possible.  It contains a scheduler that
works to prioritize tasks and control CPU usage, based on whether or not
you are actively using your workstation.
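
As a rough illustration of that idea, the Python sketch below pairs a simple
priority queue with a delay that grows when the user is active.  This is an
assumption-laden toy, not the daemon's actual scheduler: the ToyScheduler
class, the priority numbers, and the delay constants are all made up for the
example.

```python
import heapq
import itertools


class ToyScheduler:
    """Hypothetical scheduler: lower priority numbers run first."""

    def __init__(self):
        self._heap = []
        # The counter breaks ties so equal-priority tasks run in FIFO order.
        self._counter = itertools.count()

    def add(self, priority: int, name: str) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), name))

    def next_task(self):
        # Return the name of the next task, or None if nothing is queued.
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

    @staticmethod
    def delay_before_next(user_active: bool, cpu_load: float) -> float:
        # Wait longer between tasks when the user is at the keyboard or
        # the machine is busy.  The constants are arbitrary.
        base = 2.0 if user_active else 0.1
        return base * (1.0 + max(0.0, cpu_load))


sched = ToyScheduler()
sched.add(5, "optimize index")
sched.add(1, "index new mail")
print(sched.next_task())  # the urgent task runs before the optimization
```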


DEPENDENCY HECK
---------------

Beagle has many dependencies, and thus can be difficult to compile.
It requires:
* Mono 1.1.7 or better, along with the full Mono stack
* gtk-sharp 1.9.5 or better
* Gecko-sharp 2.0
* Gmime 2.1.16
* Libexif 0.5.7 or better

For the best possible Beagle experience, you should also have:
* Evolution-sharp 0.10.2
* A *patched* wv 1.0.3 --- the patch is available from
  http://users.avafan.com/~fredrik/beagle/wv-libole2-readonly.patch
* An inotify 0.24-enabled kernel.  Inotify is in the mainline Linux
  kernel as of 2.6.13.


CHANGES SINCE 0.0.12
--------------------

Daemon/Infrastructure:
* New and vastly improved Lucene infrastructure. Highlights include:
  + The indexing and querying code has been split.  The entire
    architecture is much cleaner, and we have been able to eliminate
    several horrible hacks that had been introduced for the benefit
    of the file system backend.
  + The index is now semi-transactional; we can now make some
    guarantees about the index and the beagle-specific filesystem
    metadata not getting dangerously out-of-sync in the event of a
    crash.
  + It is now possible to efficiently modify certain index properties.
  + Hits are now returned in time-order, not score-order.  Scores
    are still available, though they will not agree with the old
    scores in some cases.
  + Querying is faster, our query language is better, and the query
    semantics are much more sane.
  + We are much more intelligent about when we optimize the Lucene indexes.
  + Indexables are now abstract indexing requests, and IndexableGenerators
    can thus now manipulate the index in arbitrary ways.
  + Many, many bugs have been fixed.
  (Jon Trowbridge, Joe Shaw, Fredrik Hedberg, Daniel Drake)
* Many bugs involving queries have been fixed, and the query semantics have
  changed.  Queries now default to 'AND', not 'OR'. (Jon, Joe)
* Fixes to avoid stop words ("a", "the", etc.) causing problems during
  queries. (Joe)
* When using a remote home directory, create the unix domain socket
  locally since some filesystems don't support them (like smbfs). (Joe)
* When capping the number of hits returned from the daemon, we now
  return the N most recent hits, not the N highest-scoring hits. (Jon)
* Don't reload config files that haven't changed since we last loaded them.
  (Daniel)
* Reduce inotify watch masks when we can. (Daniel)
* Maintain parent-child relationships when watches are moved. (Daniel)
* Fix a crasher in libbeagle related to using g_stat(). (Joe)
* Update to gtk-sharp 2.x APIs. (Joe, Lluís Pàmies)
* Add an --enable-xml-dump configure option for debugging message
  passing between components. (Joe)
* Many fixes to libbeagle to keep it up-to-date with the C# code. (Joe)
* Fixed encoding of keyword property names in Lucene. (Debajyoti Bera)
* Use gnome-vfs's slow mime sniffing on files with the .xml extension.
  (Chris Lahey)
* Local index-synchronization when using NFS & Samba. (Fredrik, Joe)
* Support IO priorities by setting the indexing thread to idle. (Fredrik,
  Robert Love)
* Work around a bug in Mono.Posix.Syscall.readdir to support strange
  encodings. (Fredrik)
* Use transactions more efficiently in our sqlite databases. (Jon)
* Don't poll to check for Inotify.Stop calls while snarfing events. (Jon)
* Don't leak files in /tmp on exceptions or daemon shutdown. (Jon)
* Scheduler improvements and bug fixes.  Tasks are now round-robined
  correctly. (Jon)
* Pass a stemmed, stopword-free version of the query back to the client
  along with the query results, for use by the client app when
  highlighting matches, etc. (Jon)
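
Two of the query changes above are easy to show with a toy example: terms now
combine with 'AND' by default, and a capped result list keeps the N most recent
hits.  The Python sketch below is only an illustration under those two
assumptions.  The function names and the (timestamp, name, text) tuples are
hypothetical, and none of this reflects how the Lucene-based code is written.

```python
def matches(doc_words, terms, mode="AND"):
    # AND requires every term to be present; OR requires at least one.
    present = [t in doc_words for t in terms]
    return all(present) if mode == "AND" else any(present)


def query(docs, terms, max_hits, mode="AND"):
    """docs is a list of (timestamp, name, text) tuples."""
    terms = [t.lower() for t in terms]
    found = [
        (ts, name)
        for ts, name, text in docs
        if matches(set(text.lower().split()), terms, mode)
    ]
    found.sort(reverse=True)  # newest first
    # Capping keeps the most recent hits, not the best-scoring ones.
    return [name for _, name in found[:max_hits]]


docs = [
    (1, "a.txt", "beagle dog"),
    (2, "b.txt", "beagle search"),
    (3, "c.txt", "dog search"),
    (4, "d.txt", "beagle dog search"),
]
print(query(docs, ["beagle", "dog"], max_hits=10))            # AND: d.txt, a.txt
print(query(docs, ["beagle", "dog"], max_hits=2, mode="OR"))  # OR:  d.txt, c.txt
```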

Backends:
* Migrated all of the backends to the new Lucene infrastructure.  (Joe,
  Fredrik, Daniel, Jon)
* The file system backend has been extensively refactored.  It still isn't
  perfect, but has been substantially improved. (Jon, Daniel, Joe, Fredrik) 
* Allow file system exclude patterns/paths to be set through
  beagle-config. (Daniel)
* Misc filesystem backend fixes to do the right thing when we add/remove
  exclude patterns/paths. (Daniel)
* Fix removing of file system roots. (Daniel)
* Merge the concepts of 'scan' and 'crawl' in filesystem backend. (Daniel)
* Rewrite Evolution Data Server backend to support all local
  addressbooks and calendars with change notification, based on Varadhan's
  work in evolution-sharp. (Joe)
* Fix a major bottleneck in starting up the mail backend with IMAP
  accounts that would use large amounts of CPU at startup. (Joe)
* Fix a bug where IMAP accounts against Exchange servers weren't being
  indexed. (Joe)
* Index all IM accounts, not just the first one of each type, in the
  Evolution Data Server backend. (Joe)
* Support snippets, text-caching and hit filtering on static-indexes.
  (Fredrik)
* Initial Kopete support. (Fredrik)
* Better cache buddy lookups with Gaim and Kopete. (Fredrik)
* More gracefully handle the case of IMAP folders with unknown accounts. (Jon)

Filters:
* Clean up the filtering code. (Fredrik)
* Add support for Apple/mp4 files (m4a/m4p). (Daniel)
* Index both id3v1 and id3v2 tags in mp3 files. (Daniel)
* Fixed small bugs where some tag fields weren't being read from some audio
  formats. (Daniel)
* Add support for Amiga tracker audio files (s3m, it, mod, xm). (Daniel,
  Boris Peterbarg)
* Don't leave zombie processes when indexing PDF and spreadsheet
  files. (Daniel)
* Lower the priority class when calling pdfinfo, so that it doesn't
  dominate the CPU on expensive runs. (Paul Betts)
* Fix filtering of gaim logs. (Jon, Joe)
* Index names in emails as text, not keywords. (Joe)
* Fix serious filtering bottlenecks when indexing large documents
  (mostly from OpenOffice). (Joe, Jon, Fredrik)

UI/Tools:
* Arguments might be part of a command-line, so split them out before
  executing a tile action. (Joe)
* Allow Best to launch commands which are 'quoted'. (Daniel)
* Add a mozilla preference which enables/disables the plugin in a
  persistent way. (Joe)
* Allow beagle-manage-index to work on versioned indexes. (Daniel)
* Flac files are now displayed using the flac tile. (Daniel)
* Maildir files are now displayed using the mail tile. (Daniel)
* Don't use evolution to open non-evolution mails in the mail tile. (Daniel)
* Use the title of the document as the main link in a file tile if set.
  Especially nice for HTML files. (Joe)
* Don't try to put presence on the mail tile if galago support is on,
  but evolution-sharp isn't. (Joe)
* Fix a date bug where some results would display as happening "-N days
  ago". (Joe)
* Thumbnail file hits. (Debajyoti, Joe)
* Don't run the crawler as part of cron.daily, because it can happen at
  any point during the day and it's an expensive operation.  Add a crontab
  entry which runs at 4:30 am. (Joe)
* Update the Best keybinding code from Tomboy.  Fixes many bugs related
  to different modifiers (num lock, scroll lock, alt, etc.) and fixes many
  "BadValue" crashers. (Alex Graveley, Joe)
* Updated tray icon code from libegg.  This fixes rendering glitches and
  other problems with the Best tray icon. (Rodrigo Moya)
* Set IO priorities in beagle-build-index and beagle-manage-index. (Joe)
* Have beagle-build-index restart itself if it detects its memory usage
  is getting too high, and add a command-line option to disable that
  behavior. (Joe)
* Stop the Firefox extension from popping up error dialogs left and right
  if ~/.beagle doesn't exist. (Joe)
* beagle-extract-content fixes. (Debajyoti)
* Fixed a security problem in beagle-crawl-system. (Gary Ekker)
* Crawl documentation with the .docbook extension. (Chris)
* System-wide crawler indexing documentation, Windows partitions, etc.
  (Fredrik)
* Various "desktop-launch" fixes to support KDE in SUSE. (Fredrik)

Web Services:
* Improved the web interface UI, including a background color for the
  search box and drop-down. (Vijay KN)
* Added new option in WebInterface to selectively enable/disable
  NetworkedBeagle search on per-query basis, when NetworkedBeagle nodes are
  configured.  (Vijay)
* Support for display of Images in web interface, when hits are images. (Vijay)
* Hit results returned for external accesses now restricted to File type
  resources alone.  Removed duplicate definitions of Web Service data types.
  (Vijay)

Translations:
* Added Finnish translation. (ituohela)
* Added Macedonian translation. (Арангел Ангов, strojmir)
* Added Ukrainian translation. (Maxim Dziumanenko)
* Updated Bulgarian translation. (Rostislav Raykov)
* Updated British English translation. (Christopher Orr, James Ogley)
* Updated Canadian English translation. (Adam Weinberger)
* Updated Catalan translation. (jmas)
* Updated Chinese translation. (Chao-Hsiung Liao)
* Updated German translation. (Hendrik Brandt)
* Updated Greek translation. (pkst)
* Updated Hungarian translation. (Gabor Kelemen)
* Updated Norwegian translation. (Terance Sola)
* Updated Spanish translation. (Francisco Javier F. Serrador)
* Updated Vietnamese translation. (clyties)

Everything Else:
* Added bludgeon, a testing tool. (Jon)
* Fix a compile error that popped up when mono's compiler rules became
  more strict. (Joe)
* All the stuff I forgot. (All of the people I forgot)


KNOWN ISSUES
------------

The file system backend is now much more robust than ever before.  However, there
are still race conditions that can occur with certain combinations of
file system operations.  In some cases it might be necessary to stop and
restart the daemon.

Extreme spikes in memory usage have been observed in some cases.  Certain
extremely large documents (particularly large HTML files) can temporarily
degrade your system's performance while they are being indexed.  In most of
these cases, the memory is reclaimed by the system relatively quickly after
the document is indexed.  There are other still-unexplained cases of excessive
memory use, in particular on SMP systems.

At this point in development, we cannot commit to stable APIs or file formats.
You will almost certainly need to delete your indexes and start again at some
point in the future.




