Re: Followup: opinions on Search services
- From: Manuel Amador <rudd-o amautacorp com>
- To: Miguel de Icaza <miguel ximian com>
- Cc: Joe Shaw <joeshaw novell com>, gnome-devel-list gnome org, "John \(J5\) Palmieri" <johnp martianrock com>, Jamie McCracken <jamiemcc blueyonder co uk>
- Subject: Re: Followup: opinions on Search services
- Date: Tue, 26 Apr 2005 18:02:43 -0500
You guys probably know this better than I do: how does Beagle manage to
watch so many directories at once? I've resorted to using inotify as
well, of course, but what I do is rather kludgy: once setting a
directory watch fails, I try to double the inotify limit via /sysfs,
then go on and retry the operation. Unfortunately, while this actually
lets me watch tons of dirs (167,000 at last count, primarily due to my
mp3 collection), I am not sure whether this is actually a "bright
idea" (TM).
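The bump-the-limit-and-retry pattern above can be sketched roughly as follows in Java, whose `WatchService` sits on top of inotify on Linux. Note that on modern kernels the knob lives at /proc/sys/fs/inotify/max_user_watches (rather than /sysfs), and writing it needs root, so this sketch only reads and reports it; the class and method names are hypothetical:

```java
import java.io.IOException;
import java.nio.file.*;

public class WatchLimit {
    // Read the per-user inotify watch limit from /proc; returns -1 if
    // the file is unavailable (e.g. not running on Linux).
    static long readLimit() {
        Path knob = Paths.get("/proc/sys/fs/inotify/max_user_watches");
        try {
            return Long.parseLong(Files.readAllLines(knob).get(0).trim());
        } catch (IOException e) {
            return -1;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("max_user_watches=" + readLimit());

        // Register one directory; a real indexer would walk the whole
        // tree and, on a registration failure caused by an exhausted
        // watch limit, double the /proc knob (as root) and retry.
        WatchService ws = FileSystems.getDefault().newWatchService();
        Path dir = Files.createTempDirectory("watchdemo");
        dir.register(ws, StandardWatchEventKinds.ENTRY_CREATE);
        System.out.println("watching " + dir);
        ws.close();
    }
}
```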
On Thu, 2005-04-07 at 12:13 -0400, Miguel de Icaza wrote:
> > > Adding on to this, if one designs their programs correctly the actual
> > > call overhead is negligible. The only reason one would optimize by
> > > using a lower level language is if a block of code, usually in some sort
> > > of long running loop, is taking too long to finish. In that case most
> > > of the time is spent in the call itself rendering the overhead of making
> > > the call negligible.
> > The issue here is memory and the garbage collector rather than loops.
> > The Boehm GC is particularly slow at allocating large objects on the
> > managed heap and the resulting fragmentation causes both poor
> > performance (the GC spends an inordinate amount of CPU time searching
> > for free blocks) and excessive memory consumption.
> Those statements in general make sense, but they do not apply to Mono or
> Java using Boehm GC.
> The reason why this is not an issue with Mono/Java is because we use the
> "precise" framework of Boehm GC, where we explicitly register the types
> and layouts of objects allocated with it, so Boehm only scans the parts
> that actually can contain pointers instead of all the blocks (the
> default mode of execution).
> This has huge performance implications. You are correct that naive use
> of Boehm is in general an underperformer, but the situation changes
> drastically when employed as a precise GC.
> Boehm still presents problems, the major one is the lack of a
> compacting GC. This leads to a situation where you can fragment the
> heap. Very much in the same way that every C and C++ application
> fragments the heap today.
> The situation could get bad if you allocate large blocks (multi-megabyte
> blocks) that you do not use and depend on the GC to free them. This
> problem can be fixed by assisting the GC (clear your variables:
> a = null) or use the Dispose pattern for large objects (this in fact was
> the major source of issues in Beagle).
> > Indexing large files requires dynamic allocation of large amounts of
> > memory hence my opinion that garbage collected languages are not optimal
> > for this situation. I'm not a Luddite and I do like both Python and C#
> The above is not true, you only need a few buffers to index it.
> Let me illustrate with an example:
> "To index a 1 gigabyte file, do I need 1 gigabyte of memory?"
> Clearly if your answer is `yes', then you are not the most astute
> programmer, nor the sharpest knife in the drawer.
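Miguel's point, that indexing needs only a few fixed buffers regardless of file size, can be illustrated with a minimal sketch: here we "index" by counting words while reading the input through one small reused buffer (the names are hypothetical, not Beagle's actual code):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class StreamIndex {
    // Memory use is bounded by the buffer, not by the file: a 1 GB
    // input streams through the same 64 KB block.
    static long countWords(InputStream in) throws IOException {
        byte[] buf = new byte[64 * 1024]; // one small, reused buffer
        long words = 0;
        boolean inWord = false;
        int n;
        while ((n = in.read(buf)) > 0) {
            for (int i = 0; i < n; i++) {
                boolean ws = Character.isWhitespace((char) buf[i]);
                if (!ws && !inWord) words++;
                inWord = !ws;
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = "to index a big file you need a few buffers\n"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(countWords(new ByteArrayInputStream(sample)));
        // → 10
    }
}
```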
> > and I would certainly use them for GUI stuff over C any day. However, for
> > a back end service that is both CPU and memory intensive I maintain
> > that IMHO C in this particular case is a better choice.
> Luckily, your ideology does not match reality.
> As Beagle and the extensive set of applications built with Lucene in
> Java and .NET prove they are adequate languages for the task (and there
> is now this distributed open source search engine built with Java as
Manuel Amador <rudd-o amautacorp com>