Re: Followup: opinions on Search services



Hello,

> > Adding on to this, if one designs their programs correctly the actual
> > call overhead is negligible.  The only reason one would optimize by
> > using a lower level language is if a block of code, usually in some sort
> > of long running loop, is taking too long to finish.  In that case most
> > of the time is spent in the call itself rendering the overhead of making
> > the call negligible.
> 
> The issue here is memory and the garbage collector rather than loops. 
> The Boehm GC is particularly slow at allocating large objects on the 
> managed heap and the resulting fragmentation causes both poor 
> performance (the GC spends an inordinate amount of CPU time searching 
> for free blocks) and excessive memory consumption.

Those statements in general make sense, but they do not apply to Mono or
Java using Boehm GC.

The reason this is not an issue with Mono/Java is that we use the
"precise" mode of Boehm GC: we explicitly register the types and
layouts of the objects allocated with it, so Boehm only scans the parts
that can actually contain pointers instead of scanning every block in
full (its default, conservative mode of execution).
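To make the precise/conservative difference concrete, here is a toy mark phase in Python. It is a sketch under simplifying assumptions, not Boehm's code. The heap is a dict from made-up integer addresses to objects, and each object carries a hypothetical `pointer_map` that stands in for the type layout registered with the collector. In conservative mode every word that looks like a heap address is followed. In precise mode only the slots the layout marks as pointers are examined.

```python
class Obj:
    """A toy heap object: raw word values plus a layout descriptor."""

    def __init__(self, words, pointer_map):
        self.words = words              # word values stored in the object
        self.pointer_map = pointer_map  # True where the layout says "pointer"


def mark(heap, roots, precise):
    """Mark phase of a toy collector.

    Returns (reachable_addresses, words_examined).  Conservative mode treats
    any word equal to a heap address as a pointer; precise mode only looks
    at slots flagged in the object's pointer_map.
    """
    marked = set()
    examined = 0
    stack = list(roots)
    while stack:
        addr = stack.pop()
        if addr in marked or addr not in heap:
            continue
        marked.add(addr)
        obj = heap[addr]
        for word, is_pointer in zip(obj.words, obj.pointer_map):
            if precise and not is_pointer:
                continue  # the layout says this slot holds plain data
            examined += 1
            if word in heap:
                stack.append(word)
    return marked, examined


# Object 0x1000 holds a real reference to 0x1010 and an integer counter
# whose value (0x1020) happens to equal the address of another object.
heap = {
    0x1000: Obj([0x1010, 0x1020], [True, False]),
    0x1010: Obj([42], [False]),
    0x1020: Obj([7], [False]),
}

print(mark(heap, [0x1000], precise=False))  # conservative scan
print(mark(heap, [0x1000], precise=True))   # precise scan
```

In this example the conservative scan examines 4 words and keeps all three objects alive, because the counter value 0x1020 is mistaken for a reference. The precise scan examines 1 word and keeps only the two objects that are really reachable.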

This has huge performance implications.  You are correct that naive use
of Boehm generally underperforms, but the situation changes drastically
when it is employed as a precise GC.

Boehm still presents problems; the major one is that it is not a
compacting GC.  This means the heap can become fragmented, very much in
the same way that every C and C++ application fragments the heap today.
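The fragmentation problem shows up even with a toy allocator. This sketch assumes a simple first-fit free list measured in megabytes (Boehm's real allocator is more elaborate), and the names `first_fit` and `compact` are hypothetical. Freeing every other block leaves plenty of free memory in total but no hole big enough for a larger request. Only a compacting collector, which moves live objects together, turns the scattered holes back into one.

```python
def first_fit(free_list, size):
    """Return the start of the first hole that can hold `size`, or None."""
    for start, length in free_list:
        if length >= size:
            return start
    return None


def compact(live_blocks):
    """Slide live blocks to the bottom of the heap (what Boehm cannot do).

    A real moving collector must also rewrite every reference to the moved
    objects; this toy only tracks their positions.
    """
    relocated = []
    cursor = 0
    for _, length in sorted(live_blocks):
        relocated.append((cursor, length))
        cursor += length
    return relocated


HEAP_SIZE = 10
# Ten 1 MB objects allocated back to back: (start, length) in megabytes.
blocks = [(i, 1) for i in range(HEAP_SIZE)]
# The program drops every other object; a non-moving GC frees them in place.
free_list = [b for i, b in enumerate(blocks) if i % 2 == 0]
live = [b for i, b in enumerate(blocks) if i % 2 == 1]

total_free = sum(length for _, length in free_list)
print(total_free, first_fit(free_list, 2))  # prints: 5 None

# After compaction all free space is one contiguous hole after the live data.
moved = compact(live)
end = sum(length for _, length in moved)
print(first_fit([(end, HEAP_SIZE - end)], 2))  # prints: 5
```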

The situation can get bad if you allocate large (multi-megabyte) blocks
that you no longer use and depend on the GC to free them.  This problem
can be fixed by assisting the GC (clear your variables: a = null) or by
using the Dispose pattern for large objects (this was in fact the major
source of issues in Beagle).
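A minimal Python analogue of that advice, assuming a hypothetical `LargeBuffer` class: `close()` plays the role of .NET's `Dispose()`, and the `with` statement plays the role of C#'s `using` block. Dropping the reference to the big buffer at a known point is the same idea as writing `a = null` by hand.

```python
class LargeBuffer:
    """Hypothetical owner of a multi-megabyte scratch buffer.

    close() stands in for .NET's Dispose(): the owner gives the memory back
    at a known point instead of waiting for the GC to notice it is unused.
    """

    def __init__(self, size):
        self.data = bytearray(size)

    def close(self):
        # Drop the only reference so the collector can reclaim the memory
        # now (the "a = null" trick, applied to the field).
        self.data = None

    # Context-manager hooks: the closest Python analogue of C#'s `using`.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions


with LargeBuffer(8 * 1024 * 1024) as buf:
    buf.data[:5] = b"hello"  # use the buffer while indexing

# Past the with-block, buf.data is None: the 8 MB are no longer referenced.
```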

> Indexing large files requires dynamic allocation of large amounts of 
> memory hence my opinion that garbage collected languages are not optimal 
> for this situation. I'm not a luddite and I do like both python and C#

The above is not true: you only need a few buffers to index a file,
whatever its size.

Let me illustrate with an example:

	"To index a 1 gigabyte file, do I need 1 gigabyte of memory?"

Clearly if your answer is `yes', then you are not the most astute
programmer, nor the sharpest knife in the drawer.
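A minimal sketch of the point above: a term-frequency indexer that reads its input through a fixed-size buffer, so memory use is bounded by `chunk_size` (plus the longest token and the vocabulary), not by the size of the file. It assumes a deliberately simple tokenizer (runs of ASCII letters and digits, lowercased), and `index_stream` is a hypothetical name; a real indexer would also write its postings to disk as it goes. The only subtle part is a token that straddles two chunks, which is held back and prepended to the next read.

```python
import io
import re
from collections import Counter

TOKEN = re.compile(rb"[A-Za-z0-9]+")


def index_stream(stream, chunk_size=64 * 1024):
    """Count term frequencies while holding at most one chunk in memory."""
    counts = Counter()
    carry = b""  # partial token from the end of the previous chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        data = carry + chunk
        # A token touching the end of the chunk may continue in the next
        # one, so hold it back instead of counting it now.
        cut = len(data)
        while cut > 0 and data[cut - 1:cut].isalnum():
            cut -= 1
        data, carry = data[:cut], data[cut:]
        for token in TOKEN.findall(data):
            counts[token.lower()] += 1
    for token in TOKEN.findall(carry):  # whatever was left at end of file
        counts[token.lower()] += 1
    return counts


text = b"The cat and the hat. " * 1000
# Tiny chunks force many tokens to straddle chunk boundaries.
print(index_stream(io.BytesIO(text), chunk_size=3)[b"the"])  # prints 2000
```

For a real file you would pass `open(path, "rb")` instead of the in-memory `io.BytesIO`; a 1 gigabyte file is then processed roughly one 64 KB buffer at a time.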

> and I would certainly use them for GUI stuff over C any day. However for
> a back end service that is both CPU and memory intensive I maintain
> that IMHO C in this particular case is a better choice.

Luckily, your ideology does not match reality.

As Beagle and the extensive set of applications built with Lucene in
Java and .NET prove, these are adequate languages for the task (and
there is now also a distributed open source search engine built with
Java).

Miguel.
