Re: [Tracker] miner-fs: Placing monitors on directories takes way too much time



Hi hi,


What do you mean by UPDATEs being merged?

I checked pool->sparql_buffer->len. If it is greater than 1, several SPARQL updates have been merged.

Hum, yes, that's the idea actually; I don't understand why you say it's
a regression. CREATED and UPDATED events will get merged in the SPARQL
buffer, and the buffer will be flushed (committed to the store) if any of
these conditions is satisfied:
 (a) The file corresponding to the task pushed doesn't have a parent.
 (b) The parent of the file corresponding to the task pushed is
different from the parent of the last file pushed to the buffer.
 (c) The limit for "PROCESS" tasks in the pool was reached (=100 for
miner-files).
 (d) The buffer was not flushed in the last MAX_SPARQL_BUFFER_TIME (=15)
seconds.
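
The four conditions above can be sketched as a single predicate. This is only an illustration of the logic described here, not the actual miner-fs code: the BufferState struct, should_flush() and PROCESS_TASK_LIMIT are hypothetical names, and using ">=" for the two limits is an assumption. Only MAX_SPARQL_BUFFER_TIME and the values 15 and 100 come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SPARQL_BUFFER_TIME 15   /* seconds, as quoted above */
#define PROCESS_TASK_LIMIT     100  /* hypothetical name; value quoted for miner-files */

/* Hypothetical snapshot of the buffer state; not Tracker's real structs. */
typedef struct {
    const char *last_parent;      /* parent of the last file pushed, NULL if buffer is empty */
    size_t      n_process_tasks;  /* number of tasks currently in the "PROCESS" state */
    double      secs_since_flush; /* seconds elapsed since the last flush */
} BufferState;

/* Returns true if any of the conditions (a)-(d) holds for a task whose
 * file has the given parent (NULL meaning "no parent"). */
bool
should_flush (const BufferState *st, const char *parent)
{
    assert (st != NULL);

    if (parent == NULL)                                  /* (a) no parent */
        return true;
    if (st->last_parent != NULL &&
        strcmp (st->last_parent, parent) != 0)           /* (b) parent changed */
        return true;
    if (st->n_process_tasks >= PROCESS_TASK_LIMIT)       /* (c) task limit reached */
        return true;
    if (st->secs_since_flush >= MAX_SPARQL_BUFFER_TIME)  /* (d) buffer too old */
        return true;

    return false;
}
```

With this sketch, a task for a file in the same directory as the previous one only triggers a flush once the task limit or the time limit is hit; any task for a file in another directory triggers one immediately.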



In my view, more items in sparql_buffer mean better performance.


Yes, but we are currently limiting the items in the buffer to all have
the same parent (all files in the same directory). If, during crawling or
event processing, files from different directories are updated, they
won't get merged in the same SPARQL connection. This constraint can
probably be removed, but I'm not sure how that would affect the overall
logic in the miner-fs.
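
To illustrate the effect of the same-parent constraint, here is a small sketch that counts how many separate batches a sequence of updates produces when a batch may only hold files sharing one parent. count_batches() is a hypothetical helper written for this example, not part of Tracker, and it ignores the task and time limits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts the batches produced by a sequence of parent directories when
 * every change of parent starts a new batch. */
size_t
count_batches (const char *const *parents, size_t n)
{
    size_t batches = 0;

    assert (parents != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        /* The first item, or any item whose parent differs from the
         * previous one, opens a new batch. */
        if (i == 0 || strcmp (parents[i - 1], parents[i]) != 0)
            batches++;
    }

    return batches;
}
```

In this sketch, four updates arriving as /a, /b, /a, /b end up in four batches, while the same updates arriving as /a, /a, /b, /b need only two.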

Cheers!

-- 
Aleksander



