COMM_FAILURE for large sequences using GIOP between two machines



Hi,

I am getting COMM_FAILURE exceptions when I try to send large sequences 
(large messages, say a sequence of 100,000 doubles) over GIOP from a 
server to a client on separate machines. I am using ORBit 2.7.1. I have 
tracked the problem as far as giop_recv_buffer_get. Everything looks 
fine to begin with (ent->cnx is a valid pointer). giop_thread_self 
returns a null pointer, so the program takes the non-threaded branch. 
However, after a single iteration of linc_main_iteration, ent->cnx has 
been set to null. The loop therefore exits and the function returns 
ent->buffer, which is also null. orbit_small_demarshal then exits with a 
return code that signals an exception. The client debug 
trace:messages:giop output is as follows (with a few small additions of 
my own to help me find where things are going wrong):

p58551 : ([140124e60])->new_set_variable_seq (Align = 28
Marshal: id 0x1fffb730
 't' : kind - 17, i 0,  'b' : kind - 17, i 1,  'len' : kind - 5, i 0xd2f)Outgoing IIOP data:
0x0000:   47 49 4f 50  01 02 01 00  70 00 00 00  XX XX XX XX | GIOP....p...****
 ---
0x000c:   30 b7 ff 1f  03 00 00 00  00 00 00 00  1c 00 00 00 | 0...............
0x001c:   00 00 00 00  7f 7c 87 b6  6c d1 6e d3  09 c0 6a d6 | .....|..l.n...j.
0x002c:   c3 35 e3 f1  01 00 00 00  cf e4 b0 18  15 00 00 00 | .5..............
0x003c:   6e 65 77 5f  73 65 74 5f  76 61 72 69  61 62 6c 65 | new_set_variable
0x004c:   5f 73 65 71  00 00 00 00  01 00 00 00  01 00 00 00 | _seq............
0x005c:   0c 00 00 00  01 01 01 01  01 00 01 05  09 01 01 00 | ................
0x006c:   20 20 20 20  00 00 00 00  01 00 00 00  2f 0d 00 00 | ............/...
 ---
 
giop_recv_buffer_get ent->cnx 1401d9480
perform linc_main_iteration
Incoming IIOP header:
0x0000:   47 49 4f 50  01 02 01 01  64 fb 13 00  XX XX XX XX | GIOP....d...****
 ---
!ent->cnx
No recv buffer ...
Sys exception incomplete on id 0x1fffb730
 
[System exception comm failure in ORBit_small_invoke_stub] )
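
For reference, the two 12-byte GIOP headers in the trace above can be decoded by hand or with a small script. The following is only a sketch in Python (not part of ORBit); the function name `parse_giop_header` is made up for illustration, and the layout assumed is the standard GIOP 1.1+ header: 4 bytes of magic, major and minor version, a flags byte (bit 0 = little-endian, bit 1 = more fragments follow), a message type byte, and a 4-byte message size.

```python
import struct

# Message type codes as defined for the GIOP header.
MESSAGE_TYPES = {
    0: "Request", 1: "Reply", 2: "CancelRequest", 3: "LocateRequest",
    4: "LocateReply", 5: "CloseConnection", 6: "MessageError", 7: "Fragment",
}

def parse_giop_header(hex_bytes):
    """Decode a 12-byte GIOP header given as space-separated hex (hypothetical helper)."""
    raw = bytes.fromhex(hex_bytes)
    if len(raw) != 12 or raw[:4] != b"GIOP":
        raise ValueError("not a 12-byte GIOP header")
    major, minor, flags, msg_type = raw[4:8]
    little_endian = bool(flags & 0x01)   # bit 0: byte order of the size field
    more_fragments = bool(flags & 0x02)  # bit 1: further fragments follow
    (size,) = struct.unpack("<I" if little_endian else ">I", raw[8:12])
    return {
        "version": (major, minor),
        "little_endian": little_endian,
        "more_fragments": more_fragments,
        "type": MESSAGE_TYPES.get(msg_type, "Unknown"),
        "size": size,
    }

# Header bytes copied from the client trace (the XX columns are not part of the header).
outgoing = parse_giop_header("47 49 4f 50 01 02 01 00 70 00 00 00")
incoming = parse_giop_header("47 49 4f 50 01 02 01 01 64 fb 13 00")
```

Under those assumptions the outgoing message decodes as a GIOP 1.2 Request with a 112-byte body, and the incoming one as a GIOP 1.2 Reply announcing a body of 0x0013fb64 = 1,309,540 bytes with the fragment bit clear. So the client does see the start of a large, unfragmented reply before the connection goes away.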


Part of the server's debug traces:messages output is:

p 9948 : ([0x816ec48])->new_set_variable_seq (0, 1, 0xd2f) =>; seq[3375]={ very large number of object references }

Skimming the huge giop trace at the server end, the server appears to be 
sending the message properly. Any insights into what is going wrong (or 
how to track it down) would be most appreciated.
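
To make the failure path described earlier easier to follow, here is a toy model of the control flow in Python. It is a sketch of the behaviour as I have described it, not ORBit source: `Entry`, `recv_buffer_get`, `drop_connection` and `deliver_reply` are hypothetical names, and the loop condition is an assumption inferred from what I see in the debugger.

```python
class Entry:
    """Stand-in for the pending-reply entry ('ent' in the description)."""
    def __init__(self):
        self.cnx = object()  # connection is valid to begin with
        self.buffer = None   # no reply buffer has arrived yet

def recv_buffer_get(ent, main_iteration):
    # Non-threaded branch (assumed shape): keep running the main loop
    # until either a reply buffer shows up or the connection is gone.
    while ent.buffer is None and ent.cnx is not None:
        main_iteration(ent)
    # If the connection was dropped first, this is still None.
    return ent.buffer

def drop_connection(ent):
    # Models the case seen here: one iteration clears ent.cnx.
    ent.cnx = None

def deliver_reply(ent):
    # Models the normal case: one iteration delivers a reply buffer.
    ent.buffer = b"reply"

failed = recv_buffer_get(Entry(), drop_connection)
succeeded = recv_buffer_get(Entry(), deliver_reply)
```

In this model `failed` ends up as `None`, which corresponds to the "No recv buffer ..." line in the client trace, while `succeeded` holds the reply. The open question is what, during that single main-loop iteration, decides to tear the connection down.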

-- 
Bowie Owens

CSIRO Mathematical & Information Sciences
phone  : +61 3 9545 8055
fax    : +61 3 9545 8080
mobile : 0425 729 875
email  : Bowie.Owens@csiro.au





