socketpath generation



Hello ORBitters,

When using ORBit, I ran into a funny problem the other day. I
have a community of identical processes, each on a different
machine. These processes are node-bound servers that communicate
with one another using ORBit.
Until recently, things would sooner or later come to a mysterious
halt...

All these processes are started simultaneously, so they enter their
initialization routines at (almost) the same time. During this
initialization, each process gets a local and a network-bound socket,
which are both stored in the IOR.

To determine the socket-path for the local socket, a strategy like:

     srand(time(NULL));
     ...
     sprintf(socketpathvar,"<std-prefix>%d%d", rand(), rand());

is employed. Here is where the problem starts. Almost always, the call
to `time(NULL)' will yield the same result on a couple of hosts.
time() has a granularity of 1 second, so it is not hard to see why.
Since every such process then seeds the generator with the same value,
the calls to rand() give the same results in all of those processes.
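
To illustrate the point, here is a minimal sketch of the collision. The
helper names and the "orb-" prefix are made up for the example (the real
prefix is elided above); the only fact it relies on is that srand() with
the same seed makes rand() return the same sequence.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: seed the generator and fetch the first two
   rand() values, as a process starting at time `seed' would. */
static void first_two_values(unsigned int seed, int out[2])
{
	assert(out != NULL);
	srand(seed);
	out[0] = rand();
	out[1] = rand();
}

/* Returns 1 if two "processes" seeded with the same value build the
   same path suffix, 0 otherwise. "orb-" is an assumed prefix. */
static int same_seed_collides(unsigned int seed)
{
	int a[2], b[2];
	char path_a[64], path_b[64];

	first_two_values(seed, a);	/* process on host A */
	first_two_values(seed, b);	/* process on host B, same second */

	snprintf(path_a, sizeof(path_a), "orb-%d%d", a[0], a[1]);
	snprintf(path_b, sizeof(path_b), "orb-%d%d", b[0], b[1]);

	return strcmp(path_a, path_b) == 0;
}
```

Calling same_seed_collides() with any seed returns 1, which is exactly
what happens when several hosts call time(NULL) within the same second.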

Some of you may already see the disaster developing ahead...

Suppose one of the processes wants to talk to another. It retrieves
the IOR of the remote process, and finds a local socket-path in the
IOR, which (also) exists on the local host.
This local connection is preferred over the network-bound connection,
and the process ends up trying to talk to (in this case) itself...
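
The preference described here can be pictured roughly as follows. This
is not ORBit's actual code; choose_transport() and the enum are
hypothetical names, and the existence check via access() is an
assumption made only to show why a path that exists on both hosts is
misleading.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

enum transport { USE_LOCAL_SOCKET, USE_NETWORK_SOCKET };

/* Hypothetical selection rule: if the local socket-path found in a
   peer's IOR exists on this host, use it; otherwise fall back to the
   network-bound socket. The check cannot tell whose socket it is, so a
   path that happens to be identical on both hosts wins here too. */
static enum transport choose_transport(const char *local_path)
{
	assert(local_path != NULL);

	if (access(local_path, F_OK) == 0)
		return USE_LOCAL_SOCKET;

	return USE_NETWORK_SOCKET;
}
```

With identical socket-paths on every host, a rule like this always picks
the local socket, and the caller connects to its own server.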

The way to solve this, and make unique socket-paths, is to incorporate
at least the hostname in the socket-path. The implementation I present
to you features a slightly more elaborate socketpath-generation,
which yields socket-paths of the form:

"/tmp/orbit-<userid>/orb-<hostname>_pid_<process-id>_random_<random>"

This way you can see which process is using which socket path. That
is kind of neat when debugging the communication between processes.
I send you an improved(?) ORBit_ORB_make_usock_connection(), which is
in "orb.c".
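
For readers who want to try the naming scheme outside ORBit, here is a
small sketch without glib. make_socket_path() is a hypothetical helper,
and passing the user name as a parameter is an assumption made to keep
the example self-contained; it only formats the path and does not create
the socket.

```c
#define _XOPEN_SOURCE 600	/* for gethostname() under strict -std= modes */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write a socket-path of the form
   /tmp/orbit-<user>/orb-<hostname>_pid_<pid>_random_<random>
   into buf. Returns 0 on success, -1 if the hostname cannot be
   determined or buf is too small. */
static int make_socket_path(char *buf, size_t len, const char *user)
{
	char host[256];
	int n;

	assert(buf != NULL && user != NULL);

	if (gethostname(host, sizeof(host)) != 0)
		return -1;
	/* A truncated hostname is not guaranteed to be NUL-terminated. */
	host[sizeof(host) - 1] = '\0';

	n = snprintf(buf, len, "/tmp/orbit-%s/orb-%s_pid_%u_random_%d",
		     user, host, (unsigned int) getpid(), rand());

	return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

Even if two hosts produce the same rand() value, the hostname keeps the
paths apart, and the pid separates processes on one host.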

I hope this solves some of the problems out there; at least it solved
(some of) mine.
I also hope the code I send is acceptable to you; otherwise feel free
to modify it as you see fit. (Sorry, but I'm not able to make a patch.)

I use ORBit-0.5.1 on a network of Solaris, IRIX and Linux machines;
I checked, and ORBit-0.5.3 has the same malfunction.

Keep up the good work,

Regards,

  Bernard

-- 
Bernard van Veelen                

     				  Thomson CSF Signaal
     				  Applied Systems Research
 
     				  email :  veelen signaal nl
............................................[ Unclassified ]...
/*
   static GIOPConnection*
   ORBit_ORB_make_usock_connection(void)
   
   
   Previously used strategy:
   
     srand(time(NULL));
     ...
     sprintf(socketpathvar,"<std-prefix>%d%d", rand(), rand());
     
   Problem:
     I had a problem with the way the local socketpath was determined.
     When starting a lot of identical processes on different machines
     simultaneously, there are (almost) always some yielding the same
     result for `time(NULL)'. These processes then initialize the 
     random seed with the same value. 
     When processes initialized this way try to talk to each other,
     they find a local socketpath in the IOR of another process,
     and they end up trying to talk to themselves.
     
   Solution: Incorporate some form of host identification and process
     identification in the socketpath.
     
   Implementation: the way I implemented it you can see which process
     is using which socket path.
     
						Sep 18, 2000,
						Bernard van Veelen,
						veelen signaal nl
     
*/       


static GIOPConnection*
ORBit_ORB_make_usock_connection(void)
{
	GIOPConnection *retval = NULL;
	GString *tmpstr;
	struct stat statbuf;
	char hn[32];
	
	tmpstr = g_string_new(NULL);

	g_string_sprintf(tmpstr, "/tmp/orbit-%s", g_get_user_name());
		
	if(mkdir(tmpstr->str, 0700) != 0) {
		int e = errno;
			
		switch (e) {
		case 0:
		case EEXIST:
			if (stat(tmpstr->str, &statbuf) != 0)
				g_error ("Can not stat %s\n", tmpstr->str);

			if (statbuf.st_uid != getuid ())
				g_error ("Owner of %s is not the current user\n",
					 tmpstr->str);

			if((statbuf.st_mode & (S_IRWXG|S_IRWXO))
			   || !S_ISDIR(statbuf.st_mode))
				g_error ("Wrong permissions for %s\n",
					 tmpstr->str);
			break;
				
		default:
			g_error("Unknown error on directory creation of %s (%s)\n",
				tmpstr->str, g_strerror (e));
		}
	}

	{
		struct utimbuf utb;
		memset(&utb, 0, sizeof(utb));
		utime(tmpstr->str, &utb);
	}


#ifdef WE_DONT_CARE_ABOUT_STUPID_2DOT0DOTX_KERNELS
	g_string_sprintf(tmpstr, "/tmp/orbit-%s",
			 g_get_user_name());
	dirh = opendir(tmpstr->str);
	while(!retval && (dent = readdir(dirh))) {
		int usfd, ret;
		struct sockaddr_un saddr;

		saddr.sun_family = AF_UNIX;

		if(strncmp(dent->d_name, "orb-", 4))
			continue;

		g_snprintf(saddr.sun_path, sizeof(saddr.sun_path),
			   "/tmp/orbit-%s/%s",
			   g_get_user_name(), dent->d_name);

		usfd = socket(AF_UNIX, SOCK_STREAM, 0);
		g_assert(usfd >= 0);

		ret = connect(usfd, &saddr, SUN_LEN(&saddr));
		close(usfd);

		if(ret >= 0)
			continue;

		unlink(saddr.sun_path);
	}
	closedir(dirh);
#endif /* WE_DONT_CARE_ABOUT_STUPID_2DOT0DOTX_KERNELS */

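	srand(time(NULL));

	/* The hostname does not change between attempts, so look it up
	   once. g_error() aborts, so no else branch is needed. */
	if (gethostname(hn, sizeof(hn)))
		g_error("Cannot find my own hostname.\n");
	/* A truncated hostname is not guaranteed to be NUL-terminated. */
	hn[sizeof(hn) - 1] = '\0';

	while(!retval) {
		/* Retry with a fresh random suffix until the socket can be created. */
		g_string_sprintf(tmpstr, "/tmp/orbit-%s/orb-%s_pid_%u_random_%i",
				 g_get_user_name(), hn,
				 (unsigned int) getpid(), rand());
		retval =
			GIOP_CONNECTION(iiop_connection_server_unix(tmpstr->str));
	}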

	g_string_free(tmpstr, TRUE);

	return retval;
}


