problem with NFS, gconfd and shutting down



I have recently installed RH 7.3 on both my server and workstations.  My home
directories are served via NFS.

Situation 1:

I log in as a normal user through the GNOME X interface, work for
a while, then log out and, in the process, ask for a reboot.  If I
am lucky, I get messages that /home is busy and cannot be unmounted.
Usually there is just a tty with a login prompt and the system hangs
for a couple of minutes.  Then, right before starting the reboot, I
get the error message:

``lockd: cannot unmonitor <IP of server>''

as well as messages about RPC error no. 101 and a similar portmap error.
After that it reboots.

In the message log on the server I find:

Jun  6 12:48:13 cs kernel: lockd: rejected NSM callback from 7f000001:32789
Jun  6 12:48:13 cs rpc.statd[561]: recv_rply: [127.0.0.1] RPC status 5 

Both ~/.gconfd/lock and ~/.gconf/%gconf-xml-backend.lock exist, and
there is a .nfs file and an ior file.

Situation 2:

I remove the lock directories and then log in as a normal user through the
X interface.  I work for a while, then log out.  Some time later
(after the gconfd-1 process has finished), I ask for the system to reboot
from the login screen.  The system drops to a tty and gives the standard
list of shutdown messages.  Everything is OK, except that I still get the
lockd message from Situation 1.  Furthermore, the two lock directories have
been removed.

Situation 3:

Same as Situation 2, except I don't remove the lock directories.

During reboot the following message appears in the server's message log:

Jun  6 12:56:19 cs rpc.statd[561]: Can't callback <server name>
(100021,4), giving up.

While logged in, the lock directories are there, with another .nfs file.
The reboot behavior is the same as in Situation 2.

Furthermore, if I log in again, I get another .nfs file in each of the lock
directories.  This .nfs (and ior) file goes away when gconfd-1 terminates
on the workstation.
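
To compare the three situations, it helps to list what is actually sitting
in the two lock directories before and after gconfd-1 exits.  The following
is only a minimal sketch: list_gconf_lock_files is a made-up helper name,
and it assumes the two lock paths are the ones named above.

```shell
# list_gconf_lock_files is a hypothetical helper, not part of GConf or Red Hat.
# It prints every regular file (including hidden .nfs* files) found in the
# two gconf lock directories under the home directory given as $1.
list_gconf_lock_files() {
    home=${1:-$HOME}
    for dir in "$home/.gconfd/lock" "$home/.gconf/%gconf-xml-backend.lock"; do
        # Skip a lock directory that does not exist (as in Situation 2).
        [ -d "$dir" ] || continue
        find "$dir" -type f
    done
}
```

Running list_gconf_lock_files with no argument checks the current user's
home directory; an empty result means no lock files were left behind.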

Observation 1:  

The netfs script is configured in a peculiar way.  First, it unmounts
all NFS filesystems using the -f option, which I believe does not
unmount /home.  Then later, it does a /sbin/fuser -k -m /home, which,
I believe, kills the gconfd-1 process.  But, for whatever reason, it
leaves the .nfs files and the lock directories around.


Partial Solution 1:

So, I tried the following, by hand (while gconfd-1 was still running):

1.  umount -a -t nfs
2.  /sbin/fuser -k -m /home
3.  umount -a -f -t nfs 
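
The three steps above can be collected into a small function.  This is a
sketch only: the names run, unmount_nfs_by_hand and DRY_RUN are made up for
illustration, and the dry-run switch is an addition so the sequence can be
inspected without unmounting anything.

```shell
# run is a hypothetical wrapper: with DRY_RUN=1 it prints the command
# instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# unmount_nfs_by_hand is a hypothetical name for the manual sequence.
# $1 is the NFS mount point whose users should be killed (default /home).
unmount_nfs_by_hand() {
    mountpoint=${1:-/home}
    run umount -a -t nfs                 # 1. plain unmount; may fail while files are open
    run /sbin/fuser -k -m "$mountpoint"  # 2. kill processes using the mount (e.g. gconfd-1)
    run umount -a -f -t nfs              # 3. forced unmount of whatever is left
}
```

With DRY_RUN=1 set, calling unmount_nfs_by_hand /home only prints the three
commands in order.  Without it, the function must be run as root.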

Next I changed line 105 of /etc/rc.d/netfs by removing the -f option from
the first umount attempt.  In that area it now reads:

		while [ -n "$remaining" -a "$retry" -gt 0 ]
		do
			if [ "$retry" -lt 3 ]; then
				action $"Unmounting NFS filesystems (retry): " umount -f -a -t nfs
			else
				action $"Unmounting NFS filesystems: " umount -a -t nfs
			fi
			sleep 2
			remaining=`awk '!/^#/ && $3 ~ /^nfs/ && $2 != "/" {print $2}' /proc/mounts`
			[ -z "$remaining" ] && break
			/sbin/fuser -k -m $sig $remaining >/dev/null
			sleep 5
			retry=$(($retry - 1))
			sig=-9
		done
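
The loop hinges on the awk filter that computes $remaining.  As a sketch,
the same filter can be wrapped in a function that reads any file in
/proc/mounts format, which makes it easy to see which mount points it would
select.  The name list_nfs_mounts is made up for illustration.

```shell
# list_nfs_mounts is a hypothetical helper.  It applies the same awk filter
# as the netfs excerpt to the file given as $1 (default /proc/mounts):
# skip comment lines, keep entries whose filesystem type starts with "nfs",
# and never select the root filesystem.
list_nfs_mounts() {
    awk '!/^#/ && $3 ~ /^nfs/ && $2 != "/" {print $2}' "${1:-/proc/mounts}"
}
```

Calling list_nfs_mounts with no argument on a running workstation shows
what the script would still consider mounted at that moment.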

This fixes the problem with NFS.  All of the .nfs files are removed, and the
remaining problems shutting down are minor:  the first attempt to umount
fails, the second succeeds, and there is still the lockd message about not
being able to unmonitor the server.

However, gconfd-1 doesn't clean up the lock directories on the server.
Furthermore, the error messages are still written to /var/log/messages on
the server.


Is there a better solution?  Is the problem with the way I have
NFS configured?  How can I get the shutdown to proceed smoothly and without
delay while gconfd-1 is running?  Any other thoughts, suggestions?

Regards,

MJ
-- 

Marty J. Wolf                   mjwolf acm org
Math & CS Department            mjwolf whitetail bemidjistate edu
Bemidji State University        Office: (218) 755-2825
1500 Birchmont Drive, Box 23    Fax: (218) 755-2822
Bemidji MN  56601               http://whitetail.bemidjistate.edu/mjwolf



