Re: gconfd process eating CPU




Mark:

> On Fri, 2004-12-17 at 17:31 -0600, Brian Cameron wrote:
> > Such sick gconfd-2 processes have the following stack trace:
> >
> >   core 'core.64457' of 64457:     /usr/lib/gconfd-2 13
> >    ff19d9c8 _poll    (39660, d, 7055, 56240, 0, 3) + 8
> >    ff22a28c g_main_context_poll (35e20, 7055, 7fffffff, 39660, d, ff280410) + d4
> >    ff2294cc g_main_context_iterate (35e20, 1, 1, 0, 35e28, 39660) + 37c
> >    ff229e00 g_main_loop_run (3dfd8, 1, 0, ff282c54, 2c3f0, 400) + 300
> >    000167c4 gconf_main (3dfd8, 17400, 2a000, 2ad58, 0, 2ac2c) + 90
> >    00016508 main     (0, fbc9, 2abd4, 0, 18fe8, 33b88) + 4d8
> >    000130d0 _start   (0, 0, 0, 0, 0, 0) + 108
> >
> > Unfortunately, that doesn't really give me much of a hint of what is going
> > wrong with gconfd-2.
>
> One possibility is that one of the file descriptors we are polling has
> a pending condition which we aren't processing or that the file
> descriptor was closed but never removed from the poll. An strace of
> gconfd-2 would help you figure out if that is the case.
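
To illustrate the two failure modes described above, here is a minimal
sketch in Python. It is not gconfd-2 or GLib code; the function names are
made up for illustration, and the exact event flags are as I would expect
them on a Linux-like system, so they may differ elsewhere. In both cases
poll() returns immediately instead of sleeping, so a loop that never
handles the condition ends up spinning.

```python
import os
import select


def wakeups_on_unhandled_hangup(iterations=5):
    """Count poll() calls that return at once because a hangup is never handled."""
    read_fd, write_fd = os.pipe()
    os.close(write_fd)  # the writer goes away; read_fd now has a pending hangup
    poller = select.poll()
    poller.register(read_fd, select.POLLIN)
    wakeups = 0
    try:
        for _ in range(iterations):
            # A healthy idle loop would sleep for the full 1000 ms here.
            if poller.poll(1000):
                wakeups += 1  # nothing consumes the condition, so it fires again
    finally:
        os.close(read_fd)
    return wakeups


def stale_fd_reports_pollnval():
    """Return True if a closed descriptor left in the poll set reports POLLNVAL."""
    read_fd, write_fd = os.pipe()
    poller = select.poll()
    poller.register(read_fd, select.POLLIN)
    os.close(read_fd)  # closed, but deliberately not unregistered
    os.close(write_fd)
    events = poller.poll(1000)
    return any(revents & select.POLLNVAL for _fd, revents in events)
```

In a system call trace, either case would show up as poll returning over
and over with the same descriptor flagged, which is what the suggested
trace is meant to reveal.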

Unfortunately, the problem does not seem to happen in a predictable fashion,
so we don't know how to recreate it.  We do have monitoring software
that points us towards processes once they've entered an unhealthy state
and are sucking up 100% of the CPU.  So I can easily find many processes
that are sick.

Aside from running strace or truss for every single user on the system and
waiting for one to hang, is there an easier way?  Is there any way to tell
what file descriptor is causing the problem once the process has hung?
I ran pfiles on a sick gconfd process and got the following information.
Not sure if this is helpful.

64457:  /usr/lib/gconfd-2 13
  Current rlimit: 1024 file descriptors
   0: S_IFCHR mode:0666 dev:85,0 ino:194649 uid:0 gid:3 rdev:13,2
      O_RDWR
   1: S_IFCHR mode:0666 dev:85,0 ino:194649 uid:0 gid:3 rdev:13,2
      O_RDWR
   2: S_IFCHR mode:0666 dev:85,0 ino:194649 uid:0 gid:3 rdev:13,2
      O_RDWR
   3: S_IFCHR mode:0666 dev:85,0 ino:194649 uid:0 gid:3 rdev:13,2
      O_RDWR
   4: S_IFDOOR mode:0444 dev:302,0 ino:54 uid:0 gid:0 size:0
      O_RDONLY|O_LARGEFILE FD_CLOEXEC  door to nscd[399]
   5: S_IFCHR mode:0666 dev:85,0 ino:194645 uid:0 gid:3 rdev:21,0
      O_WRONLY FD_CLOEXEC
   6: S_IFSOCK mode:0666 dev:296,0 ino:31091 uid:0 gid:0 size:0
      O_RDWR FD_CLOEXEC
        sockname: AF_UNIX /tmp/orbit-nw141292/linc-fbc9-0-41988743b6051
   7: S_IFREG mode:0700 dev:304,5508 ino:1390804 uid:142292 gid:10 size:626
      O_WRONLY|O_CREAT FD_CLOEXEC
   8: S_IFREG mode:0700 dev:304,5508 ino:1390818 uid:142292 gid:10 size:626
      O_WRONLY|O_CREAT FD_CLOEXEC
   9: S_IFSOCK mode:0666 dev:296,0 ino:48514 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        sockname: AF_UNIX /tmp/orbit-nw141292/linc-fbc9-0-41988743b6051
        peername: AF_UNIX
  10: S_IFSOCK mode:0666 dev:296,0 ino:39609 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        sockname: AF_INET 129.153.131.96  port: 57745
        peername: AF_INET 129.153.128.104  port: 32892
  11: S_IFCHR mode:0000 dev:85,0 ino:38009 uid:0 gid:0 rdev:41,568
      O_RDWR FD_CLOEXEC
  12: S_IFSOCK mode:0666 dev:296,0 ino:7157 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        sockname: AF_UNIX
        peername: AF_UNIX /tmp/orbit-nw141292/linc-7d86-0-41b98a4a4ef64
  13: S_IFSOCK mode:0666 dev:296,0 ino:46053 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        sockname: AF_INET 129.153.131.96  port: 57746
        peername: AF_INET 129.153.128.104  port: 32820
  14: S_IFSOCK mode:0666 dev:296,0 ino:2017 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        sockname: AF_INET 129.153.131.96  port: 57747
        peername: AF_INET 129.153.128.104  port: 32880
  15: S_IFSOCK mode:0666 dev:296,0 ino:476 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        sockname: AF_INET 129.153.131.96  port: 57748
        peername: AF_INET 129.153.128.104  port: 32893
  16: S_IFSOCK mode:0666 dev:296,0 ino:39465 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        sockname: AF_INET 129.153.131.96  port: 57749
        peername: AF_INET 129.153.128.104  port: 32898
  17: S_IFSOCK mode:0666 dev:296,0 ino:55057 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        sockname: AF_INET 129.153.131.96  port: 57750
        peername: AF_INET 129.153.128.104  port: 32891
  18: S_IFSOCK mode:0666 dev:296,0 ino:164 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        sockname: AF_INET 129.153.131.96  port: 57751
        peername: AF_INET 129.153.128.104  port: 32902
  19: S_IFSOCK mode:0666 dev:296,0 ino:30654 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        sockname: AF_INET 129.153.131.96  port: 57752
        peername: AF_INET 129.153.128.104  port: 32906
  20: S_IFSOCK mode:0666 dev:296,0 ino:39438 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        sockname: AF_INET 129.153.131.96  port: 57753
        peername: AF_INET 129.153.128.104  port: 32894
  21: S_IFSOCK mode:0666 dev:296,0 ino:19476 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        sockname: AF_INET 129.153.131.96  port: 57754
        peername: AF_INET 129.153.128.104  port: 33314

--

Brian



