Re: Gnome locks frequently



On Wed, Dec 12, 2001 at 06:00:57PM -0600 or thereabouts, Ian T. wrote:
> Hi all.	
> 	I've been watching for fix for this as well.  I'd love to run 
> Gnome as it has features I really like.  However I can't run it 
> reliably because of these lockups.  

I've snipped the log and will leave it for someone more knowledgeable
than me, but I do know that Gtk-WARNINGs are just that: warnings.
Gtk-ERRORs are the ones to watch out for.

But in the meantime, since it's come up and I foresee lots of
"me too" posts, here is what I know about machine lock-ups in
GNOME that require the Big Red Button.

The single most useful thing you can do is not available to everyone.
It involves (a) another computer and (b) a network connection between
them.

If you have those, there's quite a bit you can do. If not, suddenly
a journalling filesystem becomes of critical importance, because
hitting the power button is a grand way to toast data.

I discovered this the hard way recently, when I was having to
hit said button a lot :( 

Earlier in the thread, Havoc said "must be kernel issue". There's 
one other possibility, and it's X itself. X is a privileged program
and can actually poke around in the hardware of the machine. You
can crash it pretty well if you get that wrong. 

This is what I do when everything freezes (it doesn't happen 
that often, but this is all burned on my brain).

Find another terminal on another machine.
Ping frozen-box. It's up!
ssh into frozen-box.
Run 'top'.
Look to see whether something is spinning at 90%+ of the CPU time.
Yes, there's X.
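
If you'd rather not squint at 'top' over a slow link, the "is something
spinning?" check can be sketched in a few lines of Python. This is only
a rough illustration: find_cpu_hogs and SAMPLE are names I made up, the
sample listing is invented, and it assumes you captured the output of
something like 'ps -eo pid,pcpu,comm' on the frozen box. Note that ps
reports CPU use averaged over the life of the process, so 'top' is still
the better view of what is spinning right now.

```python
def find_cpu_hogs(ps_output, threshold=90.0):
    """Return (pid, cpu, command) for each process at or above threshold.

    Expects text shaped like the output of `ps -eo pid,pcpu,comm`:
    a header line, then one "PID %CPU COMMAND" row per process.
    """
    hogs = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue  # not a complete row
        pid, cpu, command = fields
        try:
            cpu_value = float(cpu)
        except ValueError:
            continue  # unexpected row; ignore it
        if cpu_value >= threshold:
            hogs.append((int(pid), cpu_value, command))
    return hogs


# Invented sample listing, for illustration only.
SAMPLE = """\
  PID %CPU COMMAND
    1  0.0 init
  412  0.3 sshd
  733 97.4 X
  801  1.2 gnome-panel
"""

print(find_cpu_hogs(SAMPLE))  # [(733, 97.4, 'X')]
```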

At this stage, you can just shoot X. Start top as root, and press 'k'
to kill it. Don't kill X with -9 (the over-used "dead dead dead"
signal): it doesn't like it at all and won't restart happily
(your display won't come back). Try 15 (SIGTERM) first, which is the
default in 'top' on Linux at least.
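
For anyone who would rather script the "15 first, never -9 by reflex"
rule, here is a minimal sketch. terminate_politely is a made-up name,
and it assumes a Linux box and that you run it with enough privilege
to signal the process. It deliberately never escalates to signal 9
on its own: it only tells you whether the polite request worked.

```python
import os
import signal
import time


def terminate_politely(pid, wait_seconds=5.0, poll_interval=0.2):
    """Send SIGTERM (15) to pid and report whether the process went away.

    Returns True if the process disappeared within wait_seconds,
    False if it is still there. It never sends SIGKILL (9).
    """
    os.kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + wait_seconds
    while time.monotonic() < deadline:
        try:
            os.kill(pid, 0)  # signal 0 only checks that the pid exists
        except ProcessLookupError:
            return True  # gone
        except PermissionError:
            pass  # exists, but belongs to someone else; keep waiting
        time.sleep(poll_interval)
    return False
```

If it returns False, that is the point to stop and think rather than
reach for -9 automatically.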

If you're feeling inquisitive, you can narrow it down by shooting
off the X clients first instead of X: the X clients are the 
programs running under X, basically. The classic example was
often Netscape. Killing that off would sometimes get X unstuck.
So go for that first. Then just randomly kill anything that looks
like a Gnome or X program. If the PID column has a single-digit
number, it's probably not Gnome or X and should be left alone.
Processes (programs, usually) get numbered from 1 to 30,000
or so, and then the numbering starts again, taking the numbers that
are now free. So 1 is always 'init', then a series of useful things
start, and they hog the lower numbers.

If killing the X clients doesn't do it, then kill X. At that
stage, my ideas on how to debug X run out (unless NVidia is
involved: see below).

One note on killing X off. Occasionally it seems to
have triggered an evil X bug. I would return to the frozen
box expecting it to be working, and the display would still
be frozen, and I had to resort to rebooting anyway. Then
I'd get the weirdest display on the console: everything shifted
a few characters right with a psychedelic coruscating rainbow
bar down the left. I had to switch off properly and do a 
cold reboot with that. That was with one particular card,
but beware of that :) 

If you can still ping and ssh into the box, then clearly the
kernel is alive, and it's likely to be something in X that went
wrong.

If you can't reach the box at all, it's *probably* the kernel.
But not always :) I have a NeoMagic chip and I didn't read "man
neomagic". I was suffering these lockups for a while before I
was pointed at the man page ("Note: On some laptops using the
2160 chipset (MagicGraph 128XD) the following options are needed
to avoid a lock-up of the graphic engine". Duh. I felt stupid.)
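
The rule of thumb in the last few paragraphs can be written down as a
tiny decision helper. Treat it as a sketch and not a diagnosis tool:
probe and triage are names I invented, it assumes 'ping' and an OpenSSH
'ssh' client are on your PATH and that key-based login to the box
already works, and the "inconclusive" case for mixed results is my own
addition.

```python
import subprocess


def _runs_ok(command, timeout=15):
    """True if the command exits with status 0 within the timeout."""
    try:
        result = subprocess.run(
            command,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # command missing, or it hung
    return result.returncode == 0


def probe(host):
    """Return (ping_ok, ssh_ok) for host, checked from another machine."""
    ping_ok = _runs_ok(["ping", "-c", "1", host])
    ssh_ok = _runs_ok(
        ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"]
    )
    return ping_ok, ssh_ok


def triage(ping_ok, ssh_ok):
    """Map the two checks onto the rules of thumb from this post."""
    if ping_ok and ssh_ok:
        return "kernel is alive: look at X or an X client"
    if not ping_ok and not ssh_ok:
        return "probably the kernel, but check the X driver man page too"
    return "inconclusive: mixed result, not covered by the rules of thumb"
```

Usage would be something like triage(*probe("frozen-box")), run from
the second machine.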

It's possible this is all Linux-specific, but I don't think
so. One thing that may be Linux-specific: the closed-source
NVidia-produced 3D drivers are *known* to cause hangs, lock-ups
and so on. If you have such a driver, uninstall it and use just 
the XFree86-provided 2D driver and see whether that helps. That
again will narrow it down. Neither the kernel folks nor the XFree86
folks care about bug reports involving the binary drivers, 
because without the source they can't fix them. 

If you build your own Linux kernel, have only one machine,
and can't do the ssh'ing in, there is one last hope: a patch
floating around which will make the keyboard lights flash if
the kernel crashes. It's of limited use in most circumstances :)
But in this case it will at least tell you whether you need to 
look at X or the kernel!

Wow, that was long. Sorry. I hope it was reasonably accurate.

Telsa


