Re: gnome-boxes lockup issue
- From: "Daniel P. Berrange" <berrange redhat com>
- To: Alexander Larsson <alexl redhat com>
- Cc: gnome-boxes-list gnome org
- Subject: Re: gnome-boxes lockup issue
- Date: Wed, 9 May 2012 12:50:18 +0100
On Wed, May 09, 2012 at 10:47:22AM +0100, Daniel P. Berrange wrote:
> On Wed, May 09, 2012 at 11:34:14AM +0200, Alexander Larsson wrote:
> > I've been having some hangs in gnome-boxes, they got a lot better with
> > the latest patches to avoid blocking i/o on the main thread, but
> > apparently Jonathan still got them sometimes, so I backed out the latest
> > fixes and set out to track it down. Here is what happens when it hangs:
> >
> > * gnome-boxes does a blocking libvirt call on the main thread, for
> > instance virDomainGetXMLDesc()
> > * The libvirt worker thread for the call does a qemu monitor call to get
> > some info.
> > For instance the qemu driver for virDomainGetXMLDesc() calls
> > qemuMonitorGetBalloonInfo() which formats a json command, sends it and
> > waits for a reply.
> > * In parallel to the above, the guest did some kind of GUI call which
> > got into the qxl driver by doing i/o on a hw port tied to qxl. This
> > exits the cpu emulation and calls into qxl_spice_update_area() ->
> > red_dispatcher_update_area, which sends a message on a pipe
> > telling the qxl thread to send updates for the area. Then it waits for
> > a reply.
> > * The qxl thread gets the message, but before updating the area it
> > flushes outstanding messages by calling flush_display_commands(),
> > where it keeps trying to flush the pipe going to the client to make
> > the data in the pipe < MAX_PIPE_SIZE.
> > * However, the client is blocking in the main thread, so it will never
> > read from the spice channel, so we have here a 4-thread circular
> > deadlock, which will not be solved until eventually there is a timeout
> > somewhere. In the example above that is the QXL thread waiting
> > DISPLAY_CLIENT_TIMEOUT*10, i.e. 150 seconds, but maybe there are other
> > timeouts in different deadlock paths.
> >
> > So, since the spice client in boxes receives data on the main thread we
> > can absolutely never do blocking i/o calls on the main thread that can
> > reach the qemu instance, as that will reproduce this deadlock.
>
> Urgh, ultimately I think this is a serious SPICE server flaw. The
> spice thread in QEMU must not block itself waiting for the SPICE
> client to do something. If it really must block itself, then it must
> absolutely never block the rest of the QEMU process by holding locks.
>
> As it stands it looks like an evil spice client can DoS the entire
> operation of the guest, or an evil guest QXL driver can lock up QEMU
> or SPICE client or both.
For the sake of archiving, on IRC we decided there are multiple flaws
here:

 - F16 has an old QXL driver which does synchronous updates. This
   will be fixed by updating F16 to the F17 QXL driver, which is
   fully async.
 - The SPICE server / QEMU ought to forbid use of the legacy
   synchronous APIs with QXL.
 - QEMU ought to issue a notification when balloon memory changes,
   so libvirt can then avoid needing to call the monitor in this
   scenario.
 - libvirt ought to time out gracefully when querying the balloon
   memory level.

Fixing any one of these issues would solve the hang, but we should
aim to fix all 4.
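
As a rough illustration of why removing any single blocking wait is
enough, the hang can be modelled as a wait-for graph. This is only a
sketch: the thread labels and the has_cycle() and
cycle_survives_removal() helpers are made up for the example and are
not libvirt, QEMU or SPICE APIs. The four edges are one reading of the
deadlock described in the quoted report.

```python
from typing import Dict, List, Tuple

# Hypothetical labels: each pair is (waiter, thread it is blocked on).
WAITS: List[Tuple[str, str]] = [
    ("boxes main thread", "libvirt worker thread"),  # blocking libvirt call
    ("libvirt worker thread", "qemu thread"),        # waits for monitor reply
    ("qemu thread", "qxl thread"),                   # waits for update_area reply
    ("qxl thread", "boxes main thread"),             # waits for client to drain pipe
]


def has_cycle(edges: List[Tuple[str, str]]) -> bool:
    """Return True if the wait-for graph contains a cycle (a deadlock)."""
    graph: Dict[str, List[str]] = {}
    for waiter, target in edges:
        graph.setdefault(waiter, []).append(target)
        graph.setdefault(target, [])

    unvisited, in_progress, done = 0, 1, 2
    state = {node: unvisited for node in graph}

    def visit(node: str) -> bool:
        # Depth-first search; reaching an in-progress node means a cycle.
        state[node] = in_progress
        for nxt in graph[node]:
            if state[nxt] == in_progress:
                return True
            if state[nxt] == unvisited and visit(nxt):
                return True
        state[node] = done
        return False

    return any(state[n] == unvisited and visit(n) for n in graph)


def cycle_survives_removal(edges: List[Tuple[str, str]]) -> List[bool]:
    """For each edge, report whether a cycle remains once it is removed."""
    return [has_cycle(edges[:i] + edges[i + 1:]) for i in range(len(edges))]


if __name__ == "__main__":
    print("deadlock with all waits:", has_cycle(WAITS))
    for (waiter, target), still in zip(WAITS, cycle_survives_removal(WAITS)):
        print(f"drop '{waiter}' -> '{target}': deadlock remains = {still}")
```

In this model every wait sits on the one and only cycle, so dropping
any one of them (or bounding it with a timeout) clears the hang.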
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|