gnome-boxes lockup issue



I've been seeing some hangs in gnome-boxes. They got a lot better with
the latest patches to avoid blocking i/o on the main thread, but
apparently Jonathan still hit them sometimes, so I backed out the latest
fixes and set out to track the problem down. Here is what happens when
it hangs:

* gnome-boxes does a blocking libvirt call on the main thread, for
  instance virDomainGetXMLDesc()
* The libvirt worker thread for the call does a qemu monitor call to get
  some info. For instance, the qemu driver for virDomainGetXMLDesc()
  calls qemuMonitorGetBalloonInfo(), which formats a json command, sends
  it and waits for a reply.
* In parallel to the above, the guest does some kind of GUI call which
  gets into the qxl driver by doing i/o on a hw port tied to qxl. This
  exits the cpu emulation and calls into qxl_spice_update_area() ->
  red_dispatcher_update_area(), which sends a message on a pipe
  telling the qxl thread to send updates for the area, and then waits
  for a reply.
* The qxl thread gets the message, but before updating the area it
  flushes outstanding messages by calling flush_display_commands(),
  where it keeps trying to flush the pipe going to the client to make
  the data in the pipe < MAX_PIPE_SIZE.
* However, the client is blocked in the main thread, so it will never
  read from the spice channel. The result is a 4-thread circular
  deadlock (boxes main thread -> libvirt worker -> qemu -> qxl thread
  -> boxes main thread), which is not resolved until a timeout fires
  somewhere. In the example above that is the QXL thread waiting
  DISPLAY_CLIENT_TIMEOUT*10, i.e. 150 seconds, but there may be other
  timeouts in different deadlock paths.
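
To make the cycle easier to see, here is a small Python sketch that
writes the steps above as a wait-for graph and walks it. The thread
labels and the find_cycle() helper are made up for illustration, and the
edge from the libvirt worker to the qemu thread assumes the monitor
reply cannot be sent while that thread is waiting on the qxl thread.

```python
# Wait-for graph for the hang described above. Each key is blocked until
# the thread it maps to makes progress. Labels are illustrative only.
WAITS_FOR = {
    "boxes main thread": "libvirt worker thread",   # blocking libvirt call
    "libvirt worker thread": "qemu thread",         # waits for monitor reply (assumed edge)
    "qemu thread": "qxl thread",                    # waits for update_area reply
    "qxl thread": "boxes main thread",              # waits for client to read the channel
}


def find_cycle(waits_for, start):
    """Follow wait-for edges from start; return the cycle as a list, or None."""
    seen = []
    node = start
    while node is not None and node not in seen:
        seen.append(node)
        node = waits_for.get(node)
    if node is None:
        return None  # the chain ended at a thread that is not blocked
    return seen[seen.index(node):]


cycle = find_cycle(WAITS_FOR, "boxes main thread")
print(" -> ".join(cycle + [cycle[0]]))
```

Removing any one edge, for example by keeping the boxes main thread
free to read from the channel, breaks the cycle.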

So, since the spice client in boxes receives data on the main thread, we
must never make blocking i/o calls on the main thread that can reach
the qemu instance, as that can reproduce this deadlock.
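
A toy model of that rule, as a rough sketch only: this is plain Python
threading, not the boxes, libvirt or spice code, and every name in it is
hypothetical. A bounded queue stands in for the pipe to the client, an
event stands in for the reply to the blocking call, and a short timeout
stands in for the 150 second one.

```python
import queue
import threading


def run(offload, timeout=0.5):
    """Return True if the 'blocking call' got its reply before the timeout."""
    pipe = queue.Queue(maxsize=2)   # stand-in for the bounded pipe to the client
    reply = threading.Event()       # stand-in for the reply to the blocking call

    def server():
        # The server must push all pending updates to the client
        # before it gets around to sending the reply.
        try:
            for i in range(5):
                pipe.put(i, timeout=timeout)
        except queue.Full:
            return                  # gave up, like the timeout on the server side
        reply.set()

    threading.Thread(target=server, daemon=True).start()

    if not offload:
        # Blocking call made directly on the main thread:
        # nothing reads from the pipe while we wait.
        return reply.wait(timeout)

    # Blocking call moved to a worker thread; the main thread
    # keeps reading from the pipe in the meantime.
    result = []
    worker = threading.Thread(target=lambda: result.append(reply.wait(timeout)))
    worker.start()
    while worker.is_alive():
        try:
            pipe.get(timeout=0.01)
        except queue.Empty:
            pass
    worker.join()
    return result[0]


print("blocking on main thread:", run(offload=False))
print("blocking in worker thread:", run(offload=True))
```

In this model the first call only returns once the timeout expires,
while the second completes as soon as the main thread has drained the
pipe.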


