Understanding where pygobject toggle ref-counting complexity comes from



Hi Everyone, 

I've been using pygobject for some time now, but only recently started digging into the pygobject 2.28.6 code (gobject/pygobject.c) while tracking down a warning I've been getting in my application. 

I found the toggle reference code a bit of a headache, and tried to come up with a clear way of understanding what was going on, which I thought I'd share here so as to expose it to some criticism. That led me to think of a possibly cleaner way to solve the problem that toggle references are trying to address, which I'll outline below. Comments welcome. 

* Why does pygobject add the complexity of toggle references? 

The basic issue arises once user pygobject state is spread across two distinct objects, the pygobject wrapper and the C-level gobject. Let's call these 'counterparts'. Together, they implement a virtual pygobject and must both remain alive for its lifetime.

If we were doing this purely in python there'd be no problem, just create a cyclic reference between the two counterparts and let the cyclic garbage collector work out when no other references remain to either of them. 
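To illustrate the pure-python case, here is a minimal sketch. The class names Wrapper and Native are made up for the example, and both counterparts are ordinary python objects here, which is exactly what is not true in pygobject.

```python
import gc
import weakref


class Wrapper:
    """Stand-in for the python-side counterpart (hypothetical name)."""
    def __init__(self):
        self.peer = None


class Native:
    """Stand-in for the C-side counterpart, modelled as a python object."""
    def __init__(self):
        self.peer = None


def make_pair():
    w, n = Wrapper(), Native()
    w.peer, n.peer = n, w  # the cycle: each counterpart keeps the other alive
    return w, n


gc.disable()  # stop automatic collection from running mid-demonstration
w, n = make_pair()
probe = weakref.ref(w)  # lets us observe w without keeping it alive
del w, n                # drop the only external references

survives_without_gc = probe() is not None  # the cycle alone keeps both alive
gc.collect()                               # the cycle collector finds the pair
collected_by_gc = probe() is None
gc.enable()
```

Reference counting alone never frees the pair; it takes the cycle collector to notice that nothing outside the cycle refers to it.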

However, in our case the two counterparts are subject to two distinct garbage collectors, and so this approach requires us to determine when to break the cycle ourselves, which is when there are no external references to the virtual pygobject as a whole. 

In effect, we need two distinct reference counts: the number of external references, and the implementation-internal references that form the cycle. We are forced to combine them into the single existing ref count, which in turn forces us to detect when a decrement leaves ref_cnt == 1 for each counterpart, as that is when it becomes unused, rather than when the count reaches 0 as normal. Hence the toggle_ref and lots of hoop jumping. 
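As I understand it, a GObject toggle reference reports exactly these transitions: the callback is told when the count goes from 1 to 2 (the toggle reference is no longer the last one) and when it drops from 2 back to 1. The toy model below mimics that in python. ToyGObject, toggle_log and keeps_wrapper_alive are names invented for the sketch, not anything in pygobject.

```python
class ToyGObject:
    """Toy ref-counted object carrying a single toggle reference."""

    def __init__(self):
        self.ref_count = 1                # the toggle reference itself
        self.keeps_wrapper_alive = False  # models a strong ref to the wrapper
        self.toggle_log = []              # is_last_ref value of each notification

    def _toggle_notify(self, is_last_ref):
        self.toggle_log.append(is_last_ref)
        # While anyone besides the wrapper holds the object, the object
        # must in turn keep the wrapper (and its state) alive.
        self.keeps_wrapper_alive = not is_last_ref

    def ref(self):
        self.ref_count += 1
        if self.ref_count == 2:       # 1 -> 2: an external holder appeared
            self._toggle_notify(False)

    def unref(self):
        self.ref_count -= 1
        if self.ref_count == 1:       # 2 -> 1: only the toggle ref remains
            self._toggle_notify(True)


obj = ToyGObject()
obj.ref()                             # e.g. added to a container
shared = obj.keeps_wrapper_alive
obj.ref()                             # a second external holder: no notification
obj.unref()
obj.unref()                           # back to the toggle reference only
```

Note that nothing interesting happens at 0 in this model; all the bookkeeping hangs off the 1 <-> 2 boundary.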

More formally, for all counterparts x: 

used.x == external_ref_cnt.x > 0 
alive.x == external_ref_cnt.x + internal_ref_cnt.x > 0 

(following EWD1300, where '.' is function application and '==' is logical equivalence) 

(Note that this implies alive.x -/-> used.x , whereas usually we can count on alive.x == used.x and needn't make the distinction.)

We must also maintain used.x -> alive.counterpart.x for all stateful counterparts of x. 

(Note that when the gobject is the only stateful counterpart, this is easy to maintain as the gobject needn't sustain the pygobject's state with a reference, and so used.x == ref_cnt.x > 0 for the pygobject, as per normal, and we needn't determine 'used' for the gobject as its counterpart is not stateful.)
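The definitions above can be written out as a small python sketch, assuming we could see the two counts separately (which is precisely what the real implementation cannot do). The Counterpart class is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Counterpart:
    """One half of the virtual pygobject, with its two notional counts."""
    external_ref_cnt: int
    internal_ref_cnt: int

    @property
    def ref_cnt(self):
        # The single count that actually exists in practice.
        return self.external_ref_cnt + self.internal_ref_cnt

    def used(self):
        return self.external_ref_cnt > 0

    def alive(self):
        return self.ref_cnt > 0


# Held only by its counterpart: alive, but not used.
held_by_cycle_only = Counterpart(external_ref_cnt=0, internal_ref_cnt=1)

# Held by the cycle and by one outside party: alive and used.
held_externally = Counterpart(external_ref_cnt=1, internal_ref_cnt=1)
```

The first case has ref_cnt == 1 and the second has ref_cnt == 2, which is why the combined count has to be watched at the 1/2 boundary rather than at 0.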


* OK, can we get what we want without the cyclic reference? 

As the entire user-defined pygobject state is contained in the inst_dict, and this is what we want to preserve over the lifetime of the virtual object, rather than the python wrapper per se, why not avoid the cyclic reference by making the inst_dict owned by the gobject? This would avoid all the toggle_ref stuff and greatly simplify the code.
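A rough python model of the idea, with invented names (ToyGObject, ToyWrapper) and no claim that pygobject could be restructured this directly: the wrapper holds a one-way reference to the gobject and stores no state of its own, so a wrapper can be thrown away and recreated without losing anything.

```python
class ToyGObject:
    """Stand-in for the C-level object, which here owns the instance dict."""

    def __init__(self):
        self.inst_dict = {}


class ToyWrapper:
    """Stateless wrapper: every user attribute lives in the gobject's dict."""
    __slots__ = ("_gobj",)

    def __init__(self, gobj):
        # Bypass our own __setattr__ for the one real slot.
        object.__setattr__(self, "_gobj", gobj)

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. for user attributes.
        gobj = object.__getattribute__(self, "_gobj")
        try:
            return gobj.inst_dict[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self._gobj.inst_dict[name] = value


gobj = ToyGObject()

first = ToyWrapper(gobj)
first.colour = "red"      # lands in gobj.inst_dict
del first                 # the wrapper can go; there is no cycle to break

second = ToyWrapper(gobj)  # a fresh wrapper for the same gobject
recovered = second.colour
```

The reference only runs from wrapper to gobject, so ordinary reference counting on each side is enough in this model.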

However, it would be an unusual python object, as its dictionary could live beyond its lifetime when the only references to the virtual pygobject were on the gobject side. The inst_dict would at times be without an instance. I don't think it is possible to access the instance via an inst_dict, so that should pose no problem (though I am not sure), but users may rely on the inst_dict being deallocated at the same time as its instance (unless they have referenced it themselves for their own purposes). 


warm regards, 
Richard. 


