Hunting leftover weak announcements with GTInspector

Yesterday, I paired with Andrei and joined the hunt for leftover weak announcements that tend to accumulate in Pharo images since recently. This is a problem because it leads to an overall slowdown of the system. There were already several valuable points made in the long thread and there was even a fix readily available provided by Max Leske.

The main issue seemed to be that the weak announcements were not garbage collected because of a VM problem even if their subscribers were nil. However, there also seemed to be indications that the code related to the Glamorous Toolkit might have been responsible for this problem, so we wanted to understand the issue to see if there was anything we could do to help, especially given that the very problems in the inspector seemed to turn the inspector not so useful for identifying the problem. This turned out to be an exciting 30 minutes assessment session. Let me get you through it.

In a Pharo 50662, we started by inspecting:

SystemAnnouncer uniqueInstance

The inspector readily showed us the 2602 subscriptions. From the start we saw that there were many for which the subscribers are nil, a point also raised by Peter:

Weak-1.png

Another thing we noticed is that there seemed to be a repetition of announcements that appeared with nil subscribers. To check this, we executed:

grouped := (SystemAnnouncer uniqueInstance
    subscriptions glmSubscriptions select: [ :each |
    each subscriber isNil ])
        groupedBy: [:each | each announcementClass name].

Indeed, there were only 4 announcements that were affected by the problem:

Weak-3.png

More interestingly, we noticed that we had an equal amount of subscriptions for each announcement type (i.e, 640). This suggested that these registrations came from the same place. To browse this possibility, we looked for all methods that reference all these classes:

grouped keys allButFirst
    inject: (SystemNavigation new
             allReferencesTo: grouped keys first asClass binding)
    into: [ :result :each |
        result intersection: (SystemNavigation new
                              allReferencesTo: each asClass binding) ].

We got 12 such methods:

Weak-5.png

We browsed a bit, and we saw that some of these methods registered other system announcements as well, such as ClassCommented, but those did not appear in our problem list:

Weak-6.png

Thus, this route did not seem to be the best way to understand the source of the problem. We turned a bit around and took another route. As, each subscription provides us the selector that is to be sent when the announcement gets triggered, we queried the system to give us those selectors:

groupedSelectors := grouped associations collect: [ :assoc |
    assoc value collect: [:each | each action selector ] as: Set].

In this case, too, we got for each distinct announcement exactly one selector.

Weak-7.png

So, we looked at all methods that send all of these selectors:

selectors := groupedSelectors flatCollect: #value.
selectors allButFirst
    inject: (selectors first senders)
    into: [ :result :each |
        result intersection: each senders ].

And we got exactly one method: PragmaCollector>>installSystemNotifications.

Weak-8.png

Essentially, this means that the responsible for creating the global subscriptions that should have been garbage collected was the PragmaCollector. Certainly, the problem was not related to the PragmaCollector, but we now knew that no code in the Glamorous Toolkit prevented these subscriptions to become garbage collected.

We manually applied the patch to finalize and eliminate these subscriptions:

(SystemAnnouncer uniqueInstance subscriptions glmSubscriptions copy select: [ :sub |
    sub class = WeakAnnouncementSubscription and: [
        sub subscriber isNil ]])
            do: #finalize.
Smalltalk garbageCollect.

We were not yet quite done because executing:

WeakAnnouncementSubscription allInstances select: [:sub | sub subscriber isNil]

told us that we have more than 2000 instances left to deal with. The first question we asked was where did all these come from. There needed to be some global variables that were keeping those around but we did not know which one given that SystemAnnouncer uniqueInstance was cleaned up. We applied the same type of analysis.

announcements := (WeakAnnouncementSubscription allInstances select: [ :each |
     each subscriber isNil ])
          asOrderedCollection groupedBy: [:each | each announcementClass ].

Again, we found that all of the announcements are global announcers about the state of the system, and that each of them appeared in equal quantities.

Weak-10.png

Like in the previous case, we again looked at the place where all these announcements are sent from:

announcements keys allButFirst
    inject: (SystemNavigation new
            allReferencesTo: announcements keys first binding)
   into: [ :result :each |
      result intersection: (SystemNavigation new
                            allReferencesTo: each binding) ].

We found 3 cases. The first two were using SystemAnnouncer uniqueInstance weak, so these could have not been the root. In the third one, we discovered a new announcer: SystemAnnouncer uniqueInstance private.

Weak-11.png

In the end, this reported problem was not related to GT. There were other problems related to the management of announcements, but those were other cases.

This session showed again how the quick connection between the runtime and code can dramatically speedup the amount and the depth of the questions we can ask our system. And this is critical for modern software engineering.

Posted by Tudor Girba at 29 March 2016, 7:50 am with tags story, assessment, gt link
|