Chasing rogue announcements

A while ago, we ran into a major problem with the Glamour browsing engine of Moose (you can learn more about Glamour by reading The Moose Book). The problem was related to how Announcements were handled by the Glamour engine. We spent about a dozen man-days hunting the issue down, without success. We had to resort to looking at the problem the humane assessment way: by building custom tools. Using these tools helped us figure out the solution in a matter of hours.

First, a bit of background. Glamour is an engine for building browsers in a platform-independent manner. Once a browser is constructed, the platform-specific renderer produces the actual user interface. To handle the interaction between the objects from the browser model and the actual user interface widgets, Glamour makes use of Announcements.

Announcements are Smalltalk objects that can be announced by any object and received by observing objects. In a way, they are similar to Exceptions, but their goal is to provide an object-oriented means of communication for implementing the Observer pattern.
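To make the mechanism concrete, here is a minimal workspace sketch using the Announcer and Announcement classes that ship with Pharo. It is not Glamour code: the plain Announcement class stands in for the more specific announcement subclasses a real browser would use, and the variable names are made up for illustration.

```smalltalk
| announcer received |
announcer := Announcer new.
received := OrderedCollection new.

"An observer registers interest in a class of announcements."
announcer when: Announcement do: [ :ann | received add: ann ].

"Any object holding the announcer can announce. It does not know who listens."
announcer announce: Announcement new.

"The block above ran once, so exactly one announcement was collected."
received size
```

Note that the link between the announcing object and the observer exists only in the subscriptions held by the announcer at runtime, which is why it cannot be read off the class definitions.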

Getting the model objects to communicate properly with the renderer objects is critical. Given that this communication is done through Announcements and not through direct calls, it is difficult to get the proper overview by using only the code browser. To make matters worse, Glamour relies on a prototype-based design: it deep copies the objects of the model every time there is a significant interaction. Thus, a large part of the behavior can only be understood at the object level, rather than at the class level.

This being the situation, as the model became more complex, we suddenly noticed that in certain situations the renderer was not displaying the correct values. After days of investigation, we managed to capture the situation in a simple scenario, shown in the script below. The details are not important. What matters is that even though the code of the example was simple, the problem was not easy to find.

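"A tabulator with two rows: a list on top and a details pane below."
tabulator := GLMTabulator new.
tabulator row: #content; row: #details.
"Show the numbers from 1 to the input value as a list in #content."
tabulator transmit
 to: #content;
 andShow: [ :a | a list display: [ :x | 1 to: x ] ].
"Send the selection from #content to #details and show it as text."
tabulator transmit
 from: #content;
 to: #details;
 andShow: [ :a | a text ].
"Expose the selection of #content on the outside #selection port."
tabulator transmit from: #content; toOutsidePort: #selection.
"Embed the tabulator in a finder and open it on 42."
finder := GLMFinder new.
finder show: [ :a | a custom: tabulator ].
finder openOn: 42.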

Of course, we wrote multiple tests to try to capture the problem. The funny thing is that the model was working just fine. Zero problems on that front. Eventually, however, we managed to write a test that captured the incorrect rendering of the above browser. Still, even with a failing test, we could not figure out what the problem actually was.

We had a hunch that the issue was related to the way the announcements were raised. More specifically, because Glamour relies heavily on copying objects internally, we thought that somehow we were not copying the announcements properly.

We tried to investigate it by looking at the code but it did not help. We tried looking at the issue by stepping through the code with the debugger, but it did not work either because of the maze of copied objects.

So, we turned to a different solution: a visualization. We started by visualizing the objects in the browser model. In particular, we wanted to see how the panes and presentations are linked in a tree structure. We wanted to depict each presentation with a simple label, and each pane with a rectangle showing its name and its ports inside. Furthermore, just to make sure, we also wanted to show the connections between ports in blue.

Using the Mondrian visualization engine from Moose, the initial visualization took some 30 minutes to craft.

Everything looked fine. The objects were connected in a tree as expected, which means that they got copied properly. The next step was to add all the announced objects and show the connections in the same visualization. This took another 15 minutes.

The red lines denote the connections from the objects in the model (on the left) to the renderers (on the right). This revealed the problem quite nicely: there were several objects from the model that were linked to the same renderer. One example is the second MorphicPane from the top right. This meant that the problem was definitely related to the way the announcements were copied.

This reduced the scope of the search to a set of about 5-10 methods. After more code inspection, the solution boiled down to properly copying the announcer registry. In the end, it took a one-line fix.

GLMAnnouncer>>postCopy
 super postCopy.
 registry := registry copy.
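Why this one line matters can be sketched in a workspace without any Glamour code. In Pharo, copy makes a shallow copy and then sends postCopy to the result, so an instance variable that is not copied explicitly in postCopy is shared between the original and the copy. In the sketch below, an Array with a single slot is only a stand-in for an announcer, and the OrderedCollection in that slot is a stand-in for its registry. The symbols are hypothetical placeholders for subscriptions.

```smalltalk
| holder duplicate |
"Stand-in for an announcer: the only slot plays the role of the registry."
holder := Array with: OrderedCollection new.

"A shallow copy: both arrays point to the same collection."
duplicate := holder copy.
(duplicate at: 1) add: #rendererSubscription.
(holder at: 1) includes: #rendererSubscription. "true: the original sees it too"

"What the postCopy fix amounts to: give the copy its own registry."
duplicate at: 1 put: (duplicate at: 1) copy.
(duplicate at: 1) add: #anotherSubscription.
(holder at: 1) includes: #anotherSubscription. "false: the registries are now separate"
```

With a shared registry, a subscription made through one copy is visible through every other copy, which matches the picture of several model objects linked to the same renderer.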

After the fix, the test turned green and the visualization looked as expected.

What is the takeaway lesson?

Not all problems can be captured in a useful manner from a functional point of view. Even with a failing test, we still could not get to the root of the problem, and the debugger was not of much use either. It required a spike assessment effort to craft dedicated tools to reveal the problem.

What is more, we now have a visualization that can be reused for further use cases, including as a teaching tool for people who want to learn how to work with Glamour. Here is an example of how it looks for a complex browser.