On analyzing the density of nil checks

How many nil checks are there in your system?

A recent paper entitled "Does return null matter?", published at CSMR-WCRE 2014, goes after this issue. The paper presents a systematic analysis of many systems over multiple versions and, among other things, concludes that most systems exhibit between 20 and 40 null checks per 1000 lines of code, and that systems going beyond the 40 mark seem to get into trouble.

I liked the paper. Null proliferation is a plague that deserves closer examination. The paper's conclusion looked quite intriguing, too. While not necessarily providing a direct path to action, answering this question for your system can still be educational. At the very least, it makes for a nice analysis exercise.

In the following, I describe my experience of measuring the density of null checks in two systems and the lessons I learnt from it.

Checking a Java system

I first wanted to check a Java system I know. To detect the null checks, I could have relied on a deep abstract syntax tree traversal, but in my case a simple text analysis was sufficient: I simply counted all occurrences of the null-check patterns using regular expressions on a FAMIX model of the system.

relevantClasses := model allModelClasses reject: [ :each | 
     each isAnonymousClass or: [ each isInnerClass ] ].
nullCheckTextPatterns := #('== null' '!= null' 'null ==' 'null !=').
numberOfNullChecks := relevantClasses sum: [ :each |
     (each sourceText allRegexMatches: ('|' join: nullCheckTextPatterns)) size ].

Note how the input to the measurement is a subset of all classes defined in the system. Because I was doing a plain text analysis, I needed to ensure that every line of code was traversed only once. Thus, I had to ignore both the anonymous classes (which are defined inside methods) and inner classes (which are defined inside classes).
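The same four textual patterns can also be counted with plain regular expressions outside of FAMIX. Here is a minimal sketch in Java; the class and method names are mine and purely illustrative:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NullCheckCounter {
    // The four textual patterns used in the analysis above,
    // combined into one alternation.
    static final Pattern NULL_CHECK =
        Pattern.compile("== null|!= null|null ==|null !=");

    // Counts every occurrence of a null-check pattern in a source text.
    static int countNullChecks(String sourceText) {
        Matcher m = NULL_CHECK.matcher(sourceText);
        int count = 0;
        while (m.find()) count++;
        return count;
    }

    public static void main(String[] args) {
        String source = "if (x == null) { x = compute(); }\n"
                      + "return y != null ? y : z;";
        System.out.println(countNullChecks(source)); // prints 2
    }
}
```

As in the Pharo version, this is a purely textual approximation: it would also match occurrences inside comments or string literals, which a syntax-tree-based analysis would avoid.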

Once the number of null checks was available, I simply divided it by the total number of lines of code.

(numberOfNullChecks / (relevantClasses sum: #numberOfLinesOfCode)) asFloat

The result was 0.016.

Using an average can be misleading when judging a distribution. Here is a map (drawn with the new CodeCity implementation by Richard Wettel) that offers a visual perspective: boxes represent classes grouped into meaningful packages; the height and the redness of a box grow with the number of null checks; the base of the box grows with the number of methods. The visualization reveals clear hot spots throughout the system.

[Figure: Java-codecity.png — CodeCity view of the null-check hot spots in the Java system]

Checking the Moose code

For comparison purposes, I wanted to find the answer for the Moose code as well.

Given that Moose is built in Pharo, and that in Pharo we wrap actual nil checks into more meaningful messages, I could not apply the same text-based analysis; I needed a deeper one. But first, I had to find which methods actually imply a nil check.

We know that typical nil checking methods have nil in their name. For example, #ifNil: is such a method. But, what are all the other methods?

A simple analysis provides the answer. I created a FAMIX model for the Moose code, and looked for all methods that have nil in their name:

model := MooseModel root allModels first.
(model allInvocations select: [:each | '*nil*' match: each signature ]) 
     collectAsSet: #signature

This resulted in a set of 25 methods:

a Set(#'morph:withTitleOrNilOf:(Object,Object)' #'ifNil:ifNotNil:(Object,Object)' #'allowNil()' #'ifNotNil:ifNil:(Object,Object)' #'ifNotNilDo:ifNil:(Object,Object)' #'propertyNamed:ifNil:(Object,Object)' #'useImplicitNotNil()' #'ifNotNilDo:(Object)' #'characterStyleOrNilAt:(Object)' #'ifNotNil:(Object)' #'concreteClassOrNil()' #'isEmptyOrNil()' #'useExplicitNotNil()' #'characterStyleOrNilIfApplying:(Object)' #'renderWithTitleOrNil:(Object)' #'usesImplicitAllNil()' #'findClassNamedOrNilFromFullName:(Object)' #'notNil()' #'ifNil:(Object)' #'useExplicitAllNil()' #'nilLiteral()' #'usesImplicitNotNil()' #'isNil()' #'selectedItemOrItemsOrNil()' #'allowAllNil()')

Not all of these represent actual nil checks. A message like #isNil clearly represents a nil check, but some others serve different purposes. For example, #allowNil is used in Glamour as a setting. Thus, for the actual analysis, I had to discard those messages manually.
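This two-step filtering — keep everything whose signature mentions nil, then manually drop the known non-checks — can be sketched as follows in Java. The exclusion set below lists only a few of the settings-style messages mentioned above; it is illustrative, not complete:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class NilSignatureFilter {
    // Messages that mention nil but are settings, not nil checks
    // (a partial, illustrative list).
    static final Set<String> NOT_ACTUAL_CHECKS = Set.of(
        "allowNil()", "useImplicitNotNil()", "useExplicitNotNil()");

    // Keeps the signatures that mention nil and are not known non-checks.
    static List<String> nilCheckCandidates(List<String> signatures) {
        return signatures.stream()
            .filter(s -> s.toLowerCase().contains("nil"))
            .filter(s -> !NOT_ACTUAL_CHECKS.contains(s))
            .collect(Collectors.toList());
    }
}
```

The manual exclusion step is unavoidable here: whether a message is a genuine nil check is a semantic question that no name-based heuristic can fully answer.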

Armed with the actual target message list, I could proceed to find all their usages within the code. On the same model I could do:

numberOfNullChecks := model allInvocations select: [ :each |
     #(#'ifNil:ifNotNil:(Object,Object)' #'ifNotNil:ifNil:(Object,Object)' #'ifNotNilDo:ifNil:(Object,Object)' #'propertyNamed:ifNil:(Object,Object)' #'ifNotNilDo:(Object)' #'ifNotNil:(Object)' #'isEmptyOrNil()' #'notNil()' #'ifNil:(Object)' #'isNil()') includes: each signature ].

Finding the density is now a simple arithmetic problem:

numberOfNullChecks size / (model allMethods sum: #numberOfLinesOfCode) asFloat.

The result? 0.014.

Comparing the two systems

Both systems had about the same ratio. This was highly surprising because when working with the Java system, I encountered more problems related to handling nulls than when working with Moose. I was intrigued.

After sleeping on it, I realized two things. First, the expressiveness of the Pharo language and libraries makes programs written in it more concise. For example, a simple collection selection in Pharo is a one-liner, while in Java it requires at least three lines: a new collection instantiation, a for loop, and adding to the collection. This implies that the same number of nil checks appears denser in Pharo code than in Java code. However, this effect is hard to capture automatically. One possibility would have been to relate the density to the number of statements rather than to the number of lines of code, but that would have meant a different measure than the one proposed in the paper.

Second, there simply are more empty or irrelevant lines in Java than there are in a Pharo system. Let’s consider an example of a Java class:

public class A
{
     private X x;

     public X getX()
     {
          if (x == null)
          {
               x = this.computeX();   
          }
          return x;
     }

     public void setX(X x)
     {
          this.x = x;
     }
}

Computing the default number of lines of code yields 19. However, several lines are not that relevant for our computation:

  • there are several empty lines of code, and
  • there are several lines containing only a curly brace.

The corresponding code in Pharo would look like:

Object subclass: #A
     instanceVariableNames: 'x'
     classVariableNames: ''
     poolDictionaries: ''
     category: ''
A>>x
     ^ x ifNil: [ x := self computeX ]
A>>x: anX
     x := anX

In this case, all lines are relevant. Also, note how the declaration of x requires a full line in Java, while in Pharo it is part of the variable definition placed on one line. To make the comparison more reasonable, we would need to normalize the lines of code. To this end, I considered only:

  • the lines inside methods,
  • the lines that are not empty and that contain more than just a curly brace (Java) or a square bracket (Pharo).

To compute this, I defined a new metric that only counted a line if it contained more than one non-space character:

relevantClasses := model allModelClasses reject: [ :each | each isAnonymousClass ].
numberOfRelevantLinesOfCode := relevantClasses sumNumbers: [ :each |
     each methodsGroup sumNumbers: [ :m |
          (m sourceText lines select: [ :line | 
               line trimBoth size > 1 ]) size ]]

Given that the granularity of the analysis changed from class-level to method-level, this time, I only ignored the anonymous classes, as they are part of the method. Inner classes are not excluded anymore because I needed to count their methods, too.

Interestingly, the metric worked out of the box for both Pharo and Java systems. (Also, please observe how the expressive Pharo string manipulation library makes the line checking almost trivial.)
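For readers who want to reproduce this normalization outside of Moose, the line filter translates directly to other languages. Here is a minimal Java sketch (the class and method names are mine) applying the same rule — count a line only if its trimmed content is longer than one character:

```java
public class RelevantLines {
    // Mirrors the `line trimBoth size > 1` filter: empty lines and
    // lines holding only a single brace or bracket are ignored.
    static long countRelevantLines(String sourceText) {
        return sourceText.lines()
                         .filter(line -> line.trim().length() > 1)
                         .count();
    }

    public static void main(String[] args) {
        String getter = "public X getX()\n"
                      + "{\n"
                      + "     if (x == null)\n"
                      + "          return x;\n"
                      + "}";
        System.out.println(countRelevantLines(getter)); // prints 3
    }
}
```

Applied to the full Java class above, this filter keeps only the 8 lines carrying actual statements and declarations, which is exactly the effect that nearly doubled the measured density.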

After redoing the measurement I got:

  • Moose: 0.016
  • Java system: 0.028

In the case of Moose, the result is almost unchanged (0.016 instead of 0.014). However, for the Java system we get an almost double value (0.028 vs 0.016).

Lessons learnt

While analyses should be objective, subjective feelings still have an important place in the assessment process: they generate questions. In our case, I could simply have been content with the results and considered the two systems similar, but my gut feeling told me otherwise. I questioned the underlying assumptions behind the original analysis, and got a significantly different result.

Of course, one danger with this approach is that you might end up constructing scenarios and gathering facts just to confirm your feelings. This is one of the psychological pitfalls described in the Sources of Power book. It could be that this is what happened here. However, I am more confident in the later analysis than in the first one. The main reason for my confidence is that I can easily describe the reasons behind the improved analysis.

What’s more, I slept on the problem before interpreting it. That is always useful when you are unsure.

From a different point of view, a rather simple metric produced significantly different results once I tweaked the interpretation a bit. Tom DeMarco famously said that you cannot control what you cannot measure. There certainly is truth to this, but I still feel the need to be more precise:

You cannot control if you do not know how you measure.
Tudor Girba

Posted by Tudor Girba at 14 March 2014, 7:11 am with tags spike, story, moose, pharo, assessment link