Demystifying cycle detection (with Moose)

Cyclic dependencies is one of those concerns that occupy a top position in developer's consciousness. Usually, it comes right after the size of the code, and next to dead code.

Cycles are bad, goes the urban legend. But, why are they bad? And what cycles are we talking about? Are all cycles bad? Are cycles within packages bad? Are cycles within projects bad? Are cycles between classes bad? Are they all bad in the same way?

Generalization usually leads to easy solutions, but at the same time, every generalization comes with a price. However, in any serious engineering discipline, details are important, and we should be able to deal with them.

Let's consider some Java-based examples and see how details can play an important role. The examples are accompanied by detection approaches based on Moose.

Detecting all cycles between packages

The simplest form of cycle detection is to simply retrieve all cyclic dependencies between all namespaces (in our meta-model, Java packages are called namespaces) in the system. Moose comes with a built-in cycle detection algorithm that can be used like:

mooseModel allNamespaces cyclesToAllProviderNamespaces.

This expression is applied on a model of a Java system, and it returns groups of namespaces that form cycles within the given system. These groups can later be manipulated in various ways. For example, when applied on a Java system I dealt with recently, I obtained some 33 cycles. One of these cycles involved 38 namespaces. The pictures below show the namespaces involved in the cycle and their interdependencies (as a Dependency Structure Matrix, and as a graph).

Simple. But, is it efficient? Do you truly want to deal with all dependencies throughout your entire system at once? Often this is not desired, especially when you start to look into cycles late in the project.

Cycles inside components only

When working on a large project you might want to restrict the detection to only a part of it based on some criteria. A common approach is to identify sub-projects or components and look at cycles only inside of these.

(mooseModel allNamespaces select: #isPartOfMyComponent)
  cyclesToAllProviderNamespaces

In this expression, we first select the namespaces from the desired component via the isPartOfMyComponent custom method, and then simply call the detection of cycles on the result. What is isPartOfMyComponent? It is an extension that you have defined. For example, if you can identify a component based on a naming convention, a possible implementation would look like:

FAMIXNamespace>>isPartOfMyComponent
  ^ self mooseName beginsWith: 'com::example::mycomponent'

Cycles between components

Looking inside a component reveals detailed cycles. A similarly legitimate idea would be to look for cycles between components to identify high-level architectural violations.

This is a slightly more complicated issue. For this, we need two things:

  1. A way to identify all components, and
  2. A way to identify all their inter-dependencies while taking into account that a component can be formed by many sub parts.

Let's start with identifying the components, and let us suppose that you have a way to identify the components based on names. In this case, we can associate the root namespace with the component and make all sub-namespaces be able to say what is its parent component.

FAMIXNamespace>>component
  self mooseName = 'com::example::mycomponent' ifTrue: [ ^ self ].
  self mooseName = 'com::example::yourcomponent' ifTrue: [ ^ self ].
  "..."
  ^ self belongsTo isNil
      fTrue: [ self ]
      fFalse: [ self belongsTo component ]
FAMIXNamespace>>isComponent
  ^ self component = self

By extending the default meta-model with the notion of your components, you can ask the namespace com::example::mycomponent::implementation for its component and it will return the namespace called com::example::mycomponent.

Once this point achieved, we can tackle the second point:

FAMIXNamespace>>allProviderComponents
  ^ (self allChildScopes flatCollect: #providerNamespaces )
      collectAsSet: #component

The trick here is to collect all dependencies (this is what we essentially get when invoking providerNamespaces) from all the nested namespaces, and then to collect the result as components. Armed with these helper extensions, we can tackle the original problem of identifying cycles between components:

(mooseModel allNamespaces select: #isComponent)
  cyclesToAll: #allProviderComponents

Through this expression, we essentially aggregate the cycles at component level by: (1) applying the detection only for components, and (2) considering dependencies only to components.

Cycles induced by a subset of code entities

But, what happens if your components include tests that introduce strange dependencies that you actually do not want to be bother with at the beginning? Well, you should be able to ignore those dependencies.

FAMIXNamespace>>allOutgoingAssociationsWithoutTests
  ^ self queryAllOutgoingAssociations select: [ :assoc |
      assoc from classScope isJUnit4TestCase not and: [
        assoc to asOrderedCollection anyOne classScope isJUnit4TestCase not ]]
FAMIXNamespace>>providerNamespacesWithoutTests
  ^ self allOutgoingAssociationsWithoutTests
      atNamespaceScope withoutSelfLoops asSet

With these extensions, we define a new way of obtaining provider namespaces by ignoring all associations that involve a test class. Afterwards, we can simply redefine the way we compute allProviderComponents by using the new dependency semantics:

FAMIXNamespace>>allProviderComponents
  ^ (self allChildScopes flatCollect: #providerNamespacesWithoutTests)
      collectAsSet: #component

Cycles between Stateless beans that cause injection issues

While the previous examples showed more classic use cases, sometimes, cycle detection can be used as a tool to identify specific problems. For example, in JBoss 5, there exist a bug in the implementation of the memory pools used for the allocation of beans. However, the bug is only apparent in special circumstances when (1) there exist cyclic dependencies between beans, and (2) the pool is smaller than the maximum amount of beans wanted at a time.

Long story short, if you have cyclic dependencies between beans you are susceptible to experience memory leaks. To guard against the bug, we need to set the pools to have a larger size than the amount of beans used at a time. However, to find out what this right size is, we need to form an idea of the runtime scenarios, and the first step in this direction is to know what cycles between beans are there in the system.

Such a cycle could look like this:

public class BeanA implements IBeanA {
  @EJB
  private IBeanB beanB;
  ...}

public class BeanB implements IBeanB {
  @EJB
  private IBeanA beanA;
  ...}

This is a kind of a cycle, and it should be be detectable like any other cycle. The only problem is that there is no direct cycle in the code because both references point to the interfaces, and not to the implementation. The cycle only happens at runtime. Thus, to detect our problem we have to manufacture the dependency:

(mooseModel allModelClasses select: #isBean)
          cyclesToAll: [ :class |
            class attributes flatCollectAsSet: [ :attribute |
              attribute declaredType
                ifNil: [#()]
                ifNotNil: [:type | type withSubclassHierarchy ] ] ]

The expression gets all the beans from the system, and for each of these will look at the possible cycles induced by the types of the attributes to all the sub types of the declared type.

Summary

The topic of cyclic dependencies occupies a top position in developer's consciousness for good reasons, but it is often surrounded by a mystic aura. The above examples show five different ways in which cycles can be approached within the context of a software system. Of these, the generic detection is the least useful one. Yet, this is what most tools offer. To extract real value, you need to customize the detection for the context of your problem. In order to do that, you need a platform that allows you to customize cheaply. Moose shows how this customization can be both cheap and powerful, in particular when combined with other querying mechanisms.

The bottom line is that cycle detection is a tool that should make it in the toolkit of any software engineer.

Posted by Tudor Girba at 1 April 2013, 12:31 pm with tags story, moose, assessment, analysis link
|