overwatering.org

You have an application. It works with complex data of varying structure. There are many optional parts to each piece of data. The domain lends itself to processing services working with anaemic, rather than rich, models. The first release needed to be built quickly.

Given all that, the sensible decision was made to use a generic, flexible data structure for all models. This is a common pattern in languages like Ruby or Perl, where the in-built hashes, or maps, would be used. It’s also a very common approach in Clojure. Hashes are by far the most common data structure to be used in this way, but I have also seen XML documents used.

Within the application, there is a component that depends upon three services: SvcA, SvcB and SvcC. The component happens to call these services in that order. Following the standard design, a map of data is passed to each service. Each service does some processing based on the data in the map, and then returns a result.

At some point in the future, something horrible happens.

A developer needs to change the processing performed in SvcC. It now requires an additional piece of data. Casting around the code base, they see that SvcA has access to that piece of data. Either SvcA gets it from some external source, or computes it as an intermediate value.

Chasing down this promising lead, the developer can see that SvcA is quite close to SvcC: both services are called by the same component. In fact, that component happens to pass the same map of data to both services. With a small change to SvcA, the piece of data is added to the map, under a new key. SvcC can then pick up the piece of data. This feels like a safe, non-breaking change. It’s just a map, and by storing under a new key no other code could possibly care.
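To make the scenario concrete, here is a minimal sketch in Python, using plain dicts as the open maps. The service bodies and the keys (`line_items`, `customer`, `tax_rate`, `subtotal`) are invented for illustration; only the names SvcA, SvcB, SvcC and the calling component come from the story above.

```python
def svc_a(data):
    # SvcA computes an intermediate value as part of its own processing.
    subtotal = sum(data["line_items"])
    # The "small change": stash the value under a new key so SvcC can find it.
    # Nothing in svc_a's signature or return value reveals this side effect.
    data["subtotal"] = subtotal
    return subtotal


def svc_b(data):
    # SvcB neither reads nor writes the hitchhiking key, but it shares the map.
    return data["customer"].upper()


def svc_c(data):
    # Hidden dependency: "subtotal" only exists if svc_a already ran
    # against this very same map.
    return data["subtotal"] * data["tax_rate"]


def component(data):
    # The component must keep passing the same map, in this order.
    return {"a": svc_a(data), "b": svc_b(data), "c": svc_c(data)}


result = component({"line_items": [10, 20], "customer": "acme", "tax_rate": 0.5})
# result == {"a": 30, "b": "ACME", "c": 15.0}
```

Calling `svc_c` on a fresh map that never passed through `svc_a` raises a `KeyError`, which is the only place the coupling ever becomes visible.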

The hitchhiking anti-pattern has entered your application.

It should be pretty obvious how bad this is. All four modules have now been tightly coupled together, and coupled in a way that is almost completely invisible. Interfaces don’t publish the coupling, and static analysis can’t see it. SvcB is coupled too: future changes to its interface are potentially restricted, because the map has to survive the trip from SvcA to SvcC. The component that ties everything together must continue to pass the same map to every service. SvcC can’t be used independently of SvcA, but you can’t see any of this.

This is a pretty pathological case. But once an application is passing around open maps, it’s surprisingly easy for this sort of thing to happen. Often it’s simply because it isn’t clear which piece of code creates a map, or which piece of code owns it. Ask any C++ programmer about the joys of tracking ownership.

Once this anti-pattern creeps into an application, a whole class of bugs becomes very common. The application is still easy to unit test, but it becomes steadily harder to change.

While this is a Bad Thing to happen in your code, I don’t believe it’s an argument against using maps instead of model classes. It is important, however, to be careful. The underlying problem in the pathological case above was that some shared logic needed to be factored out of SvcA, but it was easier for data to hitchhike than for that refactoring to happen. Maps should make it easier to move code around, so keep an eye on code mobility.
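As a rough sketch of what that refactoring might look like, again in Python with invented keys and a hypothetical `compute_subtotal` helper: the shared logic moves into its own function, and each service calls it directly instead of relying on a value left behind in the map.

```python
def compute_subtotal(line_items):
    # The shared logic, factored out of SvcA (hypothetical helper).
    return sum(line_items)


def svc_a(data):
    # SvcA no longer writes anything into the map it was given.
    return compute_subtotal(data["line_items"])


def svc_c(data):
    # SvcC asks for what it needs; it no longer depends on SvcA having run.
    return compute_subtotal(data["line_items"]) * data["tax_rate"]


order = {"line_items": [10, 20], "tax_rate": 0.5}
tax = svc_c(order)  # works without calling svc_a first
```

The services still take maps, but either one can now be called on its own, in any order, and the map comes back exactly as it went in.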

I want to finish with a note about Clojure. Idiomatic Clojure programming calls for the extensive use of maps. In fact, records even behave like maps. However, thanks to persistent data structures this whole class of bugs can be avoided. The same applies to other languages or platforms with immutable or persistent data structures, such as Haskell or Objective-C.
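Python has no persistent maps built in, but the effect can be approximated with a read-only view. The sketch below assumes the standard library's `types.MappingProxyType` as a stand-in for an immutable map; the keys are invented as before. A service that wants to add a key cannot do it in place, so it has to return a new map, and the data flow shows up in its return value.

```python
from types import MappingProxyType


def svc_a(data):
    subtotal = sum(data["line_items"])
    # With a read-only map, the only way to "add" a key is to build and
    # return a new map. The caller decides who gets to see it.
    return {**data, "subtotal": subtotal}


original = MappingProxyType({"line_items": [10, 20], "tax_rate": 0.5})
enriched = svc_a(original)

try:
    # An in-place write, as in the hitchhiking change, is rejected.
    original["subtotal"] = 30
    write_allowed = True
except TypeError:
    write_allowed = False
```

Here `original` never gains a `subtotal` key, so nothing can hitchhike on it; any code that wants the enriched map has to be handed `enriched` explicitly.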