Category: Software design

The benefits of OO aren’t obvious: Part IV of V, refactoring

Previously I wrote:

Sometimes you work with something for so long that you forget that other people aren’t familiar with it. This happened recently talking to a friend of mine — “I’ve got no need for object orientation”, he said, and I was shocked. […] What, I wondered to myself, were the benefits of OO that he was missing? How could I persuade him that it does have a point? So here is my attempt to trawl my memory and see how I’ve changed in response to OO, and why I think it serves me (and my projects) better. In the end I’ve found that the benefits are real, but nothing I would have guessed at when I first started in this field.

I’ve already written about data hiding, separation of concerns and better testability. Now read on…


Refactoring

This is something that builds on all the items above. If you’ve worked on any piece of code for a length of time you’ll have come to the point where you think “Damn, I really should have done that bit differently.” Refactoring is what you want to do here: “restructuring an existing body of code, altering its internal structure without changing its external behavior”.

Refactoring is helped considerably by OO. “I already refactor,” says my non-OO friend, “I always make sure common code is put into functions.” Which is good, but there’s a whole lot more to refactoring than that. Martin Fowler lists a vast number of options. But before I list my favourite refactorings (how sad is that? I have favourite refactorings) let me explain why this is so important, why it probably isn’t all entirely obvious, and how it relates to so much else here…

First, be aware that two people could both write outwardly identical applications but with their source totally different internally. At its most extreme, refactoring would enable you to evolve from one codebase to the other in lots of very small steps, at each step leaving you with the same working, stable application. That’s much more than just putting common code into functions. If you think you already have the skill and technology to entirely change your codebase without changing anything outwardly then you’re a refactoring expert. If not, then you’ve got a lot to learn (and a lot of fun yet to have).

Second, refactoring is so important because it gives you a chance to improve your code. Unit testing (and indeed all testing) makes this possible. Test driven development isn’t just about test-first development, it’s also about refactoring. The last step of a TDD sequence is to refactor. You ensure your tests pass, you refactor, and you run your tests again, to make sure they still pass — to ensure you’ve not broken anything while refactoring. A suite of unit tests acts as scaffolding around your code, allowing you to rebuild parts of it without fear of the edifice collapsing. If you’ve made a mistake in your refactoring then the unit tests should show exactly whereabouts the problem is.

Third, taking the previous two together, you can make vast, sweeping changes to your codebase in tiny, safe steps. Simply break down the process into lots of small refactorings, and test after each one. This is how the apparently-impossible task of changing your entire codebase reliably is possible. In reality you’ll probably never want to change your entire codebase but you will want to change large chunks of it, and this is how you’d do it.

Fourth, you often find yourself needing to add a feature which seems reasonable to a client, but difficult to someone who understands the structure of the application. An excellent way to add the feature is to utilise refactoring as follows: refactor the software so that the addition of the feature does become a natural extension, and then add it. Martin Fowler gives an excellent worked example of this in the first couple of chapters of his book, which is (almost) worth the price of admission alone.

Now you can browse the refactoring catalogue linked to above (and here again), but here are some of my favourite refactorings…

  1. Rename variable. This is very trivial, and very easy, but all the better for it. Forgotten what a variable does? Give it a better name. It’s so easy to do, but so useful.
  2. Extract method. Never mind extracting common code into functions, I often find myself extracting one-off code into a single function (or “method”, as they’re called in OO) with a long and useful name. This is to make the original method much shorter, so I can see it all on one screen, and also so that it reads better. I no longer need to look at ten lines and translate them in my head into “…and then sort all the even numbered vertices into alphabetical order”. Instead I have a single method called sortEvenNumberedVerticesAlphabetically(). You think that’s silly? Maybe Java or VB.NET is your first language. Personally, English is my first language.
  3. Introduce null object [1, 2]. I don’t think I’ve ever used this, but I like it a lot because it’s just very interesting. The problem is that you’ve got special logic whenever a key object is null. For example, you need a basic billing plan by default, if the customer object isn’t set up and doesn’t have its own billing plan:

    if (customer == null)
        billing_plan = Plans.basic();
    else
        billing_plan = customer.getPlan();

    This is a small example, but it’s important to realise why this is undesirable: partly because each branch point is another potential source of error, and partly because your “null customer” logic isn’t just here, but is probably scattered all over your code. It would be better to have all the logic in one place, where it can be seen and managed and tested more easily. The solution is to introduce a null object — in this case a NullCustomer object such as this:

    // Extends Customer so it can stand in wherever a Customer is expected.
    public class NullCustomer extends Customer
    {
        // ...probably lots missed out here.

        // Meanwhile the getPlan() method of NullCustomer
        // just returns the default basic plan...
        public Plan getPlan()
        {
            return Plans.basic();
        }
    }

    Every time we set up a customer we should now use this NullCustomer as the default rather than a bare null. And our original if/else code can be reduced to this:

    billing_plan = customer.getPlan();

    That’s simpler and clearer, it doesn’t have any branches, and the logic for a null customer lives in one place.
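
The “extract method” refactoring in item 2 can be sketched like this. The Graph class, its field and the vertex rule are all invented for illustration — the point is only that ten inline lines become one well-named call:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// A sketch of "extract method". Before the refactoring, the sorting
// logic sat inline in some longer method; afterwards the caller just
// reads sortEvenNumberedVerticesAlphabetically() like a sentence.
class Graph
{
    private final List<String> vertexLabels;

    Graph(List<String> vertexLabels)
    {
        this.vertexLabels = vertexLabels;
    }

    // The extracted method: collect the even-numbered vertices,
    // then sort them alphabetically.
    List<String> sortEvenNumberedVerticesAlphabetically()
    {
        List<String> evens = new ArrayList<String>();
        for (int i = 0; i < vertexLabels.size(); i++) {
            if (i % 2 == 0) {
                evens.add(vertexLabels.get(i));
            }
        }
        Collections.sort(evens);
        return evens;
    }
}
```

The outward behaviour is unchanged, which is exactly what makes this a refactoring rather than a rewrite.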

Refactoring is a wonderful thing. And it’s even more wonderful if your IDE helps you. For .NET languages you can get the Refactor! plugin, and MSDN has a video demo, too. All that code manipulation at the touch of a button!


The benefits of OO aren’t obvious: Part III of V, better testability

Previously I wrote:

Sometimes you work with something for so long that you forget that other people aren’t familiar with it. This happened recently talking to a friend of mine — “I’ve got no need for object orientation”, he said, and I was shocked. […] What, I wondered to myself, were the benefits of OO that he was missing? How could I persuade him that it does have a point? So here is my attempt to trawl my memory and see how I’ve changed in response to OO, and why I think it serves me (and my projects) better. In the end I’ve found that the benefits are real, but nothing I would have guessed at when I first started in this field.

I’ve already written about data hiding and separation of concerns. Now read on…

Better testability

Another great benefit of OO is better testability. The simplest kind of testing you can do is unit testing — taking a single class and testing its methods in isolation. By being able to instantiate individual objects and ensuring there are minimal dependencies it becomes fairly easy to test almost every branch of your code. Fairly easy, but not trivial — there’s a difference between writing merely working code and writing testable code. Writing testable code is a very good skill to acquire, and test-driven development is a further discipline that’s certainly improved my code (and my peace of mind) no end.
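
As a minimal sketch of what that looks like — the Article class and its rules are invented here, and in practice you’d write these checks with JUnit or NUnit rather than bare assert statements:

```java
// A unit test takes one class and checks its methods in isolation:
// no database, no network, no other classes involved.
class Article
{
    private final String headline;

    Article(String headline)
    {
        this.headline = headline;
    }

    // The class enforces its own rule: headlines come back trimmed.
    String getHeadline()
    {
        return headline.trim();
    }

    boolean isPublishable()
    {
        return getHeadline().length() > 0;
    }
}

// Each test method exercises one behaviour; with JUnit these would
// be @Test methods run automatically by the framework.
class ArticleTest
{
    static void testHeadlineIsTrimmed()
    {
        Article article = new Article("  Hello  ");
        assert article.getHeadline().equals("Hello");
    }

    static void testBlankHeadlineIsNotPublishable()
    {
        Article article = new Article("   ");
        assert !article.isPublishable();
    }
}
```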

There are further aids to testing that OO gives you, most notably stubs and mocks. Stubs are dummy implementations used when you need a non-critical dependency to be present (such as a Repository for an Article, when you’re not actually testing the Repository). Mocks are dummy implementations which simulate a given interaction and allow you to check that interaction afterwards. Both of these are easier with dependency injection.
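
Here’s a hand-rolled sketch of both, with all the names illustrative (real projects often use a mocking library such as jMock or NMock instead):

```java
// The dependency we want to fake out in tests.
interface Repository
{
    void save(String article);
}

// Stub: satisfies the dependency, does nothing interesting.
class StubRepository implements Repository
{
    public void save(String article) { /* deliberately empty */ }
}

// Mock: records the interaction so the test can verify it afterwards.
class MockRepository implements Repository
{
    int saveCount = 0;
    String lastSaved = null;

    public void save(String article)
    {
        saveCount++;
        lastSaved = article;
    }
}

// The class under test. The Repository is injected, which is
// exactly what makes stubbing and mocking straightforward.
class Publisher
{
    private final Repository repository;

    Publisher(Repository repository)
    {
        this.repository = repository;
    }

    void publish(String article)
    {
        repository.save(article);
    }
}
```

A test would construct a Publisher with a MockRepository, call publish(), and then check that save() was called exactly once with the expected article.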

Personally I think the greatest step forward in testing is unit testing, and automatic unit testing — have a look at NUnit for .NET languages and JUnit for Java. Even coding for fun, I’ve found a great sense of relief in not having to think about the correctness of my code, but leaving that to my unit testing framework. It means I can leave my code at any point and not wonder about whether or not it really works — because I know it does — and not have to hold anything in my head for when I return, because everything will be verified by the tests. Pedants will say there’s a lot more to testing than unit testing, and they’d be right, but to go from next-to-no testing, or infrequent testing, to automated unit tests is a huge step forward, and I cannot recommend it highly enough.


The benefits of OO aren’t obvious: Part II of V, separation of concerns

Previously I wrote:

Sometimes you work with something for so long that you forget that other people aren’t familiar with it. This happened recently talking to a friend of mine — “I’ve got no need for object orientation”, he said, and I was shocked. […] What, I wondered to myself, were the benefits of OO that he was missing? How could I persuade him that it does have a point? So here is my attempt to trawl my memory and see how I’ve changed in response to OO, and why I think it serves me (and my projects) better. In the end I’ve found that the benefits are real, but nothing I would have guessed at when I first started in this field.

I’ve already written about data hiding. Now read on…

Separation of concerns

This is one of those things that is a consequence of OO fundamentals, but by no means obvious. The root of so many problems in software is dependencies. A depends on B, B depends on C, and before you know it everything depends on everything else. It makes things very difficult to manage: changes become slower and slower, and riskier and riskier. Separation of concerns is about moving away from this: by forcing things into classes it encourages each class to do its own specific job. This breaks the dependencies and makes the management of your code much easier.

But it’s not that easy or obvious, and requires a fair bit of imagination to see it working really well. Let’s take a very simple example.

Suppose Guardian Unlimited has an Article class, and articles can be saved into a Repository (probably a database). Then we have two classes, each with a separate concern; the Article class has a save() method which uses the Repository to save to the backing store. But hold on there — we’ve still got the chance for it to go wrong.

An easy mistake would be to have the Article manage the setup of the Repository: when you create an Article it sets up the Repository all by itself. In some ways this is nice (it’s a form of encapsulation) because you can create and save an article without worrying about how the save is done, or even that there is a Repository at all. But there is a nasty dependency here: the Article depends on the Repository: not only are we unable to set up an Article without having a Repository, but it’s the Article which determines what the Repository is — if your application requires a working SQL Server database then it will always require a working SQL Server database, even if you just want to do a little test.

Enter dependency injection [1, 2, 3]. This is a technique that is by no means an obvious consequence of object orientation, but is facilitated by it.

In our example the Article depends on the Repository. It also specifies exactly what the Repository is, because it sets it up internally. Dependency injection says we shouldn’t allow the Article to set up the Repository; rather, we should set up the Repository externally and inject it into the Article (for example by calling a setRepository() method). The dependency isn’t broken, but the dangerous element is: the particular kind of Repository is no longer a concern of the Article.

What this means in practice is that you can feed any kind of Repository into the Article and it will still work. You can even inject a “do nothing” Repository just in case you want to do a quick test which doesn’t require saving it into the real database. Or you could inject a disposable in-memory Repository which only lasts for a limited time.

All of this is possible with another feature of OO: abstraction. If we ensure Repository is only an abstraction (rather than a specific concrete class) then we can have all kinds of Repository implementations: a SQL Server one, an in-memory one, one which always throws an error (so you can check how your application handles errors), and so on. In VB.NET you’d use an interface for this particular example.
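
Putting the pieces of this example together (in Java rather than VB.NET for brevity; the method names are my own illustration, not from a real Guardian Unlimited codebase):

```java
// Repository is an abstraction, so any implementation will do.
interface Repository
{
    void save(Article article);
}

class Article
{
    private Repository repository;
    final String body;

    Article(String body)
    {
        this.body = body;
    }

    // Dependency injection: the Repository is set up externally and
    // handed in, so the Article never decides what kind it gets.
    void setRepository(Repository repository)
    {
        this.repository = repository;
    }

    void save()
    {
        repository.save(this);
    }
}

// A "do nothing" Repository for quick tests that don't need a database.
class NullRepository implements Repository
{
    public void save(Article article) { }
}

// A disposable in-memory Repository.
class InMemoryRepository implements Repository
{
    final java.util.List<Article> saved = new java.util.ArrayList<Article>();

    public void save(Article article)
    {
        saved.add(article);
    }
}
```

A SQL Server implementation would be injected in production, and the Article’s code wouldn’t change by a single line.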

Dependency injection forces concerns to stay separate and makes your code much more flexible and easy to manage. But that wouldn’t be obvious to you if you’re just starting out in OO development, and if you’re just looking at the concept of objects, inheritance, and so on.


The benefits of OO aren’t obvious: Part I of V, data hiding

Sometimes you work with something for so long that you forget that other people aren’t familiar with it. This happened recently talking to a friend of mine — “I’ve got no need for object orientation”, he said, and I was shocked. Yet I couldn’t easily explain why I thought OO was far superior to procedural languages. And thinking about it afterwards I realised that its advantages for me weren’t obvious — almost none of them are clear from OO’s fundamentals (inheritance, abstraction, etc), and almost all of the advantages are things that have been learnt by most of us recently, years after OO first appeared.

My friend spends most of his time writing embedded software, but occasionally is called upon to write a front-end interface. His front-ends are Windows-based and for his last GUI project he chose to try out VB.NET rather than his usual VB6. He was baffled by the verbosity and he felt it didn’t add anything. It all seemed rather unnecessary. And yet to me it’s long been the only way.

What, I wondered to myself, were the benefits of OO that he was missing? How could I persuade him that it does have a point? So here is my attempt to trawl my memory and see how I’ve changed in response to OO, and why I think it serves me (and my projects) better. In the end I’ve found that the benefits are real, but nothing I would have guessed at when I first started in this field.

Not the point of OO

DNJ Online has an article on migrating from VB6 to VB.NET, and it says that from this point of view the OO language is “unlike anything you’ve used before”. This is helpful in its honesty, but that particular article explains the low level differences rather than the high level aims. Try/catch blocks and namespaces are definite steps forward, but they hardly justify learning an entirely new paradigm, let alone phasing out a well-established one.

Similarly, a common justification of OO is that objects promote code re-use and allow better modelling of user requirements, but it’s hard to see how either is obvious. As regards the former, procedural languages already allow code reuse via libraries. As regards the latter, it’s just as easy to see user requirements as procedural things as object-based things — easier, arguably.

For me, the advantages of OO are not obvious. But they are tangible once experienced, given the right environment. For me, those advantages are…

Data hiding

Even I’m horrified by how little fits into my tiny brain, and the more I have to keep in my head at any one time the more likely I am to make mistakes. In terms of development, the more attention I have to pay to the intricacies of my software the less attention I can pay to solving whatever the immediate problem happens to be.

Fortunately OO’s concept of data hiding (or encapsulation, if you like long words) is for people like me. By making explicit what I need to know (public methods/functions) and hiding what I don’t (private methods/functions and fields/variables) I am freed to focus on the task in hand. Private fields and methods allow my classes to manage themselves without having to burden the user with the details. You can look at the class (or its auto-generated documentation [1, 2, 3, 4]) and focus on what you need to know, without getting distracted by the details.
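
A tiny sketch of the idea (the BankAccount example is invented for illustration):

```java
// Data hiding: the public surface is small, and the details are
// private. Callers see deposit() and getBalanceInPence(); they never
// see, or come to depend on, how the balance is actually stored.
class BankAccount
{
    // Private: the class manages this itself.
    private long balanceInPence;

    // Public: the only things a caller needs to know about.
    public void deposit(long pence)
    {
        if (pence < 0) {
            throw new IllegalArgumentException("deposits must not be negative");
        }
        balanceInPence += pence;
    }

    public long getBalanceInPence()
    {
        return balanceInPence;
    }
}
```

If the class later switched to storing the balance some other way, no caller would need to change, because no caller could see the field in the first place.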

Anything which makes me think less is a huge benefit.

Of course, data hiding by itself isn’t enough to switch programming paradigm, and if you’re writing in VB6 you’ll already have the “private” keyword. But this is only the start…

I’ll post the remaining installments over the next few days.

A field guide to simplicity

Here’s my contribution to Brad Appleton’s stimulating conversation-piece on simplicity. I’ve produced a number of short soundbites to help understand simplicity, and backed some of them up with concrete examples of how Guardian Unlimited handles its articles.

Brad makes it clear that simplicity is a confusing and often contradictory topic. But he’s written his article because

If we can come up with a modicum of modest, simple rules & principles to govern our design decisions in ways that help us minimize and manage interdependencies, eliminate constraints, and remove waste, then we are on the path to solving the real problem and meeting stakeholder needs in a way that is both simple and sustainable.

Strangely, he seems to fall foul of one of the very confusions he’s perceptive enough to identify. He says

The Agile Manifesto defines simplicity as “maximizing the amount of work not done.” But I think that’s a more accurate characterization of Lean than of simplicity.

but then goes on to cross-post his blog entry to several agile and extreme programming groups (1, 2, 3, 4,… thanks to Deborah Hartmann at InfoQ for the links). Indeed, that’s just unfortunate wording in the manifesto. The real issue, as Brad points out, is about complexity, and is something that even proponents of Big Upfront Design would support — indeed, it could be argued that Fred Brooks’s “surgical team” model of development emphasises a clear vision from a single developer, and therefore produces something uniquely unconfused.

Soundbites on simplicity

For me, it’s the following one-liner of Brad’s that jumps off the screen:

Hiding complexity isn’t the same as removing complexity.

That pithy statement not only sums up many software problems that I’ve come across, it’s also a gem just small enough to fit into my tiny, PowerPoint-addled, brain. Hiding complexity isn’t necessarily wrong, but it is often mistakenly confused with removing complexity and is thus the cause of much pain.

So inspired, I sought to produce some more soundbites. Here’s what I’ve come up with, while trying to avoid agile-specific leanness, or indeed simplicity of user interfaces (say). The observations below are all about simple design of software, something that should be relevant to any software developer or architect, agile or otherwise. There is some discussion with examples afterwards:

Topic           Common mistake                                    Real goal
Time            Minimise the effort to produce results.           Minimise the effort over the total lifetime.
                Take the most straightforward route.              Produce the most straightforward result.
Concepts        Hide complexity.                                  Remove complexity.
                Use existing concepts.                            Use simple concepts.
                Avoid introducing new concepts.                   Avoid introducing concepts new to the problem domain.
                Look clever.                                      Be clever.
Comprehension   Ensure there are few lines of code.               Ensure there are few steps to understand the code.
                State the obvious.                                Increase obviousness.
                Be elegant.                                       Be comprehensible.
                Make it intuitive (to some unspecified person).   Make it intuitive to a newcomer.
                Obviousness is the first thing you think of.      Obviousness takes hard work.

Simplicity and time

As Brad says: “…simple and sustainable”. We’ve all been in the situation where a simple hack costs us dear in the long run. It’s called technical debt. The cost of doing something isn’t to be calculated over the immediate future, but over all time.

You can see an example of this with our search engine, which is built on Endeca’s technology. If you search for “major” you’ll see two kinds of results appear: in the main part of the page you’ll find articles with the word “major” in them, but above those you’ll see links to politicians’ pages (e.g. former prime minister John Major) as well as films with “major” in the title (such as Major League II). Now, the first thing we did with Endeca was create an Endeca thingumy called an “Adapter” which would allow the engine to search articles. But politicians’ details are not stored as articles, they are separate entities in the database. So given that we already had articles being found, how could we best develop the system to fetch politicians’ pages? There were two options.

First, we could create another Adapter, but one which allowed Endeca to read politicians’ details. The problem with this is that creating an Adapter takes time, particularly if you’ve only ever created one before. Additionally, an extra Adapter might be seen as making the search engine more complex. The second option was to allow the underlying database (Oracle) to present politicians’ details as if they were articles, and then Endeca would read them as it did any other articles. This would be quicker because we’ve had years of experience with Oracle against months of experience with Endeca. Also, the search engine software would have remained virtually untouched.

The right answer as far as simple design goes is to choose the first option. It might have taken longer, but it is completely consistent with the existing search engine architecture. The second option might not have increased the complexity of the search engine, but it would have increased the complexity of the database layer and anything that interacted with the politicians’ details. It would have been unintuitive (clever, yes, but unintuitive) and therefore more time-consuming for anyone else to maintain down the line.

We could have chosen the quick-for-now option, but it would have cost us more time in the future.

Simplicity and concepts

This is the section where Brad’s original observation about hiding complexity fits in.

To me, one touchstone of conceptual simplicity is the importance of the domain and domain-driven design. The software problem should relate entirely to its domain. If the domain is complex then the software supporting it will necessarily be complex, but it should be no more so than the problem domain itself. Certainly the software should reflect the problem domain, but it should not bring in any complexities of its own, at least none that infect the domain layer.

A straightforward example of this can be found on the front page of Guardian Unlimited. As we saw above, most of what we produce are, of course, articles. Articles are fundamental to the system. But on the front page you can see a long list of stories, with pictures, links and sometimes little snippets of text. This is called a “trailblock”. When creating something like the front page we could have said trailblocks are just special kinds of articles. This would be the elegant solution, the clever solution, the solution which doesn’t introduce new ideas, and which therefore might be regarded as the simplest solution. But it would also be the wrong solution, because that is not how our journalists think of these things, it is not how they like to manipulate these things, and to contort an article into an unsuitable shape would have caused all kinds of problems.

What we ended up with was a trailblock being an entirely new type of content. As a result we have specialist editing tools (which are quite unlike articles’ editing tools), various ways of rendering them, and extensions such as RSS feeds which would be near-impossible if trailblocks were really just glorified articles. We introduced a new concept, but that concept was only new to the software. It was not new to the problem domain — the world that journalists inhabit.

Simplicity and comprehension

To take another example, I used to work with a wonderfully elegant system (now sadly defunct) which was so pluggable and so configurable that it was often impossible to work out where certain things came from. One day our sysadmin thought he’d like to try his hand at coding, so we gave him a simple job to do: the command line already had a help message, but we’d like it to be a bit more comprehensive. He returned in despair after an hour: “I’ve grepped through the entire source, and I cannot find one instance of the word ‘help’.” Yes, our code was so configurable, even location of the help message was obscured.

Making something appear obvious is hard work, and you don’t get people admiring your work so much because, well, you’ve made it look obvious. But nor do you get people lightly cursing you, such as the developer I know of who delved into a system and discovered that one of his predecessors had named a crucial function “.”. That’s a single full stop. Clever? Yes. Helpful? Unlikely.


There’s no pithy conclusion to this. I hope the guide above leads us to better understand what simplicity of software involves, and how to distinguish it from its near neighbours. Certainly it’s made me think carefully about a subject I’d previously underestimated.