QCon London 2009: A few lessons learned

Last week I attended QCon London 2009, which was characteristically excellent — and I know my colleagues who attended thought the same. Most excitingly this year two of the Guardian development team were invited to give presentations on our most recent work building a large content management system for a very large website. I don’t mind admitting that I learnt one or two things from attending those talks; I also found the presentation on the BBC’s architecture fascinating. Here are some of the things I took away…

Mat Wall on The evolving Guardian.co.uk architecture

Mat’s presentation was structured around rebuilding guardian.co.uk over many months and dealing with the scalability issues that were presented at various stages. Some of the many insights were presented in a deliberately provocative way, such as this one:

Developers try to complicate things; architects simplify them.

I’m not entirely sure about the first part, but it’s certainly true that over-engineering is a much more common trait than under-engineering. The second part, though, is very important. It’s something I’ve always known, but never quite in that pithy way. It’s a useful phrase to tuck away for future reference.

And if you think Mat wasn’t feeling too good towards developers, don’t worry, because he also put up a slide saying

Developers are better than architects.

The message behind this is that it’s the developers who are working on the code each day, so they know what’s possible and what’s not, and what’s reasonable and what’s not. So it’s important to trust them when they say they can or can’t do something.

Dirk-Willem van Gulik on Forging ahead – Scaling the BBC into Web/2.0

Dirk-Willem talked about the programme to re-engineer the BBC websites to enable greater scalability and dynamic content. In an organisation the size and complexity of the BBC you need to spend more time thinking about people than the technology. One of his first slides underlined this by presenting the seven layers of the OSI model (physical up to application), after which he said

But most people forget there are two layers on top of that: level 8, the organisation, and level 9, its goals and objectives.

From that point on he kept referring to something being “an L8 problem” or “a level 9 issue”. It was a powerful reminder that technology work is about much, much more than technology.

Another great insight was how much they have chosen to use the simplest of standard internet protocols to join various layers and services within their network — even when those layers and services are organisation- and application-specific. This ties back to Mat’s point about an architect’s job being one of simplification.

Phil Wills on Rebuilding guardian.co.uk with DDD

Phil talked about the role domain driven design has played for us. He also pointed out various lessons that were learnt along the way, such as the importance of value objects, and the fact that “database tables are not the model — I don’t know how many times I can say this.”

But it was only a day or two after his presentation that I was struck by a remarkable thing: Phil referred so many times to specific members of the team, and talked about contributions they had made to the software design process, even though the people he was talking about were generally not software developers. He put their pictures up on the screen, he named them, he told stories about how they had shaped key ideas. This was an important reminder to me that being close to our users and stakeholders on a day-to-day basis is incredibly important to the health of our software.

Walking away

I didn’t get to see as much at QCon as I’d like to have done, although I dare say most people will say that, even if they went to as many sessions as was physically possible. But what I did see was fascinating and thought-provoking, even when it came from people I work with every day.

An ABC of R2: D is for domain driven design

…which Mat Wall and I have written about extensively before. However, for this piece let me say this…

When you have a huge number of people for whom you are building software (1500 staff, 20 million unique users, and an entire wired economy influencing which way you should go next) then simply following instructions is insufficient, because your users’ demands will change and evolve over time — even if not during the current project then certainly before Version Two. So to minimise the cost of those changes you need to understand the way your users are thinking.

That’s where domain driven design (DDD) comes from. It’s about taking the concepts in your users’ heads and embedding them straight into the software you’re writing. And then when those concepts evolve and change the cost of changing your software is directly proportional to the mental shift that your users are making. When your users say “I want to make a small change” then usually it’s a small cost; if they propose a big change then they should understand when you walk them through all the implications.

For the R2 project we used DDD from the start, and it was key to many of our successes: when we discovered new opportunities which arose only from direct use of what we had implemented, DDD allowed us to realise them.

Take, for example, the idea of “tone”: the principle that we should be able to categorise content by its “voice” or “style” (well, its tone). Its tone might be obituary, blog post, match report, and so on. The vague notion of this had been around since the start of the project, but we hadn’t settled on many details, let alone how to implement them. But after a few releases, and when the software started getting real use, it became apparent that applying a tone should be very like applying a keyword. Suddenly things fell into place. The functional requirements were clear, as were the implementation details. Tone has become a very powerful feature (here are all our obituaries, all our blog posts, all our match reports,…) yet it was a relatively straightforward piece of work because it was, in the end, a relatively intuitive idea.
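To make “applying a tone should be very like applying a keyword” a little more concrete, here is a minimal sketch in Java. It is not taken from the R2 codebase: the class names, the tag types and the sample headlines are all invented for illustration. The point it tries to show is that once tone is just another kind of tag, “all our obituaries” becomes the same query as “all articles with this keyword”.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only: none of these names come from the real R2 code.
class ToneExample {

    // A tag is either a subject keyword or a tone; both are applied the same way.
    enum TagType { KEYWORD, TONE }

    static final class Tag {
        final TagType type;
        final String name;

        Tag(TagType type, String name) {
            this.type = type;
            this.name = name;
        }
    }

    static final class Article {
        final String headline;
        final List<Tag> tags;

        Article(String headline, Tag... tags) {
            this.headline = headline;
            this.tags = Arrays.asList(tags);
        }

        boolean hasTag(TagType type, String name) {
            for (Tag tag : tags) {
                if (tag.type == type && tag.name.equals(name)) {
                    return true;
                }
            }
            return false;
        }
    }

    // One lookup serves both keywords and tones; only the tag type differs.
    static List<String> headlinesWith(List<Article> articles, TagType type, String name) {
        List<String> result = new ArrayList<>();
        for (Article article : articles) {
            if (article.hasTag(type, name)) {
                result.add(article.headline);
            }
        }
        return result;
    }

    // Made-up sample content, purely for the demonstration.
    static List<Article> sampleArticles() {
        return Arrays.asList(
                new Article("Example obituary",
                        new Tag(TagType.TONE, "obituary"),
                        new Tag(TagType.KEYWORD, "music")),
                new Article("Example match report",
                        new Tag(TagType.TONE, "match report"),
                        new Tag(TagType.KEYWORD, "cricket")),
                new Article("Example blog post",
                        new Tag(TagType.TONE, "blog post"),
                        new Tag(TagType.KEYWORD, "cricket")));
    }

    public static void main(String[] args) {
        System.out.println(headlinesWith(sampleArticles(), TagType.TONE, "obituary"));
        System.out.println(headlinesWith(sampleArticles(), TagType.KEYWORD, "cricket"));
    }
}
```

In this sketch adding tone costs one extra enum value, which is roughly the sense in which a small shift in the users’ thinking maps to a small change in the software.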

Of course, domain driven design has a lot more to it than I’ve described here, and our use of it has been much deeper than feature implementation. If you’re lucky enough to be going to QCon San Francisco then today you can see our software architect Phil Wills presenting a lot more detail there.

Lightweight versus heavyweight: The cost is in the management

A recent conversation with a colleague got me thinking about so-called “lightweight” systems, and when they become more trouble than they’re worth. He was frustrated by some problems he was having; even more so, he explained, because he thought he was dealing with something that was “lightweight”. It’s a seductive word, and sometimes, as with other forms of seduction, when you get more involved than you should, things can get a bit sticky.

This article is an attempt to explain what lightweight really means, both in terms of benefits and drawbacks. There are also a couple of comparative examples from my own experience.

[Image: A lightweight system (plus management support)]

Lightweight doesn’t mean simple

People often mistake “lightweight” to mean simple or quick. But this can’t be right, because everyone wants simple and quick, and if it really meant this no-one would use anything else. Every website would be rewritten with the lightweight Ruby on Rails and every application would be sitting on top of the lightweight SQLite database. Who wouldn’t? Who doesn’t want simple? Who doesn’t want quick?

Lightweight is often good, but it must have its tradeoffs, otherwise other technologies wouldn’t exist.

From the examples below I see lightweight as offering low cost in return for low demands, high cost for high demands. Heavyweight is disproportionately high cost for low demands, but low cost for high demands.

Lightweight carries low inherent management costs. But some situations require a high degree of management control whether you like it or not. That means that if a lightweight system needs to scale up you have to wrest management from it and maintain it externally. If you can do that then the lightweight system continues to work, but if the lightweight system will not relinquish management control, or if you don’t have the discipline to keep the management going, then it won’t be effective in the long run. By contrast heavyweight systems impose management and structure of their own. This is good if you’re going to need it, as it takes the pressure of discipline off you, but it’s not effective if you didn’t need that management structure in the first place.

To illustrate this, here are a couple of lightweight/heavyweight comparison case studies…

Language example: TCL and Java

TCL is a lightweight language. You get to write Hello World in one line, it doesn’t force much structure on you, and it’s pretty relaxed about how it’s written.

TCL is so good, in fact, that it was the basis of the original Guardian Unlimited website. We built Ajax-style tools with it before Ajax was known as a concept, we generated our front page from it, we used it to integrate with our ad server.

But as our site grew the language didn’t scale with it. Clever shortcuts implemented by earlier developers confused newer developers because they obscured the purpose of the code. The lack of an imposed structure meant every foray into older code involved learning its idiosyncrasies from scratch. Development slowed down as we worked around older code. And when we wanted to redesign the website we found that through years of lightweight flexibility we had allowed ourselves to be tied into knots: it would be more effective to start again than to work with what we had.

In fact, for the most part we’re now using Java…

In contrast to TCL, Java is pretty heavyweight. Not only does Hello World require three lines (excluding any lines with just braces), but its philosophy of structure and layering percolates through from the core language to most of its add-ons. For example, to parse an XML document you have to drill through two abstraction layers before you can find the parser.
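For anyone who hasn’t met this, here is roughly what I mean, sketched with the standard JAXP DOM classes that ship with Java. I’m assuming those are a fair example of the “two abstraction layers”; the class name, the method name and the sample XML are made up for the illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Illustrative only: the class and method names are invented for this sketch.
class XmlLayersExample {

    static String rootElementName(String xml) {
        try {
            // Layer 1: a factory that knows how to create parsers.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Layer 2: a builder obtained from that factory.
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Only now do we get to parse anything.
            Document document = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return document.getDocumentElement().getTagName();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Wrap the checked exceptions so callers get a single failure type.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootElementName("<article><headline>Hello</headline></article>"));
    }
}
```

The indirection buys you something, since the factory can be configured or swapped for another parser implementation without touching the calling code, but it is a lot of ceremony if all you wanted was to read one small file.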

One Java framework that maintains this ethos is Hibernate, used for database access. Its architecture is complicated, and as usual this is to offer flexibility without relinquishing manageability. Recently a forthcoming release of the Guardian Unlimited website was failing its pre-production performance tests. Our developers tracked down a major cause of the problem to an inefficient query within Hibernate. They extracted some of the query’s logic up into the application layer and simplified what remained, rebalancing the work between the application and the database. Problem solved, performance restored. What’s relevant to our story is that the developers did this entirely within the architecture of Hibernate, so they didn’t compromise the design of the application and therefore didn’t add complexity.

CMS example: WordPress and the GU CMS

Over on ZDNet Larry Dignan extols the virtues of WordPress and says, effectively, “What have big content management systems ever done for us?”

WordPress is the lightweight CMS I’ve chosen for this blog, and I’m very happy with it. It’s easy to install, requires almost zero maintenance, and lets me focus on the writing. And yet I’m a strong advocate of the home-grown CMS we have for our journalists and subs on the Guardian Unlimited site. Is lightweight not good enough for our journalists? What has a big CMS ever done for us?

Well, I just looked at a current article on guardian.co.uk: “Ministers ordered to assess climate cost of all decisions”. It was created with our big CMS. What’s there that WordPress couldn’t deliver?

For a start it’s got a list of linked subjects down the side, which aren’t the same as WordPress’s tags because they’re tightly managed to ensure consistency and reliability. These subjects are also categorised, so Pollution and Climate Change are subjects under Environment, while Green politics is a subject under Politics. As I write this, I note also that the pages for Pollution and Climate Change are designed differently, with Climate Change being more pictorial and feature-led. Subject categorisations and subject-specific designs are beyond what WordPress’s tags do.
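The difference between a managed subject and a free-form tag can be sketched in a few lines of Java. This is only my illustration, not how the GU CMS is actually built: the class and method names are invented, and the only real data is the handful of subjects and sections mentioned above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: not the real GU CMS model.
class ManagedSubjectsExample {

    // Subjects live in a controlled registry, each filed under exactly one section.
    private final Map<String, String> sectionBySubject = new LinkedHashMap<>();

    void register(String subject, String section) {
        sectionBySubject.put(subject, section);
    }

    // Unlike a free-form tag, an unregistered subject is rejected rather than
    // silently created, which is what keeps the list consistent.
    String sectionOf(String subject) {
        String section = sectionBySubject.get(subject);
        if (section == null) {
            throw new IllegalArgumentException("Unknown subject: " + subject);
        }
        return section;
    }

    // The subjects and sections named in the article above.
    static ManagedSubjectsExample sample() {
        ManagedSubjectsExample registry = new ManagedSubjectsExample();
        registry.register("Pollution", "Environment");
        registry.register("Climate Change", "Environment");
        registry.register("Green politics", "Politics");
        return registry;
    }

    public static void main(String[] args) {
        ManagedSubjectsExample registry = sample();
        System.out.println(registry.sectionOf("Climate Change"));
        System.out.println(registry.sectionOf("Green politics"));
    }
}
```

A tagging system would happily accept “climate-change”, “Climate change” and “climatechange” as three different tags; the registry in this sketch refuses anything it hasn’t been told about.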

Okay, so apart from the linked subjects, the categorisations, and the subject-specific designs, what has a big CMS ever done for us? I suppose it’s worth mentioning the related advertising, which as I write includes a large ad for environmentally-friendly washing liquid. There are other contextual commercial elements, too, such as the sponsored features, links to green products and books, and offers of reducing energy bills and offsetting carbon emissions. And there are related articles and related galleries. And details of the article history, listing when and where it was first published, on what page and in what newspaper section.

Okay, so apart from the linked subjects, the categorisations, the subject-specific designs, the related advertising, the contextual sponsored features, the links to relevant products and books, the complementary offers, the related articles, the related galleries, and the article history, what has a big CMS ever done for us?

Well, I suppose it is serving over 17 million unique users a month…

I’ll stop now. The point is a lightweight CMS such as WordPress could probably do any one of these things, with a bit of work. But it isn’t designed to do anywhere near all of them. And each time it’s changed to do one more of these things it moves further from its core architecture and closer to a point of paralysis, where nothing functions well anymore because no part of it is doing what it was designed to do. A bit like the TCL example.

Looking back

Reviewing these two examples, it’s clear that the lightweight systems became, or would become, very costly when they were pushed beyond their initial expectations. In both cases the corresponding heavyweight systems came with their own (heavy) management structure, but that management structure ensures lower running costs.

In the Hibernate example our software maintained its architecture after we’d made our performance change; anyone looking at this new code would be able to rely on previous knowledge to understand what was going on. By contrast, anyone coming fresh to a snippet of old TCL code would be starting from scratch, regardless of how much of the other TCL code they’d seen.

Similarly, the large-scale content management system at GU is internally consistent, despite its vast range of features and functionality. Once someone has learnt the principles (which, admittedly, are non-trivial) they can get to work on pretty much any part of it. Pushing WordPress to do that would have created a monster.

Lightweight systems take the management away from you. And that’s ideal, as long as you don’t need that manageability.