Lightweight versus heavyweight: The cost is in the management

A recent conversation with a colleague got me thinking about so-called “lightweight” systems, and when they become more trouble than they’re worth. He was frustrated by some problems he was having; even more so, he explained, because he thought he was dealing with something that was “lightweight”. It’s a seductive word, and sometimes — as with other forms of seduction — when you get more involved than you should, things can get a bit sticky.

This article is an attempt to explain what lightweight really means, both in terms of benefits and drawbacks. There are also a couple of comparative examples from my own experience.

Lightweight doesn’t mean simple

People often take “lightweight” to mean simple or quick. But this can’t be right, because everyone wants simple and quick, and if it really meant that no-one would use anything else. Every website would be rewritten with the lightweight Ruby on Rails and every application would be sitting on top of the lightweight SQLite database. Who wouldn’t? Who doesn’t want simple? Who doesn’t want quick?

Lightweight is often good, but it must have its tradeoffs, otherwise other technologies wouldn’t exist.

From the examples below I see lightweight as offering low cost in return for low demands, but high cost for high demands. Heavyweight is disproportionately high cost for low demands, but low cost for high demands.

Lightweight carries low inherent management costs. But some situations require a high degree of management control whether you like it or not. That means that if a lightweight system needs to scale up you have to wrest management from it and maintain it externally. If you can do that then the lightweight system continues to work, but if the lightweight system will not relinquish management control, or if you don’t have the discipline to keep the management going, then it won’t be effective in the long run. By contrast heavyweight systems impose management and structure of their own. This is good if you’re going to need it, as it takes the pressure of discipline off you, but it’s not effective if you didn’t need that management structure in the first place.

To illustrate this, here are a couple of lightweight/heavyweight comparison case studies…

Language example: TCL and Java

TCL is a lightweight language. You get to write Hello World in one line, it doesn’t force much structure on you, and it’s pretty relaxed about how it’s written.

TCL is so good, in fact, that it was the basis of the original Guardian Unlimited website. We built Ajax-style tools with it before Ajax was known as a concept, we generated our front page from it, and we used it to integrate with our ad server.

But as our site grew the language didn’t scale with it. Clever shortcuts implemented by earlier developers confused newer developers because they obscured the purpose of the code. The lack of an imposed structure meant every foray into older code involved learning its idiosyncrasies from scratch. Development slowed down as we worked around older code. And when we wanted to redesign the website we found that through years of lightweight flexibility we had allowed ourselves to be tied into knots: it would be more effective to start again than to work with what we had.

In fact, for the most part we’re now using Java…

In contrast to TCL, Java is pretty heavyweight. Not only does Hello World require three lines (excluding any lines with just braces), but its philosophy of structure and layering percolates through from the core language to most of its add-ons. For example, to parse an XML document you have to drill through two abstraction layers before you can find the parser.
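To make the contrast concrete, here is a standard Java Hello World, with the TCL one-liner shown as a comment for comparison:

    // TCL: the entire program is one line — puts "Hello, World"

    // Java: even the minimal program needs a class and a method signature.
    public class HelloWorld {
        public static void main(String[] args) {
            System.out.println("Hello, World");
        }
    }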

One Java framework that maintains this ethos is Hibernate, used for database access. Its architecture is complicated, and as usual this is to offer flexibility without relinquishing manageability. Recently a forthcoming release of the Guardian Unlimited website was failing its pre-production performance tests. Our developers tracked down a major cause of the problem to an inefficient query within Hibernate. They extracted some of the query’s logic up into the application layer and simplified what remained, rebalancing the work between the application and the database. Problem solved, performance restored. What’s relevant to our story is that the developers did this entirely within the architecture of Hibernate, so they didn’t compromise the design of the application and therefore didn’t add complexity.
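To give a flavour of that kind of rebalancing — and this is an invented sketch using Hibernate’s ordinary Session API, not the actual Guardian code or query, with Article and isLive() as placeholders — the idea is to let the database answer a simpler question and finish the work in the application layer:

    import java.util.ArrayList;
    import java.util.List;
    import org.hibernate.Session;

    public class LiveArticleFinder {

        // Hypothetical example: imagine the original version pushed all of this
        // logic into one complex HQL query, which the database executed badly.
        @SuppressWarnings("unchecked")
        public List<Article> findLiveArticles(Session session, String sectionId) {
            // A simpler query that the database can satisfy cheaply...
            List<Article> candidates = session
                    .createQuery("from Article a where a.section.id = :sectionId")
                    .setParameter("sectionId", sectionId)
                    .list();

            // ...with the last piece of the work rebalanced into the application.
            List<Article> live = new ArrayList<Article>();
            for (Article article : candidates) {
                if (article.isLive()) {
                    live.add(article);
                }
            }
            return live;
        }
    }

The point of the sketch is only the shape of the change: the query gets simpler, the application does a little more, and nothing steps outside Hibernate’s architecture.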

CMS example: WordPress and the GU CMS

Over on ZDNet Larry Dignan extols the virtues of WordPress and says, effectively, “What have big content management systems ever done for us?”

WordPress is the lightweight CMS I’ve chosen for this blog, and I’m very happy with it. It’s easy to install, requires almost zero maintenance, and lets me focus on the writing. And yet I’m a strong advocate of the home-grown CMS we have for our journalists and subs on the Guardian Unlimited site. Is lightweight not good enough for our journalists? What has a big CMS ever done for us?

Well, I just looked at a current article on guardian.co.uk: “Ministers ordered to assess climate cost of all decisions”. It was created with our big CMS. What’s there that WordPress couldn’t deliver?

For a start it’s got a list of linked subjects down the side, which aren’t the same as WordPress’s tags because they’re tightly managed to ensure consistency and reliability. These subjects are also categorised, so Pollution and Climate Change are subjects under Environment, while Green politics is a subject under Politics. As I write this, I note also that the pages for Pollution and Climate Change are designed differently, with Climate Change being more pictorial and feature-led. Subject categorisations and subject-specific designs are beyond what WordPress’s tags do.

Okay, so apart from the linked subjects, the categorisations, and the subject-specific designs, what has a big CMS ever done for us? I suppose it’s worth mentioning the related advertising, which as I write includes a large ad for environmentally-friendly washing liquid. There are other contextual commercial elements, too, such as the sponsored features, links to green products and books, and offers to reduce energy bills and offset carbon emissions. And there are related articles and related galleries. And details of the article history, listing when and where it was first published, on what page and in what newspaper section.

Okay, so apart from the linked subjects, the categorisations, the subject-specific designs, the related advertising, the contextual sponsored features, the links to relevant products and books, the complementary offers, the related articles, the related galleries, and the article history, what has a big CMS ever done for us?

Well, I suppose it is serving over 17 million unique users a month…

I’ll stop now. The point is that a lightweight CMS such as WordPress could probably do any one of these things, with a bit of work. But it isn’t designed to do anywhere near all of them. And each time it’s changed to do one more of them, it moves further away from its core architecture and closer to a point of paralysis, where nothing functions well any more because no part of it is doing what it was designed to do. A bit like the TCL example.

Looking back

Reviewing these two examples, it’s clear that the lightweight systems became, or would become, very costly when they were pushed beyond their initial expectations. In both cases the corresponding heavyweight systems came with their own (heavy) management structure, but that management structure ensures lower running costs.

In the Hibernate example our software maintained its architecture after we’d made our performance change; anyone looking at this new code would be able to rely on previous knowledge to understand what was going on. By contrast, anyone coming fresh to a snippet of old TCL code would be starting from scratch, regardless of how much of the other TCL code they’d seen.

Similarly, the large-scale content management system at GU is internally consistent, despite its vast range of features and functionality. Once someone has learnt the principles (which, admittedly, are non-trivial) they can get to work on pretty much any part of it. Pushing WordPress to do that would have created a monster.

Lightweight systems take the management away from you. And that’s ideal, as long as you don’t need that manageability.

We (heart) UAT

When do we need user acceptance testing? And when can we get away without it?

User acceptance testing (UAT) is when your software goes in front of the users for final sign-off — or, if they’re not happy, for them to ask for changes. In theory you shouldn’t need UAT at all (didn’t they tell you what they wanted? Weren’t you listening?), and indeed perhaps you can sometimes get away without it.

This argument is stronger if you’re considering UAT throughout the project, not just at the end. This would be the case if you’re UATing individual features, or individual deliveries in a culture of more and smaller iterations. The arguments are: more deliverables mean more UAT means more expense for the customer; smaller iterations mean a reduced chance of getting it wrong; frequent deliverables give the customer more opportunities to change anything they don’t like anyway.

However, even with small, frequent deliveries there are circumstances when it’s more important to do UAT. Looking back on some projects I’ve worked on, here are times when it was, or would have been, a good idea…

1. When the output of the software is subjective

I once worked on a project that asked what bands you liked and then, based on that, recommended albums by unsigned bands. This was a big leap away from the “people who liked that also liked this…” kind of recommendations, because the unsigned bands wouldn’t have had a significant userbase, so we couldn’t rely on a self-generating wisdom-of-crowds mechanism. Plus, of course, music recommendation is highly subjective, and it was our clients who were the music experts — we implementers were mere software developers.

In this case you can see it was key for our clients to play with the system and be sure they were comfortable with it. If they weren’t, it would have meant tweaking the algorithm, not a complete rewrite of the software. In the event something rather unexpected happened. While our client’s representative was very happy with the output he also found it rather disconcerting. He felt somewhat disenfranchised from the system because it seemed so mysterious. He realised that he couldn’t demonstrate it in front of his colleagues and their investors without having an answer for the suddenly inevitable question: how does it work? And once this was explained to him he became much more comfortable again.

If the output of the system had been much more objective and predictable this instance of UAT wouldn’t have been so important. In the end it threw up an important new acceptance criterion for the client, which we were quickly able to address.

2. When the software has a costly userbase

One of my past projects involved creating a user interface for a property data entry team. Individuals on the team were, as you’d expect, low-paid and largely interchangeable — after an hour of training you would know everything there was to know about the system. Your sole function was then to spend hour after unsociable hour entering information about properties you could never afford to live in for people you would never want to live next to.

But although the daily cost of individuals on the team was considered low, the cost of the team as a whole — and the value they were bringing to the business — was very high.

Problems with the user interface in this case would have had a knock-on effect across the whole team of 20 or 30 individuals, and cost their employer dear. It was key for someone to check not only that the system was responsive, but also that there were intuitive keystrokes across the data fields, that data entry was reasonably forgiving, that tabbing between fields happened in the order the data was presented in, and so on. In this case, the time spent getting early feedback from one user at the vanguard saved time and cost 20 or 30 times over.

3. When you don’t have a QA department

A pretty obvious one, but if you don’t have people dedicated to quality assurance then those internal people who take on that function (probably the lucky developers or an irritated account manager) will be too close to the software and too distant from the client to pick up on all the issues that are important to them.

A good percentage of the projects I’ve worked on have involved integrating search engines. And in almost every instance when testing has been left to the development team they’ve alighted on one or two key phrases to use as test cases and considered that sufficient. By contrast, the end user to whom this is important will have half a dozen phrases that are relevant each day — and they’ll involve accented characters, non-ASCII characters, mismatched quote marks, and so on. And that’s before we get to the results page. Without effort the expert user has given the search engine a good workout — often raising difficult issues for the developers.
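As a sketch of the difference — the SearchClient here is entirely hypothetical, standing in for whatever search engine is being integrated — a developer-written check and the sort of queries a real user brings along might look like this:

    import static org.junit.Assert.assertFalse;
    import org.junit.Test;

    public class SearchSmokeTest {

        // Hypothetical wrapper around the search engine under test.
        private final SearchClient search = new SearchClient();

        @Test
        public void theDeveloperTestCase() {
            // The one or two "key phrases" a development team tends to settle on.
            assertFalse(search.find("climate change").isEmpty());
        }

        @Test
        public void theQueriesARealUserBringsAlong() {
            String[] awkwardQueries = {
                "Beyoncé tour dates",        // accented characters
                "\u201Cmismatched quote",    // unbalanced quote marks
                "C++ developer jobs",        // characters with special meaning
                "日本語のニュース"              // non-ASCII scripts
            };
            for (String query : awkwardQueries) {
                // At the very least the engine should cope without blowing up.
                search.find(query);
            }
        }
    }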

Actually, even if you do have a QA department there’s still a case for UAT, because more often than not the QA team will test against written criteria. There are very often criteria which, with the best will in the world, just don’t get realised or written down at the requirements stage. And in these cases it’s only the end user who can really tell if it’s right.

4. When you lack detailed requirements

Another fairly obvious situation, but if you’ve not had a good bit of customer input at the start of the project, then UAT is going to be even more important later on.

A sales person I worked with once had a cunning plan. We’d already produced an e-commerce site for one of our clients, and he decided to sell a duplicate system to one of their rivals. It was a good plan on paper. There was no contractual restriction for us; the new client would get their site at a relatively low cost; they’d get it quicker than usual; and we’d only have to reskin it. The deal was done over a nice lunch.

You can guess what happened in reality. When they saw what we imagined was a near-complete version they wanted some changes. The checkout system wasn’t quite to their liking; their product catalogue didn’t have quite the fields we were expecting; the order placement system didn’t align with what they had internally; and so on. It cost everyone more than they had anticipated and it was delivered later than anyone would have liked.

Certainly this is a strong case for robust specifications — or at least understanding what you’re getting into before you get into it. But it also demonstrates that a lack of early information cost us dearly when the information did come along later. We hadn’t made time for UAT, which is why we delivered the final product later than planned. Fortunately for everyone there was no externally-driven deadline. If there had been then we’d have been obliged to deliver something the client wasn’t happy with.

5. When the inputs to the software are difficult to specify

Finally an example that brings us right to the present.

The current work to refresh the Guardian Unlimited website is much more than applying a lick of paint. We’re also producing new tools for our reporters and new ways they can tell their stories. But telling a news story is not simply a matter of filling in a few boxes and clicking Save. To make a story relevant requires a lot of hard training. To edit an entire news section additionally needs an awful lot of experience and good intuition…

As I write, MediaGuardian.co.uk is leading with three big stories and a podcast, plus some smaller items headed “More media news”. The stories have between three and six sublinks which indicates the depth of related news that the editor has available. Yet at the same time the Technology section is showcasing six stories, all with a bit of blurb, but only one of which has sublinks (to a video and a gallery). In both cases the editors are using the same system, but the page layout has to balance itself out regardless of the highly variable input.

But though the input is highly variable it is not without bounds. A section would never be showing just one story, and an editor would almost certainly not try to highlight twenty stories and claim they had largely equal weight. The only way to really know the bounds in which the system is supposed to reasonably work is to sit an editor down with the tools and let them lay out pages in response to a real day’s stories. No amount of requirements analysis and QA experience can substitute for a real journalist responding to a real day’s events.

And that’s the beginning…

No mechanism should be applied blindly; there are times when user acceptance testing is more appropriate, and times when it is less appropriate. The important thing is to be aware first that UAT exists, and second that there are criteria which you can apply to assess how valuable it can be.

Amazingly, some people aren’t motivated by efficiency

Staggering though it may be, it turns out that people are different. It also turns out that certain kinds of people are different to other kinds of people. And a corollary of this is that people who aren’t software engineers tend to have a different perspective to those who are.

For example, I spend a lot of my time trying to maximise our efficiency. Maybe this is because I’m a techie kind of person; I see the same motivation among the project managers, developers, and others in our software team. But very often we face the prospect of having to start a piece of work without having all the details available to finish the job. Maybe we’re waiting for information from a third party, maybe the visual design isn’t complete, maybe a decision still needs to be made by people elsewhere in the organisation. Whatever it is, we’re faced with the prospect of starting a piece of work knowing that we may not be able to complete it without an interruption.

This is clearly going to be frustrating to my techie, left-brained approach to life. It undermines my efforts to be efficient, organised, anally retentive, and generally less fun at parties.

I know it would frustrate others, too. Agile advocate Simon Baker recently railed against organisations that didn’t make product owners sufficiently available to provide the relevant feedback and information. He wrote:

If the project is vital to the business, then the company can always find a way to provide a full-time and colocated Product Owner. If they say they can’t, it really means they won’t. Quite simply, they’re not prepared to do what is necessary to achieve it, and frankly, if they’re not going to take the project seriously why should you?

Ouch. I’ve felt the pain that Simon felt when he wrote those words, but we shouldn’t rush to make harsh judgements on business experts who are facing pressures of their own.

We software people can talk all we want about our methodologies, but sometimes we need to wake up to the cold, wet slap of reality. The fact is software development is not the be-all and end-all of most businesses — much less efficient software development methodologies. Sometimes it’s more important to be working on the most important thing inefficiently than it is to be working on the second-most important thing efficiently.

At Guardian Unlimited we motivate ourselves by measuring velocity — the number of units of work we complete in a given period. But if we’re not careful we can focus on that too much and be in danger of missing the bigger picture.

Some time ago I was speaking to one of our internal customers, explaining why developers pushed back on incomplete specifications, and the motivation of being efficient and achieving target velocity. “I don’t give a stuff about velocity,” she said, “I just want the thing built.” The point is well taken. Sometimes we need to remember what the word “customer” means.

Anti-features

Sometimes you can trust too much. Wherever I’ve worked I’ve been involved in a few examples where we listened to the customer, trusted them to know what they wanted, gave it to them, and they regretted it. We have delivered anti-features.

Most recently at Guardian Unlimited our (previous) homepage had clever layout rules whose logic sometimes overrode the content that editors entered. The result was that editors were often confused that the page didn’t render according to what they had put in the system. The clever layout rules had been devised in close collaboration with the original editors and graphic designers — the expert users. They reasoned that they didn’t want anyone to unbalance the layout by entering inappropriate combinations of text and images.

But these layout rules had been forgotten over time, hence the confusion years later. Consequently the tech team was often called up regarding a supposed bug (“I’ve put this image in but it doesn’t appear”), and effort was expended only to discover that it was in fact a feature — much to the caller’s amazement. Our micro-lesson there was to give the end users the freedom they naturally expected, including the freedom to decide for themselves what was a balanced layout. If what they produced was unbalanced then the designers would steer them back in the right direction — a much more human corrective.

Those clever layout rules were an anti-feature. They were additional functionality that actually made users’ lives worse. Eventually we removed them.

Anti-features happen in highly experienced mega-corporations, too. In November 2006 Joel O’Software started what became known as “the Windows shutdown crapfest”. He compared the three shutdown options of Apple’s OS X with the astonishing nine or more shutdown options of Microsoft’s Windows Vista. Not only is that confusing for the user, but it was also incredibly painful to develop — Moishe Lettvin, a former Microsoft engineer, tells the sorry story.

But at least the Vista shutdown anti-feature made the problem visible. In the layout logic example, and others I know of, there is silent intelligence at work that leads to confusion and frustration without giving the user any visibility at all of what’s going on.

Anti-features are time-consuming to build in, because any features are time-consuming to build in. But anti-features also consume additional time in their removal.

One way to prevent anti-features is to help the end user determine the long-term consequences of what they want. Of course, you’d hope they’d think of that themselves, but you can’t avoid your responsibilities to the project as a whole if you see something that others have missed.

Another way is to adhere even more fervently to the Agile mantras of delivering early (before every last sub-feature is there), keeping it simple, and focusing ruthlessly on only the highest value work. This way we deliver first a front page without clever corrective layout logic, or one or two shutdown options only, and consider further enhancements later if we find we need them. Suggesting and doing this is easier said than done, of course, but if everyone trusts each other to listen honestly to what they have to say then it’s more likely the decisions made will be the best ones.

Meanwhile we can at least ask ourselves each time: Is this a feature or an anti-feature?

There’s nothing so permanent as temporary

An aphorism I heard recently seems particularly memorable: “there’s nothing so permanent as temporary”. However, it wasn’t originally referring to software — it comes from a builder who is rebuilding the kitchen of some friends of mine. He’s from Azerbaijan, and my friends are fond of quoting him in full, to give the words maximum colour: “As we say in Azerbaijan, there’s nothing so permanent as temporary”.

It is, however, as relevant to software as it is to building (and probably to many other areas of life). The constant pressure to deliver means there is very rarely the opportunity to go back and improve a nasty historical workaround which is causing problems today. However, there are some strategies we might consider…

1. Automated testing

If the nasty programmatic sticky tape is relatively small then a comprehensive suite of automated tests should enable you to make the change relatively safely and painlessly. Of course, you need to have built up that suite of tests in the first place. And the “relatively small” caveat is important, because only then can the software people fit the replacement activity into their daily work without disrupting any schedules. If there’s no external change then this is an example of refactoring.
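A minimal sketch of the idea — the class names are invented — is to pin down the workaround’s observable behaviour with tests first, so the replacement can be swapped in with confidence:

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    public class UrlCleanerTest {

        // These tests describe what the old workaround actually does today.
        // Once the replacement passes them too, the sticky tape can go.
        private final UrlCleaner cleaner = new UrlCleaner();

        @Test
        public void stripsTrailingSlashes() {
            assertEquals("/science/story", cleaner.clean("/science/story/"));
        }

        @Test
        public void collapsesDoubledSlashes() {
            assertEquals("/media/podcast", cleaner.clean("/media//podcast"));
        }

        @Test
        public void leavesWellFormedPathsAlone() {
            assertEquals("/technology", cleaner.clean("/technology"));
        }
    }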

If the nasty workaround isn’t so small, but really does need replacing, then a much more concerted effort is needed. A plan needs to be carefully devised which allows slow piecemeal replacement. The point of the plan should be to take baby steps towards the end goal; each step should also leave the system in a stable state, because you never know when you might have to delay implementing any subsequent step. However, if everyone in the team knows the plan then the end goal can be achieved with minimum disruption to the business’s schedules.

2. Act like a trusted professional

I find trust and transparency are an increasing part of the software projects I’m involved with. Part of this is that the technical people want to give their customers options, and are keen to explain the pros and cons, so an informed decision can be made.

However, while this is usually excellent, there can be times when it is too much or undesirable. Stakeholders don’t necessarily want to be given options for every single thing, and sometimes it’s right for a technologist to make an executive decision without referring back — because they’re a professional, and because they are trusted (and employed) to be professional. In this regard it’s sometimes the responsibility of a technologist not even to entertain the possibility of a temporary workaround knowing it will become permanent.

The decisions I’ve been close to in this regard tend to be architecture or technology decisions. For example, in my current work we have, roughly speaking, legacy technologies and modern technologies. Sometimes a feature requirement would arise which was quicker to implement in the legacy technology — but of course we knew it would be more costly in the long run. If the difference in timescales was not wildly divergent then I’d encourage my team to choose the modern technology and not to offer up a choice. These days I need to do that much less, because by building up that base of modern technology it’s now easier to expand and extend it for future features. By making hard decisions in our professional capacity earlier on we’ve made it easier to make the right decisions in the future.

3. Bite the bullet

A third way to remove the temporary workaround is to just be honest about the work involved and the cost to the business. Easier said than done, but an open and frank discussion followed by adjustment and agreement will ensure the issue is examined properly and supported more widely.

One example I remember is a particularly troublesome database table. Originally conceived as an optimisation with a few hundred rows, it grew over time to be a real albatross, slowing both development and production work, and running to over 19 million rows. However, we couldn’t replace it without a lot of effort, because its tentacles were everywhere; we needed to schedule real time for it, and that would mean less time for other, more visible, work.

But when we presented the plan, two things sugared the pill. First, we had long cited the table as the reason for previous work taking longer than anyone would like, so the cost of its presence was already felt tangibly. Second, we ensured our plan was broken down using the “baby steps” approach outlined above — it would take a long time, but it would take out no more than 10% of our resources at any moment, and we could always suspend work for a period if absolutely necessary. The plan won support, was executed, and after several months our DBA typed the SQL “drop” command and we popped a bottle of champagne.

Meanwhile, back in the kitchen…

Of course, all that is assuming the temporary workaround really does need to be replaced. In my friends’ kitchen they are actually quite happy with the drawer dividers they’ve requisitioned as a spice rack. Aside from anything else, it provokes conversation and allows them to talk about their Azerbaijani builder’s many philosophies. He seems to have so many pithy sayings — all of which begin “As we say in Azerbaijan…” — that they’re beginning to suspect he may be making them up as he goes along. Still, if he does have to make a hasty exit from the building trade there may be an opening for him in the software industry.

Project versus product

There’s more to software development than getting a software project delivered to the users’ satisfaction, on time and on budget. (As if that wasn’t enough.) Sometimes someone has to ensure the system lasts for the long term. But when that happens it’s a balancing act: between the project and the product.

The project and the product might start at the same point, but they usually finish at very different points. A project finishes with the last milestone release of the code — and a party. The product finishes with the opposite: the last byte of code being removed from the servers — and a party only if it was really nasty software.

The balance between project and product is something that I see again and again in many different forms. Here are some examples from different stages of the project lifecycle…

Tools: Java versus Ruby

There’s an amusing little spat going on over at Michael Coté’s blog. It’s a fairly traditional tussle of Java versus Ruby (and other scripting tools), in which Coté accuses Java people of being overly concerned with internal architecture, while Ruby (and LAMP and Django) people are more concerned with delivering the goods. But Robert Cooper adds the telling observation that the maintenance of the delivered goods could well be a nightmare if it comes from the scripters, because it often involves a “mish mosh” of technologies with non-standard dependencies.

Robert’s point, more generally, is that getting the application out the door is only part of what a software team needs to worry about, and the choice of tools has a bearing on life beyond the project end.

People: Internal and external

Having an internal software team is a wonderful thing. Among other things you can build up a huge amount of expertise, both technical and business-specific, which helps everything go much more smoothly. But sometimes bringing in consultants from the outside has a benefit too, and one beyond mere extra bodies. Consultants get to see many different projects, technologies, businesses and people over a very short time, and they bring this accumulated expertise with them.

Mixing internal and external people like this can work really well. In terms of ideas, external people can bring ideas and propose directions that might otherwise not have been considered. Internal people are all too aware of the long term implications of these ideas in their organisation and can stop the crazier ones before they get too far. In terms of making the best use of time, external people are often particularly driven by project deadlines, perhaps because they’re often only employed for projects, so it’s all they ever know. Internal people can see the time spent now but also have a very good feel for time likely to be spent in the future, so can determine better when additional time within a project is actually investment for the future.

Sometimes work is outsourced entirely to external teams. Sometimes work is undertaken entirely by internal people. When the two kinds of people mix there is creative conflict, and the balance between project and product is tested.

Release: Releases rarely happen once

I once worked on a six month project with a big bang release. At the end of the project was a single release which took three days. That wasn’t three days of integration, testing and launch. That was three days from the point when the software was entirely ready for the end users, to the point where it was actually available to those users.

In terms of the project this was fine: on the scale of a six month project, a three day release is negligible. But when we came to release 1.0.1 and release 1.0.2 it was a bit excessive, and by release 1.0.16 it was actually a serious business issue.

There are many lessons to be learnt from this, but one is that we had erred towards the project too much and our lack of consideration for the product caused us problems very, very quickly.

Soup to nuts

It’s too easy to forget about the product in the rush to the end of the project. At one time I realised I was so negligent in this area I stuck a large message on my monitor in 36 point font:

…and how will that be maintained?

Balancing project versus product affects everything from who you involve at the start through to how you hit your deadlines and on to how it affects the rest of your working life (and possibly your non-working life if you find yourself on call). There isn’t a one-size-fits-all answer, but at least if we are aware of the tension between the project and product we have a better chance of making more informed decisions.

Series: One nice thing about three new sites

Today we’ve launched three new sites on Guardian Unlimited. Well, more correctly, we’ve redesigned and rebuilt our Science, Technology and Environment sites. Working for a media organisation, I can rely on people much more articulate than me to tell the stories better — see Alok Jha on Science, Alison Benjamin on Environment and (with more technical info than you can shake a memory stick at) Bobbie Johnson on Technology.

So I won’t repeat what they’ve said. Instead, I want to point to one tiny new feature which I’m unfeasibly excited about. It’s series. You know series. They’re those things which appear regularly in a paper or on a website.

In our redesigned sites we’ve introduced series as a specific and distinct concept. For example, Bad Science is a weekly series by Ben Goldacre on pseudo-science. Ask Jack is a series in which Jack Schofield helps people with their computer problems. In the old world these articles would be gathered into their own sections and left at that. But a series is a little bit more than this. For one thing when you land on an article in a series it’s nice to know that it is one of a family. We now flag that at the side of the piece. For another thing, you should be able to navigate to the next and previous articles in the series. Again, this series-specific navigation appears with each piece.
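As a rough sketch of what making the concept explicit might look like in a domain model — the names are invented, not the actual GU code — a series is simply a named, ordered family of articles, and the flagging and next/previous navigation fall out of that naturally:

    import java.util.List;

    // Hypothetical domain object: a series is a named, ordered family of articles.
    public class Series {

        private final String name;            // e.g. "Bad Science"
        private final List<Article> articles; // in publication order

        public Series(String name, List<Article> articles) {
            this.name = name;
            this.articles = articles;
        }

        public String getName() {
            return name;
        }

        public Article previous(Article current) {
            int i = articles.indexOf(current);
            return i > 0 ? articles.get(i - 1) : null;
        }

        public Article next(Article current) {
            int i = articles.indexOf(current);
            return (i >= 0 && i < articles.size() - 1) ? articles.get(i + 1) : null;
        }
    }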

It’s a very small thing, but to me it shows something fundamental and important: the software we’re producing really does match the way our writers and editors think. It is, for those in the know, a result of domain driven design. For everyone else, it’s simply software that does what you think it should do.

The truth about quick and dirty

“Quick and dirty” is a weapon that’s often deployed by people for getting themselves out of a tight spot. But like any weapon, in the wrong hands it can backfire badly. Technical people use techie quick and dirty solutions in their work. Those people we like to work so closely with — our clients — use their own quick and dirty solutions in their work. But what happens when these people come together? What happens when it’s not the developers or sysadmins who suggest their own quick and dirty solution, but when it’s suggested by our customers on our behalf?

When it’s the client who suggests we deploy a quick and dirty solution, there are a few things that we need to consider…

1. What does “dirty” mean?

There’s no doubting what “quick” means. But what does “dirty” mean? It’s something that makes the technicians itchy, so it’s worth understanding why.

A dirty solution might suggest one that’s inelegant, one that doesn’t conform to a techie’s idea of a good design. That might be true, but it’s probably not the core of the matter. Any decent techie who works in the real world doesn’t advocate elegant design for its own sake. They know that most of their designs are far from perfect, and in fact they probably pride themselves on maintaining a pragmatic balance between the ideal and the expedient — this, then, won’t necessarily make a techie uncomfortable.

So “dirty” doesn’t mean “inelegant”. I think in this context it means something which incurs technical debt. That is, it’s a technical shortcut for which we are going to pay a rising cost as time goes on. That’s what causes the discomfort — the knowledge that implementing this will make the techie’s life, and the life of their client, difficult in ways that cannot really be understood until it’s too late. It might also happen to be an inelegant solution, but that’s by-the-by. The important thing is that “dirty” implies something which carries a cost.

Of course, the cost of dirty has the benefit of quick. And as we go on, seeing this in cost/benefit terms is actually very informative…

2. Who gets what?

If you’re my customer and I’m your techie, then there’s a significant and unfortunate thing about quick and dirty that I need to tell you: You get the quick, but I get stuck with the dirty. Or to put it another way: the cost and the benefit are not felt by the same person.

There are times and circumstances when this is not the case. If an individual sysadmin is making a judgement call about their own work, say, then they may choose a quick and dirty solution. And if we are considering an in-house software team with in-house customers, then the company as a whole will both benefit from the quick and pay the cost of the dirty.

But these examples don’t tell the whole tale. If our individual sysadmin is actually one member of a systems team, and has responsibilities to the team, then it’s their colleagues who are going to be sharing the cost — and in fact it’s not entirely correct to have called them an “individual sysadmin” in the first place. And while it’s true to say our example company reaps the benefit and pays the cost, it’s probably not the case that “the company” has made an informed and conscious decision about the cost and benefit. That’s because it’s very unlikely that such a decision will have been escalated to the single body responsible for both the cost and the benefit — the board, or the owner, or the managing director.

In the end, we must recognise that the one who reaps the benefit is probably not the one who’ll be paying the cost.

3. Who decides what?

The “who gets what?” question leads on to a more general point: experts must be trusted to make judgements in their own areas.

So just as a techie should not tell a customer how to do their job, so the customer should not tell the techie how to do theirs. And in particular, it’s not up to the customer to decide whether the technical solution should be a dirty one.

Of course, we should always feel free to make suggestions; no-one is so big that they shouldn’t be open to ideas. It’s fair and reasonable for the customer to ask the techie if they can do it quicker… and for that the techie would need to demonstrate some creative thinking. The customer may ask if there are better solutions… and the techie would need to do some research. And the customer might even suggest a specific technical solution they’d heard of… and the techie would have a responsibility to give an honest and informed opinion. But if someone is supposed to be the expert on technical issues then they do need to be free to make judgement calls on technical issues, including which solutions are and are not appropriate. That shouldn’t be a worry to the customer; as mentioned above, good technical people tend to take pride in their pragmatism.

And in fact, this should be good for everyone. By trusting everyone to use their expertise we should get the best solutions.

4. How long does it last?

One final point on the cost/benefit view of quick and dirty: Quick lasts only a short time, but dirty lasts forever.

More precisely, the benefit of quick lasts from the time the quick thing is released to the time the “proper” thing would have been released; and the cost of dirty also lasts from the time the quick thing is released, but it goes on until the time the software is rewritten or deleted.

I’d say that unless software is a total failure it tends to last a very long time, and usually much longer than anyone ever would have wished. This is why the term “legacy software” is heard so often — if software was generally replaced or removed in good time then we’d never have needed to invent that phrase. And the fact that the dirty bits outstay their welcome means that the technical debt they incur is paid, and paid, and paid again… certainly long after the shine of being early to market has worn off.

But it’s not all bad

I don’t want to give the impression that quick and dirty is inherently bad. It’s not. Nor is everything quick necessarily dirty. Far from it; some things are just plain quick.

But if we’re talking about quick and dirty then it helps to think in cost/benefit terms. We need to be very clear about who’s paying the cost and who’s reaping the benefit, and exactly what that cost and its benefit is. If we can understand that, and if we can respect each other’s expertise in making these decisions, then we can be more certain that we’re making the best decisions a better proportion of the time for the long term benefit of everyone.

Guardian Unlimited’s new look: Some background on templating

We’ve just launched the single most visible, most complex, and most trafficked page in the entire network of Guardian Unlimited sites — the Guardian Unlimited front. And in so doing we’ve revealed more of a new design that’s due to roll out over the next few months.

There are some really lovely editorial and commercial features in there, but I can’t give the whole game away now. Those tricks and subtleties will reveal themselves over the next few days, weeks and months as the news agenda and business opportunities unfold. However, I would strongly recommend looking at Emily Bell’s piece which explains what we’re doing, and Mark Porter’s background on the design.

Since this is a technology blog I want to focus on just a couple of technical areas that I’ve been tracking — templating and layout, and some of the issues we’ve had there.

Keeping it simple with Velocity

Like the GU Travel site, perhaps our highest profile release in recent months, the templating was achieved with Velocity. This is no secret; Velocity was named as one of our technologies when we were recruiting for client-side developers earlier this year. I’ve noted with raised eyebrows the Velocity versus Freemarker skirmishes that have blown up on the Net from time to time — they have been both lengthy and deeply felt. I can’t honestly say that we ran a detailed and open comparison when we chose a templating framework, but we consciously chose Velocity because it enforces simplicity. I once pointed someone to the Velocity documentation page, and one of our developers piped up “That’s what I like about Velocity, it has a documentation page”. Admittedly, it’s a very long page, but it’s still one page. That’s a sign of its simplicity.

But maintaining simplicity is difficult, and it’s too easy when faced with basic tools and a complex problem to create hacks and workarounds. One of our warning signs of things going skew-whiff was when our Velocity templates were becoming difficult to understand. I know a few people who would regard such things as the norm, that workarounds are par for the course for a jobbing techie. But complexity of the solution should be proportional to the complexity of the problem, and rendering data in templates shouldn’t be complex unless the data is complex. Also, one of the reasons behind the strategic decision to keep the development in-house was to ensure quality remains high throughout the software stack — we can resolve problems at source, not just where we have access.

In the case of complicated Velocity templates I was at pains to emphasise to our client-side team that they should expect simple, intuitive, relevant information from the back-end systems, and they should use this to ensure their templates are easy to read — if they weren’t getting this then they should tell the back-end developers. Conversely the back-end developers considered it their duty to provide simple, intuitive, relevant information to the client-side team, but needed to be guided as to exactly what this meant.

Consequently, subtle changes in the way information was presented, or accessed, made big differences. The back-end team not only adjusted and expanded the domain model to keep the templates simple, they also built a number of tools to make the work of templating itself more manageable.
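As a simplified illustration of the principle — this is not our actual domain model, just Velocity’s standard Java API with made-up values — the back end puts plain, intuitively named objects into the context, and the template stays close to plain English:

    import java.io.StringWriter;
    import org.apache.velocity.VelocityContext;
    import org.apache.velocity.app.Velocity;

    public class TrailBlockExample {

        public static void main(String[] args) {
            Velocity.init();

            // The back end exposes simple, intuitively named values...
            VelocityContext context = new VelocityContext();
            context.put("headline", "Ministers ordered to assess climate cost of all decisions");
            context.put("standfirst", "Every department must weigh the carbon impact of its policies.");

            // ...so the template itself stays easy to read.
            String template = "<h2>$headline</h2>\n<p>$standfirst</p>";

            StringWriter out = new StringWriter();
            Velocity.evaluate(context, out, "trail-block", template);
            System.out.println(out);
        }
    }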

In the end the Velocity behind the Guardian Unlimited front does remain complex. But it’s complex only as far as the visual design itself is complex. That feels appropriate to me.

CSS: Cascading scalability snafus

Because there’s an awful lot of content on the GU front there’s an awful lot of styling. And that’s brought with it its own problems. An Agile process encourages change, and as the work came together the design and the styling necessarily evolved. However, while change can be handled easily in software (thank you, JUnit and Eclipse) that’s not really the case with CSS — there’s no reliable refactoring tool that I know of.

I’ve previously found Factor CSS, which reduced a 300 line file I gave it to 40 lines. But as one of our client-side team stressed to me, “Refactoring CSS is about making it more manageable, not shorter”. IntelliJ IDEA does better. It allows renaming of IDs, class names and so on, which is a huge jump forward, but still far short of the state of refactoring in object-oriented programming. And refactoring CSS is important to us precisely because of that manageability: this is software we’re writing for the long term.

Thus a short time ago we embarked on a manual refactoring exercise which rounded up all those tiny changes that had crept in throughout the Agile design process. When we began we had a single global CSS file of several thousand lines. That horrified me, though I suspect it wouldn’t be shocking to those who work on this kind of thing full time. There is debate within the team as to whether this is to be expected — “CSS doesn’t scale,” / “Yes it does, but we’re still on the learning curve”.

Either way, we’ve now split it out across several more manageable files and removed a lot of duplication and redundancy. No-one thinks we’ve achieved an ideal, but we think we’ve done what’s pragmatic for now. The team has achieved this partly through dogged hard work, and partly with a bit of innovation. The hard work needs little explanation — it’s work, and it’s hard, and it’s part of any job, although in this case it does have the virtue of rewarding you with incremental results at every step of the way. The innovation is in the way we link CSS templates to the on-page components that they refer to, so that each component pulls in its own CSS if necessary, and only if necessary. This ensures we can isolate component-specific CSS quickly and effectively, and therefore be more confident in our refactoring.
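For the curious, the shape of that innovation can be sketched like this — an invented illustration of the idea, not the GU implementation — where each on-page component names the stylesheet it depends on and the page gathers the distinct set as it renders:

    import java.util.LinkedHashSet;
    import java.util.Set;

    // Invented sketch: components register the CSS they need while rendering,
    // so the page head emits link tags only for stylesheets actually used.
    public class StylesheetCollector {

        private final Set<String> stylesheets = new LinkedHashSet<String>();

        public void require(String cssPath) {
            stylesheets.add(cssPath); // LinkedHashSet keeps order, drops duplicates
        }

        public Set<String> stylesheetsToServe() {
            return stylesheets;
        }
    }

A trail-block component, say, would call require("/css/trail-block.css") as it rendered, and nothing else on the page would need to know that file exists.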

However, I’m left with the feeling that web design is a good way behind back-end software development when it comes to management.

Separating presentation and content

Of course, separation of presentation and content is second nature today, if rare in older systems. I noted with interest that when Yahoo! News UK relaunched recently one of the innovations listed was “proper separation of presentation and content for the first time”. Yahoo! News UK and GU clearly have more in common than just what’s on the surface.

But while the fact of presentation and content separation is well established, the two sides must still communicate, and how this is done is much less certain. To give a particular example, when we display an article the content consists of obvious things such as the article body, headline, byline, etc. Our early adherence to domain driven design (DDD) led us to ensure that this content — and only this content — was visible to the Velocity templates. This sounds fine in theory, but in practice wasn’t good enough: the visual design of the page also depended on knowing non-domain information such as where the 200-word point was, or the dimensions of the associated image.

These things are part of the presentation logic, so need to be kept separate from the domain logic. On the other hand, they also need to be kept separate from Velocity, because that’s a templating framework, not a programming framework.

For these reasons passing in “fake” articles with extra (non-domain) information violated the architectural principles of DDD and lost us its advantages. We also flirted briefly with creating and exposing objects such as a DecoratedArticle (using the decorator pattern), but this didn’t pass the intuition test. A DecoratedArticle is not an intuitive thing to anyone except an experienced software developer.
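For the record, here is roughly what that flirtation would have looked like — an invented sketch, with Article and its fields as placeholders — and it shows why it failed the intuition test: the wrapper makes perfect sense to a developer and to nobody else:

    // A hypothetical decorator: wrap the domain Article and bolt
    // presentation-only facts onto it.
    public class DecoratedArticle {

        private final Article article;          // the real domain object
        private final int twoHundredWordPoint;  // presentation-only information
        private final int imageWidth;
        private final int imageHeight;

        public DecoratedArticle(Article article, int twoHundredWordPoint,
                                int imageWidth, int imageHeight) {
            this.article = article;
            this.twoHundredWordPoint = twoHundredWordPoint;
            this.imageWidth = imageWidth;
            this.imageHeight = imageHeight;
        }

        // Delegate the domain parts to the wrapped article...
        public String getHeadline() { return article.getHeadline(); }
        public String getBody()     { return article.getBody(); }

        // ...and expose the extra presentation facts alongside them.
        public int getTwoHundredWordPoint() { return twoHundredWordPoint; }
        public int getImageWidth()          { return imageWidth; }
        public int getImageHeight()         { return imageHeight; }
    }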

It took some time to evolve a solution that ticked the boxes for domain driven design, simplicity, intuitiveness and manageability. It involved small alterations in our templating patterns, among other things.

The GU front in retrospect

The work to date on the GU front has been far from trivial — not only because of the issues above, but also because of things I’ve not talked about such as performance and the software our editors need to keep updating it.

Along the way we’ve tackled new and difficult technical issues. You might think it’s odd that such issues weren’t resolved last year when we were working on the Travel site, which is really part of the same system. But, as I said before, the GU front is probably the single most complex thing we could release. It’s only when you force yourself to confront the most difficult issues that you are forced to see root causes and address them at source. This is particularly so as there is much more work to come. As Emily says, “our view is that for a more technically robust site, which will be a platform for far more rapid developments in the future, an iterative approach is the best”. So we’ve made a good decision to face the difficult stuff sooner rather than later. There will be more technical challenges to tackle in the months ahead. But for now I can say we’ve resolved some really core issues in ways that should stand us in good stead for a long time to come.