Lots of questions went unanswered in my last piece on putting ranges of projections into project proposals, and I want to follow up on one of them here: what if our range of values isn’t something we’re happy with?
In my previous example our project had a revenue projection of £150k to £500k. But suppose the project was expected to cost £200k. Then that range encompasses both profit and loss. What can we do?
1. Collect more data
We may be very confident of our potential revenue, but perhaps we could be more confident. More research data should change our perception of what might occur, and so narrow the range of expectations. This might include customer research or research into other companies’ experiences.
Of course our findings may just serve to confuse the issue: we may collect six examples of other companies’ experiences only to find they differ wildly. In such cases we should be more discerning about how we interpret the data. For example, it would be a good idea to learn more about each example and see which are closest to our own proposal.
2. Break down the problem
Our revenue projections (or whatever other measure we are using) are likely to be a function of several other elements. If we can break down the problem we can perhaps analyse those elements more easily. With more confidence over simpler measures we ought to be able to make tighter estimates of the whole.
For example, revenue may be seen as a function of market size, typical customer spend, product visibility and the ability of the product to meet customers’ needs. There may be market data readily available on market size and customer spend; we should be in a position to gauge product visibility if that’s a result of our own marketing; and focus groups or similar may be able to help us understand how well the product is seen to meet customers’ needs.
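To make the decomposition concrete, here is a minimal sketch of a simple Monte Carlo simulation. It assumes revenue is just market size × share of that market we reach × typical spend per customer, and it samples each factor uniformly from a range; every figure, and the names `FACTORS`, `simulate_revenue` and `interval`, are hypothetical illustrations rather than real market data.

```python
import random
import statistics

# Hypothetical ranges (low, high) for each component; illustrative only.
FACTORS = {
    "market_size": (40_000, 60_000),     # potential customers
    "share_reached": (0.05, 0.15),       # fraction who see the product and buy
    "spend_per_customer": (50.0, 80.0),  # £ per buying customer
}


def simulate_revenue(factors, trials=10_000, seed=1):
    """Sample each factor uniformly (an assumption) and multiply them together."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        revenue = 1.0
        for low, high in factors.values():
            revenue *= rng.uniform(low, high)
        results.append(revenue)
    return results


def interval(samples, lower_pct=5, upper_pct=95):
    """Return the lower and upper percentiles of the simulated outcomes."""
    cuts = statistics.quantiles(samples, n=100)  # 99 cut points: 1st..99th percentile
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


if __name__ == "__main__":
    low, high = interval(simulate_revenue(FACTORS))
    print(f"90% of simulated outcomes fall between £{low:,.0f} and £{high:,.0f}")
```

The simulated 90% range comes out narrower than simply multiplying all the lows together and all the highs together (£100,000 to £720,000), because it is unlikely that every factor lands at its extreme at the same time.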
3. Move the breakeven point
If we’re worried that we may generate £150k to £500k from a product or project that costs us £200k, then maybe we should seek to reduce the costs. Of course, if we reduce functionality or quality then there is likely to be some detrimental effect on the revenue. But we may be able to find high-cost/low-value items whose removal changes the game sufficiently.
4. Change the factors that influence the range
It may be that we can reduce uncertainty by actually influencing some of the critical elements. If it transpires that customers don’t see the product as meeting their needs sufficiently then we should change that. That may be by changing the product, or by improving our marketing.
Or we may find it’s useful to do the opposite of our previous suggestion of reducing cost: increasing quality (and adding a bit of cost) may pay extra dividends by increasing take-up. This kind of thing can be seen in places like the Apple App Store. I am at least twice as likely (probably four to five times more likely) to buy an app rated 4.5 stars than one rated 3.5 stars. Yet the difference in production cost to get to that level would have been far less than two-fold.
I’m sure there are more ways to respond to an insufficiently compelling range of possibilities in a project proposal. Overall, though, all these actions have the same beneficial effect I pointed to before: the relevant issues are surfaced so that they can be addressed appropriately, and we increase the likelihood of our project’s success. And they are much more likely to be surfaced and addressed if we also surface the fact that there is a range of outcomes, rather than projecting a single outcome.
Reading Doug Hubbard’s excellent How to Measure Anything, I got thinking about the proposals I read and write, the problems they face, and how to fix them.
A typical proposal (I’m thinking mainly of internal projects) will make a claim like “This project will add 500,000 unique users in the first year” or “This will bring in £400,000 of new revenue by October” or “This will save £250,000 next financial year”. I know because I’ve written some of them. In choosing what number to use you don’t want to put in anything too low or the project won’t look worthwhile, so you tend to put in something that’s a little generous (but not implausibly so) and assume that all the contributing factors will work out for the best — the marketing, the timing, the user take-up, and so on. Depending on the audience you’re presenting it to, you might even have some evidence for your projection.
The problem is, of course, that it won’t happen like that. Some things will work out well, some not so well. Everyone knows that, and if you’re facing a sceptical inquisitor then you’re just setting yourself up with less credibility before you start. Thinking back on the last such proposal I wrote, I can imagine a conversation with a fantasy inquisitor that might have gone like this…
Fantasy inquisitor: You say this will bring in £400,000 before October.
Me: That’s right.
FI: Can you guarantee that? Can you guarantee £400,000 of new revenue?
Me: [fearing liability, and wondering which members of my family might get taken away if it doesn’t happen]: No, of course I can’t absolutely guarantee it.
FI: So what are you saying about the £400,000 if you can’t guarantee it?
What indeed? The truth is that £400,000 is what we’ll make if things generally go well. We might make even more. But realistically we might make less. Being honest with ourselves and our audience will help everyone. Let’s return to the scene…
FI: [repeating for the sake of the drama] So what are you saying about the £400,000 if you can’t guarantee it?
Me: I’m confident we’ll make that.
FI: How confident?
Me: Really quite confident.
FI: So you’re “really quite” confident it will make at least £400,000. And I suppose it could make us as much as… what? £800,000?
Me: Er, no I don’t think it will go that well. Maybe £500,000.
FI: So you’re confident it could make as much as £500,000. And you can be confident it won’t generate a penny less than £400,000, can you?
Me: Well, it certainly could make less, I suppose, if not everything went to plan.
FI: Would you be surprised if it didn’t make us anything at all?
Me: Yes, I would. Even being realistically pessimistic it really should make at least £150,000.
FI: Okay, so you’re very confident we’re looking at making £150,000 to £500,000 with this?
Me: Yes, I’m very confident of £150,000 to £500,000 of new revenue.
So our single (and therefore implausible-sounding) figure becomes much more realistic when we change it into a range, even if it goes down at one end. It also goes up at the other end.
I think this goes a long way towards dealing with a sceptical audience, because it is more realistic. For the same reason, we should be more confident when we talk about it, too.
I’ll add two final notes. First, there’s an important question that needs to be asked, which I’ve omitted: What is your evidence for this? This may be calculations based on observations, historical data from similar projects in the past, market research, etc. Either way, it’s important to know that the figure(s) in question aren’t entirely random.
Second, the statisticians in the audience will point out that “very confident” is almost meaningless when we could say “90% confident” or attach some other actual number. I certainly appreciate that, and mostly concur. But for me a significant, easy, and meaningful step forward is to move from single figures to more realistic ranges. And this should promote more honest and confident dialogue.
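Once “very confident” has a number attached, the range becomes something we can calculate with. A minimal sketch, assuming “very confident” means a 90% confidence interval and that revenue is roughly normally distributed (both simplifying assumptions; real revenue is often skewed): a 90% interval spans about 3.29 standard deviations, so we can estimate the chance that the £150,000 to £500,000 project fails to cover a hypothetical £200,000 cost. The function name `chance_below` is illustrative.

```python
from statistics import NormalDist


def chance_below(threshold, low, high, confidence=0.90):
    """Probability the outcome falls below `threshold`, treating (low, high) as a
    symmetric `confidence` interval of a normal distribution (an assumption)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.645 for 90%
    mean = (low + high) / 2
    sd = (high - low) / (2 * z)                     # the interval spans 2z standard deviations
    return NormalDist(mean, sd).cdf(threshold)


if __name__ == "__main__":
    p = chance_below(200_000, 150_000, 500_000)
    print(f"Chance of revenue below £200,000: {p:.0%}")
```

Under these assumptions the answer is about 12%, roughly one chance in eight of a loss: a much more useful thing to put in front of a sceptical inquisitor than either “£400,000” or “very confident”.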
Today I attended the launch of something weird and wonderful: a new musical instrument, the Eigenharp. And although this is a hardware device, the event and the run-up to it brought to mind the launch of our own Open Platform six months ago. Seeing some commonalities between the two gave me a whole new respect for the people who do marketing and PR, because it reminded me how much stuff needs to be planned. The common phases I saw are: (1) the controlled buzz, (2) openness at launch, and (3) the follow-through. But first, a few words about the two products…
The Eigenharp and the Open Platform
The Eigenharp is a new kind of device. It’s a musical instrument with several patents already filed, and incorporates 132 keys with three-colour LEDs and a breath pipe. The keys operate in three dimensions (so you can exploit, say, pressure, pitch bend and filter effects with one finger) and are sensitive to within one micron. So while it’s compatible with MIDI, that’s an incredibly out-of-date protocol compared to what you could do with it.
The Open Platform has been well-documented here and elsewhere: a full content API directly into the Guardian’s content database, plus a store of raw data on which our journalists base some of their stories.
Although they are clearly two different kinds of beast, a remarkable similarity struck me: they are not just new products, but really new products. There is no clearly defined place for either of them in its market, and no predictable success path for either one. The success of each depends entirely on other people’s innovation. The Eigenharp’s success depends on musicians taking their creativity in new directions — no-one to date has written music for an Eigenharp. The Open Platform’s success depends on developers doing innovative things with the new data they have access to.
So how do you launch a product that’s really new? The similar steps I can see are…
1. The controlled buzz
Eigenlabs (the creators of the Eigenharp) and the Guardian (creators of the Open Platform) managed to seed a few select people with their product — musicians and developers respectively — and some small details leaked out. This clearly did a couple of things.
First, it helps you get the product right. I know from my conversations around the Open Platform that what might seem a good idea to one or two people close to the product can provoke strong negative reactions when you float it past an informed outsider. When you’re inside the company you can easily think too much and lose the big picture, so it’s good to get an informed but independent view before you go public.
Second, creating a buzz protects you from a bit of unforeseen negative reaction — if people are excited about your product and want it to succeed they will forgive some minor mistakes. You can see Eigenlabs achieved this with this low-fi video of two guys playing the James Bond/Moby theme on YouTube. Eigenlabs didn’t create this themselves, but they let “close friends” do it, and you can see the reaction in the comments underneath: “What the hell!! This is awesome”, “where can I buy one of these?”, “dear god this is epic” and so on.
The benefits of this are well explained by Lance Ulanoff at PCMag.com. Here he compares the lacklustre launch of the Motorola Cliq with the hype from the January 2009 announcement (six months before the launch) of the Palm Pre:
Speaking of Palm, it has a good bit in common with Motorola right now. The Pre (and now Pixi) is its hail-Mary pass. If the new phone and WebOS platform fail, Palm will be done. Back in January, however, Palm ran the best product rollout event I have ever attended. It perfectly conveyed the company’s excitement and all that is good about the Pre. I remember tweeting the event with mounting excitement. Today, I tweeted the Motorola event with mounting confusion.
What Palm did in January was give the Pre a good hard shove off the shore. Those waves of excitement produced a good six months of positive press. Eventually, Palm and the Pre hit some rough waters—a shipping delay, the slow delivery of the SDK and tiny list of apps that has yet to grow. Still, that first day set the tone.
Then after the buzz is…
2. Openness at launch
There’s much that doesn’t need to be said about the launch event, but one thing that struck me about both the launch of the Eigenharp and the launch of the Open Platform was a common ethos: the honesty and openness of the presenters, and the openness of the products.
At both events the “real users” (musicians/developers) were on stage demonstrating their early adoption of the product and were open to questions from the audience. These are the kinds of people whom most marketing and PR people would lock in cupboards at a press event, for fear they would arrive without the regulation press-on smile and go off-message. But when a product’s success relies on innovation and creativity from its real users, it’s important that the intended audience hears an authentic voice.
Also, there is an openness about both products. Aside from the very concept of opening up content and data, the Open Platform gives a lot of latitude to developers, including (unusually) commercial use. Meanwhile Eigenlabs are open-sourcing their software. Again, because each product’s success relies on people taking it in unexpected directions it’s important for there to be as many opportunities as possible for that to happen. Otherwise its success is stifled.
And then finally there’s…
3. The follow-through
Well, it’s early days for both products. The Open Platform API is still in beta, but you can see some developments now, such as the launch last week of the Applications Gallery. It’s even earlier for the Eigenharp. Company founder John Lambert was asked what well-known musicians he’d like to see using his creation and he said that a number of high profile people do have the instrument but that he cannot name any names yet. Clearly some really interesting things are going to happen some time soon.
If a product really does depend on the creativity of others then you can’t — and mustn’t — be too controlling of what those people do with it. (When I spoke to him John seemed slightly apprehensive about the quality of some of the things people might produce around the Eigenharp; I bet Matt McAlister didn’t anticipate a swearword tracker coming out of the Open Platform.) But if you can’t control what others do, you can at least show some examples of what might be achieved, and that’s what the follow-through is doing with each of these products.
1-2-3, easy as A-B-C
Now if any marketing people have stumbled across this blog post, I wouldn’t be surprised if they were horrified by the naivety of these observations — all of this might just be first grade A-B-C of marketing for them. But I find this fascinating. As a tech person it’s very easy for me to focus on my own role and not spend much time wondering what challenges are faced by my colleagues in other departments. Here are two products whose paths to success are unusually dependent on the unknown and whose stories will be well worth watching. Looking at the communication that’s gone on around them makes the significance of other people’s roles much more apparent. Technology success is about much more than successful technology.
…which are important even on an Agile project.
Many people who read just a little about Agile development think there are no fixed commitments. It’s true that there is constant reprioritisation of work, but that generally operates at the task level, and there is still a need to set goals and stick to them. After all, how else can you give the people who sign the cheques reassurance that you’ll deliver what they want?
So when we planned R2 we also set milestones marking when we would make the major launches of each section. Each milestone was stated very simply at the high level: launch the homepage, launch the Environment section, etc. These were the targets we put before our senior stakeholders, and they were what we were accountable for.
At a lower level we planned for each launch to include specific features: the Culture launch would include a special component to list the critics; the Sports launch would include a component showing the latest betting odds from Blue Square; and so on. It was these lower level details which were always up for reprioritisation. As long as we could deliver sufficient features to allow the launch of each section we could fairly be said to be hitting that milestone and satisfying our senior stakeholders.
At the weekend the Guardian website went through one of the most significant transformations in its history: we moved our news, politics and Observer content into the new design and new content management system, and we simultaneously launched a lot of new functionality, both internal and external.
There’s an introduction and discussion on the more public-facing aspects of this, kicked off by Emily Bell. For my part, I want to talk briefly about one of the most remarkable behind-the-scenes aspects of it: how we got the weekend launch to go so incredibly smoothly.
The secret is that the weekend’s work was only the final step after a great many before it, all of which were safely out of the way before the weekend…
1. Software release
The actual software was released some weeks ago, in early January. This means that by the time of the launch it had been in use for some time, with almost all the lines of code having been executed several hundred (and in some cases several thousand) times already, in the production environment.
Even that January release was only an enhancement of previous releases, which had been going out fairly quietly over the preceding few months. The latest one included new internal tools, and updates to existing tools, to support some of the new features that are visible today.
2. Building the pages
Meanwhile editors, subeditors and production staff have had to learn to use those tools. They’ve been using them to migrate a lot of the older content into the new-format pages. You might think that could be done by machine, but that’s not the case. Since we’re also changing our information architecture — adding a lot of semantic structure where previously there was only presentational information — it takes an informed human being to know how to convert an older page into its newer format. A real person is needed to understand what a piece of content is really about, and what it isn’t about but merely mentions in passing. We also need people to look at the new keyword pages (for example, the one on immigration and asylum, or the page on Kosovo), which are populated mostly automatically, and for them to highlight the most important content: the most important content won’t necessarily be the newest.
This work had been going on for many weeks before the weekend launch. The January software release brought in some refinements to the tools to help them tidy up final loose ends (and no doubt some more tidying will happen over the next couple of weeks).
You can see from this that it’s about much more than mere software. The software enables people to do the editorial work, and the editorial work has been going on for some considerable time. Everything they’ve done has also been tested and previewed, which allows them to see what our external users would see were it to go live. Again, this exercises the software fully, but only within the internal network, before it’s exposed to the outside world.
The work for the weekend launch was mainly running a lot of database scripts to make various new URLs live and decommission old ones. The reason this was such a huge launch is that there’s over ten years’ worth of news content to expose, as well as new functionality and designs.
We couldn’t trust the scripts to work first time (of course), so we spent a lot of time copying the production content into a safe environment and rehearsing the process there, with real data. We needed to be sure not just that it would work, but also how long it would take (allowing for any hardware differences), and to change the scripts and process accordingly.
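Rehearsing with real data also produces timings that can be turned into a plan for the real run. A minimal sketch of that arithmetic, assuming each script’s rehearsal timing can be scaled by a single hardware factor (production being faster or slower than the rehearsal box) plus a contingency margin; the step names, timings, factor, margin and the function `projected_window` are all hypothetical.

```python
from datetime import timedelta

# Hypothetical rehearsal timings for each migration step, in minutes.
REHEARSAL_MINUTES = {
    "expose archived news URLs": 240,
    "decommission old URLs": 90,
    "switch on new navigation": 30,
}


def projected_window(rehearsal_minutes, hardware_factor, margin=0.25):
    """Scale rehearsal timings to production and add a contingency margin.

    hardware_factor < 1 means production is expected to be faster than the
    rehearsal environment; > 1 means slower.
    """
    base = sum(rehearsal_minutes.values()) * hardware_factor
    return timedelta(minutes=base * (1 + margin))


if __name__ == "__main__":
    window = projected_window(REHEARSAL_MINUTES, hardware_factor=0.8)
    print(f"Plan for about {window} of script running")
```

A single factor is a crude model (disk-bound and CPU-bound scripts may scale differently), which is one more reason to rehearse more than once and adjust the scripts as the timings come in.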
Finally, after all the rehearsals, the real deal. The work to run the database scripts and raise the curtain on various features ran over Saturday and Sunday, but it was calm enough and organised enough that the team needed to work only slightly extended working days.
So the big launch was the culmination of a huge amount of effort, and the weekend work came after an awful lot of practice. There were a couple of sticky moments, but nothing the team couldn’t recover from in a few minutes. As one of the developers remarked towards the end of Sunday: “The fact that today’s been really tedious is a good thing.”
What we can see now includes:
- New navigation at the top of most pages;
- UK and world news sections;
- A new layout for the Observer homepage;
- A new politics site;
- An audio homepage…
- …with each piece of audio content identified as an individual resource with its own URL;
- An A-Z index of subjects…
- …and contributors;
- More meaningful related links down the side of articles;
- An “article history” link for every article in the new format, in the name of transparency;
- Page-by-page and section-by-section navigation around the latest edition of the Guardian and Observer;
- Lots of new layouts for editors to choose from, like this and this and this;
- And a new favicon.
We’ll be ironing out a few things over the next few days, but everything’s gone to plan for now. And then, as Emily says, there’s still sport, arts, life and style, and education to do.
Late last night (UK time) the Times Online launched their new design, and jolly nice it is, too. It’s clean and spacious, and there’s an interview with the designers who are very excited about the introduction of lime green into the logo. Personally, I like the columnists’ pictures beside their pull-quotes. That’s a nice touch. You can also read about it from the Editor in Chief, Anne Spackman.
However, not everyone at the website will have been downing champagne at the moment the new site went live, because in the first few hours it was clearly having some performance issues. We’ve all had those problems at some point in our careers (it’s what employers call “experience”), and it’s never pleasant. As I write it looks as though the Times Online might be getting back to normal, and no doubt by the time you read this the problems will all be ancient history. So while we give their techies some breathing space to get on with their jobs, here are three reasons why making performant software is not as easy as non-techies might think…
1. Software design
A lot of scalability solutions just have to be built in from the start. But you can’t do that unless you know what the bottlenecks are going to be. And you won’t know what the bottlenecks are going to be until you’ve built the software and tested it. So the best you can do from the start is make some good judgements and decide how you’d like to scale up the system if needed.
Broadly speaking, you can talk of “horizontal scaling” and “vertical scaling”. The latter is when you scale by keeping the same boxes but beefing them up — giving them more memory/CPU/etc. The former is where you scale by keeping the same kind of box but adding more of them alongside the existing ones. Applications are usually designed for one or the other (deliberately or not), and it’s good to know which before you start.
Vertical scaling seems like an obvious solution generally, but if you’re at the limit of what your hardware will allow then you’re at the limit of your scalability. Meanwhile a lot has been made of Google’s MapReduce framework, which explicitly allows parallelisation wherever it is applied — it allows horizontal scaling, adding more machines. That’s very smart, but they’ll have needed to apply it up-front — retrofitting it would be very difficult.
You can also talk about scaling on a more logical level. For example, sometimes an application would do well to split into two distinct parts (keeping its data store separate from its logic, say), but if that was never considered during design, it will be too late once the thing has been built — there will be too many inter-dependencies to untangle.
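To show why the MapReduce model lends itself to horizontal scaling, here is a minimal, single-machine Python sketch of the pattern (not Google’s implementation): each chunk of input is mapped independently, so chunks could in principle go to separate machines, and the partial results are merged in a reduce step. The word-count task and the function names are illustrative choices.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(lines):
    """Map step: count words in one chunk, independently of every other chunk."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(a, b):
    """Reduce step: merge two partial results."""
    return a + b


def word_count(lines, workers=4):
    # Split the input into roughly equal chunks, one per worker.
    size = max(1, len(lines) // workers)
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    # map_chunk shares no state, so each chunk could run on a separate machine;
    # here a thread pool merely stands in for that.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(reduce_counts, partials, Counter())
```

The design point is that the map step was written to be independent from the start; code whose steps all read and write shared state could not be split up this way after the fact.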
That can even happen on a smaller scale. It’s a cliché that every programming problem can be solved with one more level of indirection, but you can’t build in arbitrary levels of indirection at every available point “just in case”. At Guardian Unlimited we make good use of the Spring framework and its system of Inversion of Control. It gives us more flexibility over our application layering, and one of our developers recently demonstrated to me a very elegant solution to one particular scaling problem using minimally-invasive code precisely because we’d adopted that IoC strategy — effectively another level of indirection. Unfortunately we can’t expect such opportunities every time.
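A rough Python analogue of that idea (not Spring’s API, and not the Guardian’s actual code): because the service receives its repository through its constructor rather than creating it, a caching layer can be slotted in at the wiring point without touching the service. All class and function names here are hypothetical.

```python
from typing import Dict, Protocol


class ArticleRepository(Protocol):
    def get(self, article_id: int) -> str: ...


class DatabaseRepository:
    """Stand-in for a repository that queries the database (hypothetical)."""

    def __init__(self) -> None:
        self.queries = 0  # count backend hits so the effect of caching is visible

    def get(self, article_id: int) -> str:
        self.queries += 1
        return f"article {article_id}"


class CachingRepository:
    """The extra level of indirection: wraps any repository and remembers results."""

    def __init__(self, inner: ArticleRepository) -> None:
        self.inner = inner
        self.cache: Dict[int, str] = {}

    def get(self, article_id: int) -> str:
        if article_id not in self.cache:
            self.cache[article_id] = self.inner.get(article_id)
        return self.cache[article_id]


class ArticleService:
    """Depends only on the ArticleRepository interface, injected from outside."""

    def __init__(self, repository: ArticleRepository) -> None:
        self.repository = repository

    def render(self, article_id: int) -> str:
        return f"<h1>{self.repository.get(article_id)}</h1>"


def build_service(db: DatabaseRepository, use_cache: bool) -> ArticleService:
    # The wiring point: the only place that changes when the caching layer is added.
    repo: ArticleRepository = CachingRepository(db) if use_cache else db
    return ArticleService(repo)
```

In a Spring application the equivalent wiring would live in the container configuration rather than a hand-written `build_service` function, but the principle is the same: the indirection has to exist before you can exploit it.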
2. Devising the tests
Before performance testing, you’ve got to know what you’re actually testing. Saying “the performance of the site” is too broad. There’s likely to be a world of difference between:
- Testing the code that pulls an article out of the database;
- Testing the same code for 10,000 different articles in two minutes;
- Testing 10,000 requests to the application server;
- Testing 10,000 requests to the application server via the web server;
- Testing the delivery of a page which includes many inter-linked components.
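Each of those levels calls for its own harness. As a minimal sketch of the kind of test in the second item, timing the same operation for many different inputs, here is a Python helper that fires a batch of calls at any function concurrently and reports throughput and latency percentiles. The `fetch_article` stub and the `load_test` name are hypothetical; in a real test the stub would be replaced by the database call, the app-server request or the full page request, depending on which level is being tested.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(func, arg):
    """Run func(arg) and return how long it took, in seconds."""
    start = time.perf_counter()
    func(arg)
    return time.perf_counter() - start


def load_test(func, args, concurrency=20):
    """Call func once per item in args, up to `concurrency` calls at a time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda a: timed_call(func, a), args))
    elapsed = time.perf_counter() - start
    cuts = statistics.quantiles(latencies, n=100)
    return {
        "requests": len(latencies),
        "per_second": len(latencies) / elapsed,
        "p50_ms": cuts[49] * 1000,
        "p95_ms": cuts[94] * 1000,
    }


def fetch_article(article_id):
    """Hypothetical stand-in for the real operation under test."""
    time.sleep(0.001)
    return f"article {article_id}"


if __name__ == "__main__":
    print(load_test(fetch_article, range(10_000)))
```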
Even testing one thing is not enough. It’s no good testing the front page of the website and then testing an article page, because in reality requests come simultaneously to the front page and many article pages. It’s all very well testing whether they can work smoothly alone — it’s whether they work together in practice that counts. This is integration testing. And in practice many, many combinations of things happen together in an integrated system. You’ve got to make a call on what will give the best indicators in the time available.
Let me give two examples of integration testing from Guardian Unlimited. Kevin Gould’s article on eating in Barcelona is very easy to extract from the database — ask for the ID and get the content. But have a look down the side of the page and you’ll see a little slot that shows the tags associated with the article. In this case it’s about budget travel, Barcelona, and so on. That information is relatively expensive to generate. It involves cross referencing data about the article with data about our tags. So testing the article is fine, but only if we test it with the tags (and all the other things on the page) will we get half an idea about the performance in practice.
A second example takes us further in this direction. Sometimes you’ve got to test different operations together. When we were testing one version of the page generation sub-system internally we discovered that it slowed down considerably when journalists tried to launch their content. There was an interaction between reading from the database, updating the cache, and changing the content within the database. This problem was caught and resolved before anything went live, but we wouldn’t have found that if we hadn’t spent time dry-running the system with real people doing real work, and allowing time for corrections.
3. Scaling down the production environment
Once you’ve devised the tests, you’ll want to run them. Since performance testing is all about having adequate resources (CPU, memory, bandwidth, etc.), you really should run your tests in an exact replica of the production environment, because that’s the only environment which can show you how those resources work together. However, this is obviously going to be very expensive, and for all but the most cash-rich of organisations prohibitively so.
So you’ll want to make a scaled-down version of the production environment. But that has its problems. Suppose your production environment has four web servers with two CPUs each and six application servers with 2GB of RAM each. What’s the best way to cut that down? Cutting it in half might be okay, but if that’s still too costly then cutting it further is tricky. One and a half application servers? Two application servers with different amounts of RAM or CPU?
None of these options will be a real “system in miniature”, so you’re going to have to compromise somewhere. It’s a very difficult game to play, and a lot of the time you’ve got to play to probabilities and judgement calls.
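The awkwardness is easy to see if you do the arithmetic. A minimal sketch using the example environment above (four web servers and six application servers): scaling every tier by the same factor only yields whole machines for some factors, and the hypothetical helper `scale_environment` flags the tiers where you would be forced to round and so distort the proportions.

```python
from fractions import Fraction

# The example production environment from the text: tier -> number of servers.
PRODUCTION = {"web": 4, "app": 6}


def scale_environment(tiers, factor):
    """Scale each tier by `factor`; return the exact counts and the tiers that
    would need rounding (i.e. would not be a true miniature of production)."""
    factor = Fraction(factor)
    scaled = {name: count * factor for name, count in tiers.items()}
    needs_rounding = [name for name, n in scaled.items() if n.denominator != 1]
    return scaled, needs_rounding


if __name__ == "__main__":
    for f in ("1/2", "1/4"):
        scaled, problems = scale_environment(PRODUCTION, f)
        print(f, {k: str(v) for k, v in scaled.items()}, "rounding needed:", problems)
```

Halving gives two web servers and three application servers; quartering gives one web server and one and a half application servers. Even when the counts come out whole, this says nothing about per-server resources (the two CPUs, the 2GB of RAM) or shared components such as the database, which is why it remains a judgement call.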
And that’s only part of it
So next time you fume at one of your favourite websites going slow, by all means delve deep into your dictionary of expletives, but do also bear in mind that producing a performant system is not as easy as you might think. Not only does it encompass all the judgement calls and hard thinking above (and usually judgement calls under pressure), but it also includes a whole lot of really low-level knowledge both from software developers and systems administrators. And then, finally, be thankful you’re not the one who has to fix it. To those people we are truly grateful.
There’s a post from Joel O’Software regarding measuring performance and productivity. He’s saying some good stuff about how these metrics don’t work, but I’d like to balance it with a few further words in favour of metrics generally. Individual productivity metrics don’t work, but some metrics are still useful, including team metrics which you might class as productivity-related.
- Individual productivity metrics don’t work.
- Productivity-like metrics are still useful…
- …but they don’t tell the whole story
Individual productivity metrics don’t work
Joel O’S states that if you incentivise people by any system, there are always opportunities to game that system. My own experience here is in a previous company where we incentivised developers by how much client-billable time they clocked up. Unfortunately it meant that the developers flatly refused to do any work on our internal systems. We developed internal account codes to deal with that, but it just meant that our incentivisation scheme was broken as a result. Joel has other examples, and Martin Fowler discusses the topic similarly.
Productivity-like metrics are still useful…
Agile development people measure something called “velocity”. It measures the amount of work delivered in an iteration, and as such might be called a measurement of productivity. But there are a couple of crucial differences from measuring things such as lines of code or function points:
- Velocity is a measurement of the team, not an individual.
- It’s used for future capacity planning, not rewarding past progress.
Velocity can also be combined with future deadlines to produce burndown charts, which let you make tactical adjustments accordingly. Furthermore, a dip in any of these numbers can highlight that something’s going a bit wrong and that some action needs to be taken. But that tells you something about the process, not the individuals.
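A minimal sketch of how velocity feeds forward into planning rather than into rewards: average the team’s recent velocity, project how many iterations the remaining work needs, and produce the ideal burndown line to compare progress against. The story-point figures and the function names `forecast_iterations` and `burndown` are hypothetical, and real teams often use a rolling average or a range rather than a single mean.

```python
import math
from statistics import mean


def forecast_iterations(recent_velocities, remaining_points):
    """Iterations needed to finish, based on the team's average velocity."""
    velocity = mean(recent_velocities)
    return math.ceil(remaining_points / velocity)


def burndown(remaining_points, velocity, iterations):
    """Expected points remaining at the end of each iteration (never below zero)."""
    return [max(remaining_points - velocity * i, 0) for i in range(iterations + 1)]


if __name__ == "__main__":
    velocities = [21, 18, 24]  # story points completed per iteration (hypothetical)
    remaining = 100
    n = forecast_iterations(velocities, remaining)
    print(f"About {n} iterations left")              # mean velocity 21 -> 5 iterations
    print(burndown(remaining, mean(velocities), n))  # [100, 79, 58, 37, 16, 0]
```

Because every input is a team total, nothing here says anything about any individual, which is exactly the property that keeps the metric from being gamed.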
The kick-off point for Joel’s most recent essay on the subject is a buzzword-ridden (and just clunkily-worded) cold-call e-mail from a consultant:
Our team is conducting a benchmarking effort to gather an outside-in view on development performance metrics and best practice approaches to issues of process and organization from companies involved in a variety of software development (and systems integration).
It’s a trifle unfair to criticise the consultant for looking at performance metrics, but one has to be careful about what they’re going to be used for.
…but they don’t tell the whole story
A confession. We track one particular metric here in the development team at Guardian Unlimited. And a few days ago we recorded our worst figure for it since we started measuring. You could say we had our least productive month ever. You could. But were my management peers in GU unhappy? Was there retribution? No, there was not. In fact there was much popping of champagne corks here, because we all understand that progress isn’t measured by single numbers alone. The celebrations were due to the fact that, with great effort from writers, subs, commercial staff, sponsors, strategists and other technologists, we had just launched
- a refreshed and reorganised Arts site,
- a brand new Arts blog,
- our first ever dedicated Music site
- with its own Music blog,
- and an exclusive video,
- and we still made time to help our colleagues over at Guardian Weekly launch the new Guardian Abroad site.
A bad month then? Not by a long shot. The numbers do tell us something. They tell me there was a lot of unexpected last-minute running around, and I’ve no doubt we can do things better the next time. It’s something I’ve got to address. But let’s not flog ourselves too much over it — success is about more than just numbers.
Technical people don’t tend to like a big bang release, but it’s often perceived as the only viable option when you launch a new service. Why is this, and is there any kind of middle ground? The recent redesign of Telegraph.co.uk is a useful way to look at this. I’m also going to use examples from Guardian Unlimited, most notably Comment is Free.
What is a big bang release?
A big bang release is one in which a large system is released all in one go. Recent examples from Guardian Unlimited include the Comment is Free site and our daily podcasts. This week the Telegraph launched its website redesign. In the offline world, UK newspapers have been in a state of almost constant change recently, kicked off largely by the Independent’s sudden transformation from a broadsheet to a tabloid in September 2003. The Times followed suit shortly after, and in September 2005 The Guardian changed to the “Berliner” size, new to any British paper. These launches can only be done in one go: you can’t have a newspaper that’s broadsheet on page 1 and tabloid on page 2.
Contrast that to…
At the opposite end of the spectrum to the big bang is the mantra “release early, release often”. It’s the approach to which the Linux community attributes much of its success. Much of GU’s work is done this way: while there’s always a project in the pipeline like Comment is Free, most of the time we’re making small improvements and adding new features day by day. Some of these are internal tools for our colleagues, some are bugfixes, but many are small features added to the public site. In January the Weekend magazine ran a photo competition with Windows XP. We worked with them to produce an online entry form that allowed our readers to upload their photographic entries directly to us.
So what are the worries about big bangs?
Big bangs give you a jump on your competitors and give you a big marketing boost. But they also put an awful lot at stake. Releasing a photo uploader is one thing. That’s more of a feature release than a product release. Consider what makes up Comment is Free, which the GU team created in collaboration with Ben Hammersley. In no particular order:
- Steve Bell’s If… cartoon every day;
- Comments on blog entries;
- Keywords applied to each entry;
- Commenters need to sign in (or register) before they can comment;
- An RSS feed for the whole blog, plus one for each contributor;
- A one-click feature for editors behind the scenes to “push” an article from the Guardian or the Observer into Comment is Free;
- Dan Chung’s photo blog;
- A list of the most active entries on the blog…
- …and the most interesting from elsewhere;
- A search facility;
- The ability to serve all this even with the inevitable battering of thousands of readers checking it out every day.
And that’s not counting the editorial team powering it and the work of the designers creating something that is beautiful and works across a good range of web browsers (including the famously fickle Internet Explorer).
Using the example of Comment is Free, there are several kinds of difficulties with big bangs. One is that there is a lot to potentially go wrong. Another is that there may be a big switch-over which is difficult to engineer and risky to run. A third relates to community reaction and a fourth relates to social psychology.
Difficulty 1: The lots-to-go-wrong scenario
Within hours of launch, any of the Comment is Free features above (and any number of others which I forgot) could have failed in unpredictable but surprisingly common circumstances. The important thing to remember is that this is the first time any of these features would be tested by unknown people, and the first time they’re used together in unpredictable ways. A problem with a core component of the system can have an impact on many aspects visible to the readers. A big bang can’t be rolled back without a lot of egg on face.
The problem is much more acute with anything that makes a fundamental change to a core part of a system. In any one hour we’ll be serving hundreds of thousands of pages, some of which were published five minutes ago, some of which were published five years ago. A change to the way we present articles is going to have consequences for thousands of people in myriad different situations over the course of a few minutes. If we got something wrong there, misjudged some of those situations, we could annoy hundreds or thousands of them very quickly. And since such problems can take time to fix, more people will be annoyed the longer it takes.
All but the smallest of software launches throw up some problems. Often they are too minor to worry about (a particular colour change not working in a particular browser, for instance). Sometimes they are superseded by more significant problems in the same launch. With a small feature release (early and often) we can spend a little time and fix it. But if a sweeping change has occurred there’s the danger of having too many things to fix in a timely fashion. Even the original big bang took several millennia to settle down, although to be fair that was mostly a hardware rollout.
Difficulty 2: The big-switch-over scenario
With some systems it’s not a case of making a big thing public in one shot; sometimes it’s a major internal mechanism that has to be switched over in one go.
Adding comments to articles might mean not just an addition to the underlying system, but a wholesale either/or change. The system must work not just before and after the change, but during the change, too. And everything that links into the part which changes must be somehow unlinked and relinked in good time.
If you think this language (part, unlink, etc) is vague, by the way, there’s a reason for that: the things that need to change will be many and varied. Database tables, triggers, stored procedures, software libraries, application code, outgoing feeds, incoming feeds, application servers, web servers… all these might need “unlinking” (dropping, reinstalling, stopping, bouncing, whatever). It needs careful planning, co-ordinating, and execution, and it might only be possible over several minutes, which is a very long time if you’re running a live service.
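One small part of that planning is working out a safe order for the unlinking and relinking. A component shouldn’t be taken down while something that depends on it is still running. Below is a sketch using Python’s standard `graphlib` module, which needs Python 3.9 or later. The dependency map is entirely hypothetical and is only loosely based on the list above. A real system’s dependencies would have to be discovered, not guessed.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each component lists the components it depends on.
DEPENDS_ON = {
    "web servers": {"application servers"},
    "application servers": {"application code"},
    "outgoing feeds": {"application code"},
    "incoming feeds": {"database tables"},
    "application code": {"software libraries", "stored procedures"},
    "stored procedures": {"database tables"},
    "triggers": {"database tables"},
    "software libraries": set(),
    "database tables": set(),
}


def restart_order(depends_on):
    """Bring components back up dependencies-first."""
    return list(TopologicalSorter(depends_on).static_order())


def shutdown_order(depends_on):
    """Take components down dependents-first: the reverse of the restart order."""
    return list(reversed(restart_order(depends_on)))


print("Stop:   ", shutdown_order(DEPENDS_ON))
print("Restart:", restart_order(DEPENDS_ON))
```

`TopologicalSorter` also raises `graphlib.CycleError` if the map contains a circular dependency. That is useful to learn during planning rather than halfway through the switch-over.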
Again, rehearsing this and doing it for real in the live environment are very different things. But this time if something goes wrong you’re stuck with a broken service or something which will take just as long to roll back — which will only leave you back at square one.
Difficulty 3: Community reaction
Even if everything goes well on a technical level there’s still the danger that your user community won’t like it.
A failure for Comment is Free, Emily Bell has written, would have been if no readers had posted any comments at all. That would have been an awful lot of work wasted. But at least in that scenario we could have quietly shut it down and walked away.
A more difficult scenario would have been if people were using it, but found a fundamental feature annoying or obstructive. A good example of this is the page width on Comment is Free. 99% of Guardian Unlimited pages are designed to fit a monitor which is 800 pixels wide. That was pretty standard six years ago when the site was designed, but is less common now. Both Comment is Free and the Telegraph’s new design require a width of at least 1000 pixels. That gives much greater design freedom and hence a better user experience, unless you can’t or won’t stretch your browser that wide, in which case it could be very alienating. If readers react badly to the 1000 pixel design then there’s very little that can be done about it. Sure you can redesign, but that’s a major undertaking.
Difficulty 4: Psychology
So, let’s suppose the features all worked, the switch-over all worked, and the community doesn’t react negatively. There’s still the problem that people, well, they just might not gel with it.
Blogs are a very good example of this. They’ve been around for almost a decade, and talk/discussion software has been around for much longer. But blogs only really took off in the last two or three years. Much of this is down to getting the psychology right:
- A blog is now recognised as a largely personal affair, being the soap box for individuals rather than a free-for-all forum;
- discussion tends to be just a single column of comments rather than threaded, ensuring commenters respond more to the original poster, rather than getting into an internal discussion;
- most bloggers write less content but write it more often.
That list is not something that falls straight out of a grand vision. It’s a series of very subtle psychological and social lessons that the whole online community learned over a very long period. Similarly any software release relies on social and psychological subtleties that can’t always be guessed at, and will often overlap each other. Making a single big release will obscure any of these subtleties. If people aren’t reacting to your product in quite the way you hoped then working out why could be difficult: a small tweak to a couple of presentational elements might make all the difference in the world… or perhaps your promotion of the product just wasn’t enough.
How can we mitigate this?
From the outside it looks like the Telegraph’s website redesign was executed (or rather, is being executed as I write) without breaking a sweat, though I daresay internally there were a lot of frayed nerves as well as excitement. It gives us a few clues about how to manage a big bang.
Mitigation: Splash instead of bang
One trick is to make a big splash, rather than a big bang. The website’s redesign is actually rolling out over days and weeks, not minutes or hours. The team made sure to change the high-profile front page and World Cup pages first, so ensuring the perceived impact was greatest. They get a lot of credit without risking more than they need to.
Mitigation: Step-by-step approach
Even if plaudits are not what you’re after, it can be useful to slow big bangs into small, distinct steps.
A while ago at GU we needed to switch from one kind of server to another. On the face of it there was no way to smooth this switch: requests either went to one server, or they went to another. Nevertheless, any kind of problem switching over could have been potentially disastrous. We took extra time to build in a middle tier, which brokered incoming requests to the back-end servers, and added into that a facility to route different classes of requests to one or other of the servers. Initially everything went to the one server. Then we altered it so that only one specific kind of request, a tiny fraction of the whole, was routed to the other. We ensured that was working smoothly before we sent another class of requests over, and so on. We took what would originally have been a dangerous all-or-nothing switch, and smoothed it out over several weeks.
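Here is a minimal Python sketch of the routing idea in that middle tier. It is not the system we actually built. It assumes each incoming request can be tagged with a class, and all the class and server names are hypothetical. Moving a class over is a one-line change, and so is moving it back.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Sends each class of request to either the old or the new back-end server.

    Classes listed in `migrated` go to the new server; everything else stays
    on the old one, so the switch can be made one class at a time.
    """
    old_backend: str
    new_backend: str
    migrated: set[str] = field(default_factory=set)

    def route(self, request_class: str) -> str:
        return self.new_backend if request_class in self.migrated else self.old_backend

    def migrate(self, request_class: str) -> None:
        """Start sending this class of request to the new server."""
        self.migrated.add(request_class)

    def roll_back(self, request_class: str) -> None:
        """Send this class back to the old server if anything looks wrong."""
        self.migrated.discard(request_class)


router = Router("old-server", "new-server")
router.migrate("podcast-feed")         # a small, low-risk class goes first
print(router.route("podcast-feed"))    # handled by the new server
print(router.route("article-page"))    # everything else is untouched
```

The key design choice is that rolling back one class costs no more than moving it. That is what turns an all-or-nothing switch into a series of small, reversible steps.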
Mitigation: Avoid feature creep
Another thing the Telegraph team have done is to avoid feature creep. The redesign comes with almost no new features — I can count the ticker, but the rest seem to be rearrangements of the same material. This contrasts (unfairly, but instructively) with, say, the Guardian’s Berliner reformatting where the editor took a conscious decision not just to change the size of the paper, but also the design, the section arrangements, the approach to presenting the writing (distinguishing news and comment more distinctly, referencing the online world more) and numerous internal processes.
Avoiding feature creep on software projects is very, very difficult, and it takes strong wills and authority to say No to the many voices who want just a little bit more.
Mitigation: Don’t just test — rehearse
Testing individual features is one thing: testing the whole lot together is another. That’s why rehearsing the whole sequence together is important. And there’s no point rehearsing if you’ve not built in time to make changes when (not if) you discover there are flaws in your plan.
Mitigation: Plan for natural breaks
The Telegraph also seem to have a release plan with plenty of natural breaks: once started it seems they weren’t on a time-critical unstoppable slide. Of course they will have had deadlines, but if they had to slip some of these for the sake of something else then that was always a possibility.
A good example of this is their replacement of the missing e-mail and print features. The site’s articles originally had links to mail them to friends or print them out. The original rollout omitted these features, and there was a promise that they would be restored later. However the outcry from the community was such that they interrupted their schedule and reinserted them earlier than planned. It’s a good example of listening to your community, but it also shows that you need the flexibility to react accordingly.
Mitigation: The safe but indirect route
Big-switch-over scenarios can be avoided by ensuring you take a route that’s not necessarily direct but is at least safe. You may end up with a very long and winding plan, however. This is a bit like the word game where you have to go from one word to another, changing one letter at a time, but still ensuring that you’ve got a word at every step of the way.
This can still give you your big bang if the final step is the only visible step. The Telegraph redesign will have involved an awful lot more behind the scenes than the apparently straightforward change first seen by the public. No doubt they could have done it quicker at the expense of a more confusing user experience, but they managed to keep the end user change to a minimum.
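The “real word at every step” rule can be made explicit in a migration plan. Below is a minimal Python sketch, under the assumption that each step can be undone and that the live service has some health check. The `Step` type, `run_plan` and the table-migration example are all hypothetical. After every step the plan checks that the service still works. If it doesn’t, the plan undoes that step and stops at the last known-good state.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    name: str
    apply: Callable[[], None]
    undo: Callable[[], None]


def run_plan(steps: List[Step], is_healthy: Callable[[], bool]) -> Tuple[List[str], Optional[str]]:
    """Apply steps in order, requiring the service to be healthy after each one.

    If a step leaves the service unhealthy, undo just that step and stop,
    so the system rests at the last known-good intermediate state.
    Returns (names of completed steps, name of the failed step or None).
    """
    completed: List[str] = []
    for step in steps:
        step.apply()
        if not is_healthy():
            step.undo()
            return completed, step.name
        completed.append(step.name)
    return completed, None


# Hypothetical example: moving reads from an old table to a new one.
state = {"tables": {"old"}, "reads_from": "old"}


def set_reads(table: str) -> None:
    state["reads_from"] = table


def healthy() -> bool:
    return state["reads_from"] in state["tables"]


safe_route = [
    Step("create new table", lambda: state["tables"].add("new"), lambda: state["tables"].discard("new")),
    Step("switch reads", lambda: set_reads("new"), lambda: set_reads("old")),
    Step("drop old table", lambda: state["tables"].discard("old"), lambda: state["tables"].add("old")),
]
print(run_plan(safe_route, healthy))
```

If the “drop old table” step is moved ahead of “switch reads”, the health check fails at that step. The plan then puts the old table back and stops, instead of leaving the service broken.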
Mitigation: Trial features elsewhere
If you’re unsure of a particular part of a big bang release then it might be possible to trial just that feature elsewhere. As noted above, Comment is Free was the first large scale GU site with a thousand pixel width. It was a calculated risk. Reaction to that will inevitably inform any future services we launch.
Mitigation: Loosely coupled systems
Another mitigation technique to allow a big bang is to ensure the product being released is on an entirely separate system to anything else. If you’ve been following Comment is Free very closely, you’ll have read Ben Hammersley’s comment (you’ll need to scroll down or search for his name) that it’s based on the Movable Type blogging software. MT is certainly not what we use as our main content management system and it’s necessarily separate. This makes the big bang release easier: there is integration, but it’s loosely coupled. The systems are linked, but they’re not tied so closely that a disaster with one will adversely affect the other.
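What “loosely coupled” might mean in code: the main site shows something from the separate system, say a panel of most active blog entries, but a failure on the blog side must not take the host page down with it. Here is a minimal Python sketch. The class name and the `fetch` callable are hypothetical. In practice `fetch` would be a network call with a short timeout, for example `urllib.request.urlopen(url, timeout=2)`, but that detail is an assumption here.

```python
from typing import Callable, List


class LooselyCoupledPanel:
    """Shows content fetched from a separate system, degrading gracefully.

    `fetch` is whatever talks to the other system. If it fails, the panel
    serves the last good copy it saw (or nothing at all) instead of letting
    the failure propagate into the page that hosts it.
    """

    def __init__(self, fetch: Callable[[], List[str]]):
        self._fetch = fetch
        self._last_good: List[str] = []

    def items(self) -> List[str]:
        try:
            fresh = self._fetch()
        except Exception:
            # The other system is down or misbehaving: fall back quietly.
            return list(self._last_good)
        self._last_good = list(fresh)
        return list(fresh)


def blog_is_down() -> List[str]:
    raise ConnectionError("blog system unavailable")


panel = LooselyCoupledPanel(blog_is_down)
print(panel.items())  # an empty panel, but the host page still renders
```

The trade-off is the one noted below: the looser the link, the less the two systems can share, so richer integration has to go through a narrow interface like this one.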
Don’t think it’s solved
All of the above approaches (and the many more I’ve inevitably missed) help, but they aren’t silver bullets. Each one has its drawbacks. A splash might not always be a commercially-acceptable compromise; there may be no obvious step-by-step solution; avoiding feature creep relies on more strength than organisations usually possess in the heat of the excitement, and even then it only limits the problems without eliminating them; rehearsing requires time and an expensive duplication of the entire production environment; natural breaks can be difficult to identify, and can rely on a running-order that isn’t compatible with commercial interests; the indirect route is necessarily lengthier and more costly; feature trials are only possible in moderation; loosely coupled systems limit the amount of integration and future development you can do.
The Telegraph has clearly encountered some of these problems.
It’s notable that “Education” initially disappeared from their navigation bars and was only findable via the search engine, as Shane Richmond advised readers in a blog comment. No doubt they couldn’t find a cost-effective route from the old design to the new that preserved all the navigational elements.
The switch to a wider page was greeted with mixed reactions, and while we limited it to Comment is Free, the Telegraph dived right in and is applying it to everything.
Finally, it’s interesting that the new design is quite sympathetic to their previous one. No doubt a lot of that is due to the Telegraph having the same core values two years ago as it has today. But it’s some way from some of the more radical ideas they had, and no doubt that is partly a constraint of having to live with two designs for a period.
So where does this leave us?
I hope this explains some of the problems with big bangs, some of the ways to deal with them, and the difficulties that still remain.