Previously I’ve talked about estimation, using an example of a project that we were very confident would make a certain amount of revenue. The slight flaw in this approach is that “very confident” varies depending on who’s making the estimate. So here is a proposal for an application to train the user in consistent estimation. I’d love to see this running somewhere and looking pretty. “Improve your estimation” would be a fun game to play (if you’re that way inclined). So let me set out the problem. We make estimates like these:
- “I’m very confident this will bring in $0.5m to $2.5m of new business”
- “I’m very confident this change will see no less than a 2% drop-off rate in conversions”
- “I’m very confident the addressable market is between 5m and 9m users globally”
We want to turn those into:
- “I’m 90% certain this will bring in $0.5m to $2.5m of new business — and you can be sure that when I say 90%, I’m right 90% of the time”
- “I’m 95% confident this change will see no less than a 2% drop-off rate in conversions — and you can be sure that when I say I’m 95% confident of something I really am right 95% of the time”
- “I’m 90% sure the addressable market is between 5m and 9m users globally — and when I say I’m 90% sure of something I really am right 90% of the time”
The purpose of the application is to help people really be right 90% of the time when they say they’re 90% certain — or any percentage they want.
Estimating well is important for accuracy. This is particularly true when we consider that targets like revenue and reach are actually factors of many other measures (market size, consumer price sensitivity, economic conditions) which are themselves estimated. With estimation piled on estimation it becomes even more important to move from a nebulous “very confident” to a concrete “90% confident” and so be sure of our bounds of certainty.
Credit for switching me on to the problem, and for the no-tech solution that follows, is to Doug Hubbard and is part of the narrative of his book How to Measure Anything. What is his approach to getting good at estimating percentage certainty? Have a look at this question:
What is the height of the Eiffel Tower in feet?
I don’t suppose you know… exactly. I certainly don’t. But we can have a good go at estimating a range with 90% certainty. I am absolutely certain that it’s taller than me, so it’s definitely over 6 ft. And I’m virtually certain that a mile is a thousand-and-something feet, and it must surely be less than a mile high, but let’s say it’s less than two miles for safety.
So I’m 99.9% certain it’s between 6 ft and 2,000 ft tall. But that’s too conservative, because we’re aiming for 90% certainty, not 99.9% certainty. So I’ve got to start using some other tricks. I’m 6 ft 3 in, and a house is about six of me tall… call it 40 ft. How many houses tall is the Eiffel Tower? Between six and twenty, I reckon. So between 240 ft and 800 ft. That seems better — I can sign up to that range with 90% confidence.

Here are ten questions of that sort to try, each with a range you’re 90% sure of:
- What is the height of the Eiffel Tower in feet?
- In what year was Anne Boleyn born?
- What is the population of Tokyo?
- How many square miles is Peru?
- How many saints did John Paul II canonise?
- How many books are there in the Mr. Men series?
- What is the weight of the Liberty Bell, in kilograms?
- What is the circumference of the moon, in kilometres?
- How many square centimetres is the Mona Lisa?
- How many patents did IBM lodge in 1999?
The aim, of course, is to have 90% of your ranges right. This is what Doug Hubbard calls being “calibrated” — i.e. when we say we’re x% sure of something we are indeed correct x% of the time.
Unfortunately it’s not quite as easy as saying we must score 9/10 in the above questions — we’re expected to get 9 out of 10 on average, not necessarily 9 of these particular 10 right. So we refer to statistics, and the binomial distribution says if we want to be 95% sure that we are calibrated then we can expect to get 8, 9 or 10 out of 10. Similarly…
- 20 questions implies 17-19 should be right at the 90% level.
- 25 questions implies 21-24 should be right at the 90% level.
- 30 questions implies 25-29 should be right at the 90% level.
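These bands come straight from the binomial distribution, and it’s easy to sanity-check them. Here is a small Python sketch (the helper names are mine, and the exact band you accept depends on the acceptance rule you choose) that computes how often a genuinely calibrated 90% estimator would score at least k out of n:

```python
from math import comb

def binom_pmf(n, k, p):
    """Probability of exactly k successes in n trials, each with probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_at_least(n, k, p=0.9):
    """Chance that a truly calibrated 90% estimator scores k or more out of n."""
    return sum(binom_pmf(n, j, p) for j in range(k, n + 1))

# How often does a calibrated estimator score 8+ out of 10, or 17+ out of 20?
print(round(prob_at_least(10, 8), 2))   # 0.93
print(round(prob_at_least(20, 17), 2))  # 0.87
```

If you score outside the band for your chosen rule, it’s unlikely (though not impossible) that you’re calibrated.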
We can calibrate ourselves with true/false questions, too. This time we have to add how certain we are of our choice:
The first Cannes Film Festival was held in 1961. True or false? And how sure are you of your answer: 50%? 60%? 70%? 80%? 90%? Or 100%?
Saying we are 50% sure means we have no idea — equivalent to the flip of a coin. 100% means we’re absolutely certain. If we answer five questions with certainties of 60%, 60%, 50%, 90% and 80% then we expect on average we’ll get 0.6 + 0.6 + 0.5 + 0.9 + 0.8 = 3.4 out of 5. If we want to be 95% sure that we are indeed calibrated then (if my maths is right) we should expect to get 2, 3, or 4 out of 5. This is clearly too wide a margin to be informative so in reality we should again be answering 20 or 30 questions at a time.
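With mixed confidences like these, the number of correct answers follows a Poisson binomial distribution, which we can build up exactly by convolution. A quick sketch (the function name is mine):

```python
def score_distribution(confidences):
    """Exact distribution of the number of correct answers, given independent
    questions answered with the stated confidences (a Poisson binomial)."""
    dist = [1.0]
    for p in confidences:
        # Convolve in one more question: either it's wrong (prob 1-p) or
        # right (prob p), shifting the count up by one.
        dist = [(dist[k] if k < len(dist) else 0.0) * (1 - p) +
                (dist[k - 1] if k >= 1 else 0.0) * p
                for k in range(len(dist) + 1)]
    return dist

dist = score_distribution([0.6, 0.6, 0.5, 0.9, 0.8])
print(sum(k * p for k, p in enumerate(dist)))  # expected score: 3.4
print(round(dist[5], 4))                       # chance of 5/5: 0.1296
```

The mean comes out at 3.4, matching the back-of-envelope sum above, and the full distribution is what lets us put bounds on plausible scores.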
By putting in a feedback loop we can improve. We answer 30 questions, find out if we’re over-cautious or under-cautious in our estimating, then consciously compensate while we try answering another 30 questions. And so on until we start hitting our targets.
There are also tricks to help us judge our understanding of 90% confidence. Again, I’m taking my cue from Doug Hubbard. Imagine a wheel with a one-tenth slice marked out and a spinnable needle in the centre. If we are 90% confident of something then we should be equally happy to risk £5 that a spin of the needle will end up in the larger zone (if it ends up in the smaller slice we lose our money). If we would rather spin the needle then we’re less than 90% confident of our assertion and should widen our range; if we’d prefer to stick with our assertion then we’re more than 90% confident and should narrow our range; it’s only when we’d be equally happy with one or the other that we’ve got to 90% confidence.
This seems to be worth trying with each question whenever in doubt.
I think being good at estimating like this would be a cool party trick. Especially if we go to the parties held by our local maths department. Now the only question is: where do we find an endless source of semi-obscure questions? This brings us to…
The proposal, then, is for an application which automates this estimation training:
- How many questions would you like to answer? 10, 20, 30…? And would you like questions about ranges or true/false questions?
- Here are the questions, enter your answers. (If you want help with an answer you can pop up a wheel with a suitably-proportioned slice marked out — do you prefer your answer or prefer to spin the wheel? Now you can rethink your answer in the light of that.)
- Feedback: You got this many right, which was/wasn’t what we expected for someone who is a good estimator. You’re over/under/correctly confident so you need to adjust accordingly/congratulate yourself. Now let’s go again…
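That feedback step might look something like this sketch (the band lo–hi would come from the binomial tables above, e.g. 8 to 10 for ten questions; the function and its wording are my own invention):

```python
def feedback(n_correct, lo, hi):
    """Compare a score against the band a calibrated estimator should land in,
    e.g. lo=8, hi=10 for ten questions answered at 90% confidence."""
    if n_correct < lo:
        return "Overconfident: your ranges are too narrow. Widen them."
    if n_correct > hi:
        return "Underconfident: your ranges are too wide. Narrow them."
    return "Plausibly calibrated. Now let's go again..."

print(feedback(6, 8, 10))  # scored 6/10: overconfident
```

Run it after each batch of questions and the feedback loop falls out naturally.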
I imagine this would look nice with now-de-rigueur CSS3, pastel colours and big shiny buttons with rounded corners. And a good source of questions is Wikipedia.
In particular the info box on the right-hand side of many articles is a good source of stark facts — for example, the one for Chippewa County, Minnesota. There are some good range-type questions to be extracted from this: total area in square kilometres, population as of 2000, year of founding. There are also some good true/false questions (largest city, date founded), but only if we can find a reasonable-sounding false answer.
That last problem can be solved like this. First, at the bottom of the page, find a list in which this subject appears — in this case Chippewa appears in a list of counties. Then find another item in the same list — say, Murray County — and go to that page and find its info box. Now we can compare the same aspects of Chippewa and Murray and, where the values differ (largest city and date founded do; timezone is the same for both, so we skip it), swap them at random.
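Here is a rough sketch of that swap in Python, with hard-coded stand-ins for the scraped info boxes (the field values are illustrative placeholders, not checked facts):

```python
import random

# Stand-ins for two info boxes scraped from sibling entries in the same
# Wikipedia list; the values here are illustrative placeholders.
infoboxes = {
    "Chippewa County, Minnesota": {"Largest city": "Montevideo",
                                   "Founded": "February 20, 1862",
                                   "Time zone": "Central"},
    "Murray County, Minnesota":   {"Largest city": "Slayton",
                                   "Founded": "May 23, 1857",
                                   "Time zone": "Central"},
}

def true_false_question(subject, sibling, rng=random):
    """Pick a field whose values differ between the two pages, then show
    either the true value or the sibling's (false) one, at random."""
    ours, theirs = infoboxes[subject], infoboxes[sibling]
    fields = [f for f in ours if f in theirs and ours[f] != theirs[f]]
    field = rng.choice(fields)          # "Time zone" is identical, so skipped
    truth = rng.random() < 0.5
    shown = ours[field] if truth else theirs[field]
    return f"{subject} > {field} > {shown}. True or false?", truth
```

A real version would pull both info boxes from sibling entries of the same Wikipedia list, as described above, rather than from a hard-coded dictionary.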
The list these two subjects were drawn from (Counties), the heading of that list (State of Minnesota) and the aspect selected from the info box (largest city, etc) provide a reasonable way to present the question, since English-language question generation looks a bit tricky. We end up with something like:
- Answer these questions with 90% confidence:
- State of Minnesota > Counties > Chippewa County, Minnesota > Area – Total > ……. to …… km2
- Answer these questions as true or false, and state your confidence:
- State of Minnesota > Counties > Murray County, Minnesota > Founded > February 20, 1862. True or false? 50%? 60%? 70%? 80%? 90%? 100%?
There are still some flaws with this. You have to go through quite a lot of random Wikipedia pages before you find one with a decent info box. And sometimes it’s just too obvious whether the info-box value shown is true or false:
- State of Minnesota > Counties > Chippewa County, Minnesota > Named for > Chippewa Indians, Chippewa River. True or false? 50%? 60%? 70%? 80%? 90%? 100%?
- State of Minnesota > Counties > Murray County, Minnesota > Website > http://www.co.chippewa.mn.us. True or false? 50%? 60%? 70%? 80%? 90%? 100%?
Maybe we should restrict the subjects to some known areas (books, US states, composers,…) and not draw from the entire pool of Wikipedia articles. And some kind of text matching might help avoid the embarrassingly obvious true/false questions.
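The text matching could be as crude as checking whether a candidate value leaks a distinctive word from a page title — run against both the question’s subject and the page the swapped value came from. A sketch (the function name and the length threshold are mine, and a real filter would also want a stop-list for generic words like “county”):

```python
def is_giveaway(subject, value):
    """True if the value contains a distinctive word from the subject's name,
    e.g. a Murray County 'website' value containing 'chippewa'."""
    words = {w.strip(",.").lower() for w in subject.split()}
    return any(w in value.lower() for w in words if len(w) > 3)

print(is_giveaway("Chippewa County, Minnesota",
                  "http://www.co.chippewa.mn.us"))  # True: reject this question
```

Any candidate that trips the check for either page gets thrown away before it reaches the player.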
Or maybe you can think of a better source of questions.
If so, there’s a very useful service just waiting to be created…
Where to find answers to the above questions…