Previously I’ve talked about estimation, using an example of a project that we were very confident would make a certain amount of revenue. The slight flaw in this approach is that “very confident” varies depending on who’s making the estimate. So here is a proposal for an application to train the user in consistent estimation. I’d love to see this running somewhere and looking pretty. “Improve your estimation” would be a fun game to play (if you’re that way inclined). So let me set out…
- “I’m very confident this will bring in $0.5m to $2.5m of new business”
- “I’m very confident this change will see no less than a 2% drop-off rate in conversions”
- “I’m very confident the addressable market is between 5m and 9m users globally”
We want to turn those into:
- “I’m 90% certain this will bring in $0.5m to $2.5m of new business — and you can be sure that when I say 90%, I’m right 90% of the time”
- “I’m 95% confident this change will see no less than a 2% drop-off rate in conversions — and you can be sure that when I say I’m 95% confident of something I really am right 95% of the time”
- “I’m 90% sure the addressable market is between 5m and 9m users globally — and when I say I’m 90% sure of something I really am right 90% of the time”
The purpose of the application is to help people really be right 90% of the time when they say they’re 90% certain — or any percentage they want.
Estimating well matters because our figures rarely stand alone. Targets like revenue and reach are actually products of many other measures (market size, consumer price sensitivity, economic conditions) which are themselves estimated. With estimation piled on estimation it becomes even more important to move from a nebulous “very confident” to a concrete “90% confident” and so be sure of our bounds of certainty.
Credit for switching me on to the problem, and for the no-tech solution that follows, goes to Doug Hubbard; both are part of the narrative of his book How to Measure Anything. What is his approach to getting good at estimating percentage certainty? Have a look at this question:
What is the height of the Eiffel Tower in feet?
I don’t suppose you know… exactly. I certainly don’t. But we can have a good go at estimating a range with 90% certainty. I am absolutely certain that it’s taller than me, so it’s definitely over 6 ft. And I’m virtually certain that a mile is a thousand-and-something feet, and it must surely be less than a mile high, but let’s say it’s less than two miles for safety.
So I’m 99.9% certain it’s between 6 ft and 2,000 ft tall. But that’s too conservative, because we’re aiming for 90% certainty, not 99.9% certainty. So I’ve got to start using some other tricks. I’m 6 ft 3 in, and a house is about six of me tall… call it 40 ft. How many houses tall is the Eiffel Tower? Between six and twenty, I reckon. So between 240 ft and 800 ft. That seems better: I can sign up to that range with 90% confidence.
- What is the height of the Eiffel Tower in feet?
- In what year was Anne Boleyn born?
- What is the population of Tokyo?
- How many square miles is Peru?
- How many saints did John Paul II canonise?
- How many books in the Mister Men series?
- What is the weight of the Liberty Bell, in kilograms?
- What is the circumference of the moon, in kilometres?
- How many square centimetres is the Mona Lisa?
- How many patents did IBM lodge in 1999?
The aim, of course, is to have 90% of your ranges right. This is what Doug Hubbard calls being “calibrated” — i.e. when we say we’re x% sure of something we are indeed correct x% of the time.
Unfortunately it’s not quite as easy as saying we must score 9/10 in the above questions — we’re expected to get 9 out of 10 on average, not necessarily 9 of these particular 10 right. So we refer to statistics, and the binomial distribution says if we want to be 95% sure that we are calibrated then we can expect to get 8, 9 or 10 out of 10. Similarly…
- 20 questions implies 17-19 should be right at the 90% level.
- 25 questions implies 21-24 should be right at the 90% level.
- 30 questions implies 25-29 should be right at the 90% level.
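For anyone who wants to check score ranges like these for themselves, here is a minimal sketch of the binomial calculation. It assumes every question is independent and that a calibrated estimator gets each one right with probability 0.9; the function names are my own invention, not anything from Hubbard's book.

```python
from math import comb


def binomial_pmf(n: int, k: int, p: float) -> float:
    """Probability of exactly k correct ranges out of n, each correct with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_score_between(n: int, p: float, low: int, high: int) -> float:
    """Probability that a calibrated estimator scores between low and high (inclusive)."""
    return sum(binomial_pmf(n, k, p) for k in range(low, high + 1))


if __name__ == "__main__":
    # How likely is each score out of 10 for someone who really is right 90% of the time?
    for k in range(6, 11):
        print(k, round(binomial_pmf(10, k, 0.9), 4))
    # And how much probability does the band "8, 9 or 10 out of 10" cover?
    print(round(prob_score_between(10, 0.9, 8, 10), 4))
```

Trying different values of `n`, `low` and `high` shows how much of the probability a given band of scores covers, and so how strict a test of calibration it is.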
We can calibrate ourselves with true/false questions, too. This time we have to add how certain we are of our choice:
The first Cannes Film Festival was held in 1961. True or false? And how sure are you of your answer: 50%? 60%? 70%? 80%? 90%? Or 100%?
Saying we are 50% sure means we have no idea — equivalent to the flip of a coin. 100% means we’re absolutely certain. If we answer five questions with certainties of 60%, 60%, 50%, 90% and 80% then we expect on average we’ll get 0.6 + 0.6 + 0.5 + 0.9 + 0.8 = 3.4 out of 5. If we want to be 95% sure that we are indeed calibrated then (if my maths is right) we should expect to get 2, 3, or 4 out of 5. This is clearly too wide a margin to be informative so in reality we should again be answering 20 or 30 questions at a time.
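The sums for mixed certainties are fiddlier than the plain binomial case, because each question has its own probability of being right. Here is a rough sketch of how the full distribution of scores could be built up one question at a time, assuming the answers are independent and each is correct with exactly its stated certainty. The function name is hypothetical.

```python
def score_distribution(confidences):
    """Return a list where entry k is the probability of exactly k correct answers,
    assuming each answer is independently correct with its stated confidence."""
    dist = [1.0]  # before any questions: certain to have 0 correct
    for p in confidences:
        nxt = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            nxt[k] += prob * (1 - p)  # this answer turns out wrong
            nxt[k + 1] += prob * p    # this answer turns out right
        dist = nxt
    return dist


if __name__ == "__main__":
    confidences = [0.6, 0.6, 0.5, 0.9, 0.8]
    dist = score_distribution(confidences)
    print("expected score:", round(sum(confidences), 2))
    for k, prob in enumerate(dist):
        print(k, round(prob, 4))
```

Summing the entries for a band of scores (say `sum(dist[2:5])` for 2, 3 or 4 out of 5) gives the chance that a calibrated estimator lands in that band, which is a handy way to check the maths.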
By putting in a feedback loop we can improve. We answer 30 questions, find out if we’re over-cautious or under-cautious in our estimating, then consciously compensate while we try answering another 30 questions. And so on until we start hitting our targets.
There are also tricks to help us judge our understanding of 90% confidence. Again, I’m taking my cue from Doug Hubbard. Take a look at a wheel with a one-tenth slice marked out and a spinnable needle in the centre. If we are 90% confident of something then we should be equally confident that we can risk £5 that a spin of the needle will end up in the larger zone (if it ends up in the smaller slice we lose our money). If we would rather spin the needle then we’re less than 90% confident of our assertion and should widen our range; if we’d prefer to stick with our assertion then we’re more than 90% confident and should narrow our range; it’s only when we think we’d be equally happy with one or the other that we’ve got to 90% confidence.
This seems to be worth trying with each question whenever in doubt.
I think being good at estimating like this would be a cool party trick. Especially if we go to the parties held by our local maths department. Now the only question is: where do we find an endless source of semi-obscure questions? This brings us to…
The proposal, then, is for an application which automates this estimation training:
- How many questions would you like to answer? 10, 20, 30…? And would you like questions about ranges or true/false questions?
- Here are the questions, enter your answers. (If you want help with an answer you can pop up a wheel with a suitably-proportioned slice marked out — do you prefer your answer or prefer to spin the wheel? Now you can rethink your answer in the light of that.)
- Feedback: You got this many right, which was/wasn’t what we expected for someone who is a good estimator. You’re over/under/correctly confident so you need to adjust accordingly/congratulate yourself. Now let’s go again…
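The feedback step is the heart of the application, so here is a minimal sketch of how it might work for range questions. Everything in it is an assumption on my part: the function names, the 5% `tolerance` cut-off for deciding a score is suspicious, and the wording of the messages.

```python
from math import comb


def score_ranges(answers, truths):
    """Count how many true values fall inside the (low, high) ranges the user gave."""
    return sum(low <= truth <= high for (low, high), truth in zip(answers, truths))


def feedback(correct, total, confidence=0.9, tolerance=0.05):
    """Compare a score with what a calibrated estimator would plausibly get.

    tolerance is an arbitrary cut-off: a score is flagged if a calibrated
    estimator would do that badly (or that well) less than 5% of the time.
    """
    pmf = [
        comb(total, k) * confidence**k * (1 - confidence) ** (total - k)
        for k in range(total + 1)
    ]
    p_this_low = sum(pmf[: correct + 1])  # chance of scoring this low or lower
    p_this_high = sum(pmf[correct:])      # chance of scoring this high or higher
    if p_this_low < tolerance:
        return "over-confident: widen your ranges"
    if p_this_high < tolerance:
        return "under-confident: narrow your ranges"
    return "consistent with good calibration"


if __name__ == "__main__":
    # Made-up ranges and answers, purely to show the call pattern.
    answers = [(0, 10), (5, 6)]
    truths = [3, 7]
    correct = score_ranges(answers, truths)
    print(correct, "of", len(answers), "->", feedback(correct, len(answers)))
```

With only a handful of questions almost any score looks consistent with good calibration, which is another reason to answer 20 or 30 at a time.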
I imagine this would look nice with now-de-rigueur CSS3, pastel colours and big shiny buttons with rounded corners. And a good source of questions is Wikipedia.
In particular the info box on the right-hand side of many articles is a good source of stark facts, for example the one for Chippewa County, Minnesota. There are some good range-type questions to be extracted from this: total area in square kilometres, population as of 2000, year of founding. There are also some good true/false questions (largest city, date founded) but only if we can find reasonable-sounding false answers.
That last problem can be solved like this: First, at the bottom of the page find a list in which this subject appears. In this case Chippewa appears in a list of counties. Then find another item in the same list — say, Murray County — and go to that page and find its info box. Now we can compare the same aspects of Chippewa and Murray (largest city, date founded, timezone) with different values (largest city and date founded are different; but avoid timezone as it’s the same for both) and swap them randomly.
The list these two subjects were drawn from (Counties), the heading of that list (State of Minnesota) and the aspect selected from the info box (largest city, etc) provide a reasonable way to present the question, since English-language question generation looks a bit tricky. We end up with something like:
- Answer these questions with 90% confidence:
- State of Minnesota > Counties > Chippewa County, Minnesota > Area – Total > ……. to …… km2
- Answer these questions as true or false, and state your confidence:
- State of Minnesota > Counties > Murray County, Minnesota > Founded > February 20, 1862. True or false? 50%? 60%? 70%? 80%? 90%? 100%?
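The swapping trick and the breadcrumb-style presentation can be sketched in a few lines. This assumes the two info boxes have already been fetched and parsed into plain dictionaries, which is the hard part and is not shown. The function names are hypothetical and the info box values are placeholders, not real data about either county (apart from the shared timezone).

```python
import random


def differing_aspects(box_a, box_b):
    """Aspects present in both info boxes whose values differ (so a swap is detectable)."""
    return sorted(k for k in box_a.keys() & box_b.keys() if box_a[k] != box_b[k])


def make_true_false(heading, list_name, subject, box, other_box, rng=random):
    """Build one true/false question about subject, using the sibling's value about half the time."""
    aspect = rng.choice(differing_aspects(box, other_box))
    is_true = rng.random() < 0.5
    value = box[aspect] if is_true else other_box[aspect]
    text = " > ".join([heading, list_name, subject, aspect, value]) + ". True or false?"
    return text, is_true


if __name__ == "__main__":
    # Placeholder values only: stand-ins for whatever the real info boxes contain.
    chippewa = {"Largest city": "City A", "Founded": "Date A", "Time zone": "Central"}
    murray = {"Largest city": "City B", "Founded": "Date B", "Time zone": "Central"}
    question, answer = make_true_false(
        "State of Minnesota", "Counties", "Chippewa County, Minnesota", chippewa, murray
    )
    print(question, "| correct answer:", answer)
```

Because the timezone is the same in both boxes it never shows up in `differing_aspects`, so it can never be chosen for a question.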
There are still some flaws with this. You have to go through quite a lot of random Wikipedia pages before you find one with a decent info box. And sometimes whether the info in the info box is true or false is too obvious:
- State of Minnesota > Counties > Chippewa County, Minnesota > Named for > Chippewa Indians, Chippewa River. True or false? 50%? 60%? 70%? 80%? 90%? 100%?
- State of Minnesota > Counties > Murray County, Minnesota > Website > http://www.co.chippewa.mn.us. True or false? 50%? 60%? 70%? 80%? 90%? 100%?
Maybe we should restrict the subjects to some known areas (books, US states, composers,…) and not look in the entire pool of Wikipedia articles. And maybe some kind of text matching might help avoid the embarrassingly obvious true/false questions.
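As a starting point for the text matching, here is one crude heuristic: throw away any question whose value contains a distinctive word from the name of either subject involved in the swap. It is only a sketch, with hypothetical function names and a hand-picked `ignore` set of words shared by every subject in the list, and it will certainly miss subtler giveaways.

```python
import re


def distinctive_words(subject, ignore):
    """Lower-case words in the subject's name, minus words shared across the whole list."""
    return set(re.findall(r"[a-z]+", subject.lower())) - ignore


def is_too_obvious(value, candidates, ignore):
    """True if the value contains a distinctive word from any candidate subject's name."""
    value_lower = value.lower()
    return any(
        word in value_lower
        for subject in candidates
        for word in distinctive_words(subject, ignore)
    )


if __name__ == "__main__":
    candidates = ["Chippewa County, Minnesota", "Murray County, Minnesota"]
    ignore = {"county", "minnesota"}  # assumed to be common to every item in the list
    for value in [
        "Chippewa Indians, Chippewa River",
        "http://www.co.chippewa.mn.us",
        "February 20, 1862",
    ]:
        print(value, "->", "skip" if is_too_obvious(value, candidates, ignore) else "keep")
```

Checking against both subjects matters: a value naming the subject is obviously true, and a value naming the sibling is obviously false.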
Or maybe you can think of a better source of questions.
If so, there’s a very useful service just waiting to be created…
Where to find answers to the above questions…