We recently invited blog readers to test whether their decision-making was affected by cognitive bias – and more than 1,000 of you took us up on the offer.
Our survey showed people ten statements, then asked:
- whether they thought each statement was true or false, and
- how confident they were that their answer was correct.
This was not a normal test, where the goal was to answer all the questions correctly. Instead, we were testing whether BIT’s readers were well-calibrated.
Being well-calibrated means your confidence matches how often you are right: of the things you say you are 80% sure about, roughly 80% turn out to be true. Superforecasters – people who are very good at predicting geopolitical events – are extremely well-calibrated. A poorly calibrated person gets things wrong even when they are 100% sure they are right. In other words, their perceptions of their own knowledge don’t match reality.
Poor calibration can be a problem for government. Policymakers might enact unrealistic plans or take excessive risks because they are overconfident in their judgement, or fail to take action because they are underconfident about what will work.
We found that the average BIT reader is overconfident. Across all ten questions, the average person was 72% sure they were correct, but only answered 60% of the questions correctly. Table 1 shows the full results.
That 12 percentage point difference represents moderate overconfidence. Perfect calibration would mean the two figures were the same (e.g. getting 70% of the questions correct and being 70% confident on average).
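If you would like to work out your own score, the arithmetic can be sketched in a few lines of Python. This is only an illustration, not the analysis code behind the survey: the function name `calibration_score` and the example answers are made up, chosen so that the result matches the average reader described above.

```python
def calibration_score(confidences, correct):
    """Return mean confidence minus percentage correct, in percentage points.

    confidences: stated confidence for each answer, from 50 to 100.
    correct: True/False for whether each answer was right.
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need one confidence rating per answer")
    mean_confidence = sum(confidences) / len(confidences)
    pct_correct = 100 * sum(correct) / len(correct)
    return mean_confidence - pct_correct


# Hypothetical respondent (not survey data): ten answers, six of them right.
confidences = [70, 90, 80, 60, 50, 75, 65, 70, 80, 80]
correct = [True, True, True, False, False, True, False, True, True, False]

score = calibration_score(confidences, correct)
print(f"{score:+.0f}pp")  # positive = overconfident, negative = underconfident
```

A positive score means confidence ran ahead of accuracy; a negative score means the opposite.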
Respondents were generally more likely to be correct when they were more confident in their answer. However, the correlation between accuracy and confidence was very low (r = 0.03). Figure 1 shows a striking example of this – even when respondents were 100% sure they were correct, they were still wrong 15% of the time.
Figure 1. Proportion of correct answers by confidence level.
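A breakdown like the one in Figure 1 can be produced by grouping answers by stated confidence and counting how many in each group were right. The sketch below assumes each answer is stored as a (confidence, was it correct) pair; the helper name `accuracy_by_confidence` and the data are hypothetical and are not taken from the survey.

```python
from collections import defaultdict


def accuracy_by_confidence(responses):
    """Map each confidence level to the share of answers at that level that were right.

    responses: iterable of (confidence, was_correct) pairs.
    """
    totals = defaultdict(int)
    right = defaultdict(int)
    for confidence, was_correct in responses:
        totals[confidence] += 1
        right[confidence] += int(was_correct)
    return {level: right[level] / totals[level] for level in sorted(totals)}


# Hypothetical answers for illustration only.
responses = [
    (50, True), (50, False),
    (75, True), (75, True), (75, False), (75, False),
    (100, True), (100, True), (100, True), (100, False),
]

for level, share in accuracy_by_confidence(responses).items():
    print(f"{level}% confident: {share:.0%} correct")
```

For a perfectly calibrated group, the share correct at each level would equal the confidence level itself.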
Figure 2 shows the distribution of calibration scores. Around one in five respondents were well-calibrated, and respondents were almost five times more likely to be overconfident than underconfident.
Figure 2. Distribution of calibration scores.
It might be some comfort to know that BIT is also not perfectly calibrated. In a similar internal survey last year with 65 employees, we found we were slightly underconfident as an organisation (answering 68.5% of questions correctly with average confidence of 63.7%).
Fortunately for all of us, there is quite good evidence that calibration can be improved with practice. As part of our Behavioural Government project, we are interested in testing whether this type of training can help people in government evaluate evidence and develop good policies.
Table 1. Results from a survey of 1,154 BIT readers.
| # | Statement | True/False | % correct | % confidence | Calibration score (% confidence – % correct) |
|---|---|---|---|---|---|
| 1 | More than 1 in every 3 British people are aged 65+. (source) | F | 60% | 73% | 13pp (overconfident) |
| 2 | A muggle is someone who is born into a non-magical family and who lacks magical powers. (source) | T | 80% | 89% | 9pp (overconfident) |
| 3 | Bill Gates is the richest person in the world. (source) | F | 77% | 81% | 4pp (well-calibrated) |
| 4 | Heart disease is the biggest cause of death worldwide. (source) | T | 35% | 73% | 38pp (overconfident) |
| 5 | The American teenage pregnancy rate is higher than 3%. (source) | F | 41% | 65% | 24pp (overconfident) |
| 6 | A hockey puck fits in a golf hole. (source) | T | 41% | 65% | 24pp (overconfident) |
| 7 | “Rogue One: A Star Wars Story” was the highest grossing film globally in 2016. (source) | F | 41% | 63% | 22pp (overconfident) |
| 8 | More than 700 million people in the world are obese. (source) | T | 71% | 68% | -4pp (well-calibrated) |
| 9 | The number of nuclear warheads in the world is currently at an all time high. (source) | F | 71% | 71% | 0pp (well-calibrated) |
| 10 | Richard Thaler was the winner of the 2017 Nobel Memorial Prize in Economic Sciences. (source) | T | 83% | 76% | -7pp (underconfident) |
For each question, participants were asked to rate how confident they were that their answer was correct on a scale of 50% (no confidence) to 100% (total confidence).
In the final column, calibration scores were categorised as follows: underconfident = score below -5pp; well-calibrated = score between -5pp and +5pp; overconfident = score above +5pp.
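The categories above can be written as a short rule. The sketch below is an illustration rather than the code used for the analysis: the function name `categorise` is made up, and it assumes that scores of exactly -5 or +5 count as well-calibrated. It is applied to three rows of Table 1.

```python
def categorise(pct_confidence, pct_correct):
    """Return (score, label), where score = % confidence minus % correct."""
    score = pct_confidence - pct_correct
    if score < -5:
        label = "underconfident"
    elif score > 5:
        label = "overconfident"
    else:
        # Assumption: the boundaries -5 and +5 are treated as well-calibrated.
        label = "well-calibrated"
    return score, label


# (question number, % confidence, % correct) for three rows of Table 1.
rows = [(4, 73, 35), (9, 71, 71), (10, 76, 83)]

for question, pct_confidence, pct_correct in rows:
    score, label = categorise(pct_confidence, pct_correct)
    print(f"Q{question}: {score:+d}pp ({label})")
```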