Numbers don’t lie, but that doesn’t mean that they tell the whole truth. The world around us is governed by data and our analysis of it. Nearly every decision we make is based on data. Sometimes that data comes from a pure source – firsthand observations, or a highly trusted source. Other times, it comes from less-than-reliable sources, and the problem gets worse when it is repeated without verification.
Ultimately there are two different types of data fraud you need to be careful of.
1) Incorrect data
News outlets and other sources often simply fail to fact-check. The same can happen with an internal system or a work-related inquiry. The first time a claim is made (especially if the source is untrustworthy), it is often ignored. But all it takes is one listener to repeat the incorrect statement, and soon it gets repeated again and again until someone reputable says it. If something sounds unbelievable (or too easy), question its sources and do your homework.
2) Correct data, skewed analysis
This is the worst kind, because the data is correct but its analysis is either misleading or flat-out incorrect. For example, suppose a high-school class has 20 students, 10 female and 10 male, and the teacher says, “50% of the class failed.” If the school newspaper later reported that 5 females failed the test, that would be a very bad assumption built on real data (10 females in the class, 50% of the total class failed…). The teacher’s number says nothing about how the 10 failures are split between the two groups. What is most harmful is that bad analysis from good data becomes fuel for other people’s “good data source,” which can lead to even more bad analysis.
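To make the classroom example concrete, here is a minimal Python sketch. It assumes only the numbers given above (10 female students, 10 male students, 10 failures in total); the function name possible_splits is made up for illustration. It lists every way the failures could be divided between the two groups.

```python
# Numbers from the classroom example: 20 students, and 50% of the class failed.
FEMALES = 10
MALES = 10
TOTAL_FAILED = 10


def possible_splits(females, males, total_failed):
    """Return every (females_failed, males_failed) pair consistent with the total."""
    return [
        (f, total_failed - f)
        for f in range(females + 1)
        if 0 <= total_failed - f <= males
    ]


splits = possible_splits(FEMALES, MALES, TOTAL_FAILED)
for females_failed, males_failed in splits:
    print(f"{females_failed} females and {males_failed} males failed")
```

“5 females failed” is only one of the outcomes that fit the teacher’s statement. Anything from 0 to 10 is equally consistent with it, so the newspaper’s number is a guess, not a fact.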
What can we do?
1) Question sources
A reputable news source that reports statistics should cite its sources. If sources aren’t cited, simply assume the numbers are made up – better not to know a good stat than to know a bad one. If there are sources, feel free to check them out. Additionally, if only analysis is provided, especially derivative analysis, go and find the source data. Analysis is guesswork. It can be good guesswork, but knowing the underlying data is what is important.
2) Point people to sources instead of repeating data
When talking about stats, cite your sources. Or, if you’re recalling something from memory, say: “There was a study done that talked about this data. I can’t recall the exact numbers, but here is where you can find it.”
3) Analysis is opinion – let people know
When presenting analysis, make sure you let people know it is your opinion: “A new study on children’s reading habits showed that many more girls than boys were reading. My opinion is that ….” The data showed two numbers but didn’t explain why. The why is not a fact; it’s an opinion (until tested and proven as a fact).
4) Periodically question yourself
Remember that fact you learned when you were 5, the one that made everyone remark how smart you were when you repeated it? I bet you’ve repeated it again since then. Unless it was a fixed-point fact (George Washington’s white horse was white), chances are it is out of date. For example, years ago I learned that 51% of the population is women. At the time (maybe the 1990s) that was correct. Is it still correct? Turns out, no, it’s not. According to Wikipedia (http://en.wikipedia.org/wiki/World_population) the population is near 50/50 (101 males for every 100 females). Now, such a small change will probably not cause me to make a catastrophic decision failure, but take cellphone ownership, for example. Even if the statistic I know is from 2011, it’s old and very inaccurate.
Whenever you have to make a decision that relies on data, take the time to verify that data.