Big Data + Bad Analysis = Big Bust
Big Data, Big Hype?
“Big Data: Are We Making A Big Mistake?” (Financial Times, March 28, 2014)
“Eight (No Nine!) Problems With Big Data” (The New York Times, April 6, 2014)
“Growing Doubts About Big Data” (ABCnews.com, April 8, 2014)
The NYT article contains the following line we can all relate to: “big data is prone to giving scientific-sounding solutions to hopelessly imprecise questions.”
The Financial Times article is a masterpiece, linking Google Flu Trends, the 1936 US presidential election, the Target department store chain, and an epidemiology research paper entitled “Why Most Published Research Findings Are False”, before closing with this brilliant paragraph:
“Big data has arrived, but big insights have not. The challenge now is to solve new problems and gain new answers – without making the same old statistical mistakes on a grander scale than ever.”
A High Profile Big Data ‘Failure’
These criticisms stem largely from the failure of ‘Google Flu Trends’. This Artificial Intelligence (AI) program correlated data by “monitoring millions of users’ health tracking behaviors online”, analyzing Google search queries “to reveal if there is the presence of flu-like illness in a population”. (Source: http://en.wikipedia.org/wiki/Google_Flu_Trends).
The program was highly successful for several years but failed miserably between 2011 and 2013, overstating the spread of swine flu, which was hyped as a major epidemic in the first of those seasons.
Data Is Simply Data!
Dodson states, “Big Data is not to blame — the truth is, Big Data is just a lot of data. Bad analysis is bad analysis” and “bad Big Data analysis is still bad Big Data analysis.” I strongly agree with him.
Data is simply data. It is what people make of it that brings either value or failure.
How many times has any computer geek heard people blame computers for operator error? This is basically the same thing, with potentially far higher stakes!
An AI model is only as strong as its starting suppositions, and a model’s success over any period of time is certainly no guarantee that it will remain successful forever. And yes, the answers from Big Data analysis are only as good as the questions posed to it.
In the case of swine flu, people’s online searches about flu and flu symptoms increased drastically as media coverage grew.
It was pointed out that a possible flaw in the model was that the people doing the searches – you and I in an allegorical sense – are not trained in medicine. I would add to this that the people tracking the figures and trying to make sense of them were not psychologists, so they didn’t anticipate that our search habits would change in response to media-driven hysteria.
If I were looking for a case to support my own assertion that nobody has all of the questions, and that Big Data needs Big Collaboration, this would be a great place to start.
By Andy Capaloff