Andy Capaloff
August 10, 2014

9 Things You Need To Know About Data Big and Small

There are many great resources that detail the benefits of Big Data. Danny Brown has certainly written many (here’s just one example), as has Daniel Newman (and here’s an example of his work).

I’m as excited as they and others are. But with the Big Data journey only just beginning, it seems opportune, even for an excited proponent of how it can help us, to express a few cautionary notes on Data, Big and Small.

Neither Computers nor Data Make Mistakes… But People Do!

Hands up who hasn’t blamed a computer for making a mistake? Ok, not too many hands went up, even metaphorically, right?

And how many of those ‘mistakes’ turned out to be operator error instead? How many of the rest were software error?

The correct answer is, every such accusation turned out to be one or the other.

Guess what? The same is and will be true for Data. If the results as presented prove to be wrong, then it is due to faulty analysis, not bad Data.

The Search For Vindication

This one heading has lots of little elements we’ve all experienced, including:

  • Political spin-doctors
  • Pharmaceutical research
  • People reading books or articles, or watching TV, and zeroing in on every aspect they ‘needed’ to find, while missing virtually everything else
  • Choosing a source to match an opinion

Not All Statistics Mean As Much As Some Attribute To Them!

Please forgive this Football-mad Brit a Soccer analogy:

You can barely watch a match without seeing the possession stat flashed up at regular intervals. For example, the home team had the ball 53% of the time, to their opponents’ 47%. But you know what? Much of the time, they’re nothing more than numbers that paint a very poor picture of what is happening, like the occasion where, with those very numbers, the away team deservedly won 5-0.

Not All Numbers Make Good Statistics

[Image: einstein-not-everything-that-can-be-counted-counts]

The saying in this picture is, along with so many others, amusingly but wrongly attributed to Albert Einstein. The actual statement is even more telling.

In 1963, William Bruce Cameron wrote:

“It would be nice if all of the data which sociologists require could be enumerated because then we could run them through IBM machines and draw charts as the economists do. However, not everything that can be counted counts, and not everything that counts can be counted.”

Not All Statistics Stand On Their Own

Standard Deviations! If you took Maths past a certain level at school, you know them, and may even love them. As shown in this Harvard Business Review article by Douglas Merrill, ‘outliers’ (numbers that are far enough outside of the norm that they may skew Data samples) can make nonsense out of some numbers. These need to be considered in any presentation. Merrill gives the example of adding a tiny man and a very tall woman to a relatively small group, to show how one outlier on each side could suddenly make a limited study on height show that women are taller than men.
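Merrill’s height example can be sketched in a few lines of Python. The heights below are invented purely for illustration (they are not taken from his article), and comparing the mean with the median is just one simple way to see how much two outliers are driving a result.

```python
from statistics import mean, median

# Hypothetical heights in centimetres, made up for illustration only.
men = [175, 178, 180, 182]
women = [162, 165, 167, 170]

# Add one outlier on each side: a tiny man and a very tall woman.
men_with_outlier = men + [120]
women_with_outlier = women + [215]


def summarize(heights):
    """Return (mean, median) for a list of heights."""
    return mean(heights), median(heights)


for label, group in [
    ("men", men),
    ("women", women),
    ("men + outlier", men_with_outlier),
    ("women + outlier", women_with_outlier),
]:
    avg, mid = summarize(group)
    print(f"{label:16} mean={avg:.1f} median={mid:.1f}")
```

With these made-up numbers, the averages flip once the two outliers are added, so the women appear taller than the men, while the medians barely move.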

I can give another example. Have you ever looked at the stock market ticker on CNBC, Bloomberg or elsewhere? I’ve seen the workings of the New York Stock Exchange from the perspective of the Specialists, more commonly known as Market Makers. They see price Data tiny fractions of a second before anyone else, and have programs to smooth out the outliers.

Think of what might happen if the true volatility were seen. Now imagine what might happen if all businesses act on all Data, without first removing the anomalies!

The Same Numbers Can Mean Different Things

Let’s just say, for example, that in two exams you (or your kid?) got 75%, and in each, only the top 20% of those tested were given A grades. Your grade depends on the scores of others as much as your own, and it isn’t inconceivable that the exact same number may net you one A and one B!
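To make the exam example concrete, here is a minimal Python sketch. The class scores and the grade_for helper are hypothetical, and treating every non-A result as a B is a simplification for the sake of the illustration.

```python
def grade_for(score, all_scores, top_fraction=0.2):
    """Return "A" if score falls in the top fraction of all_scores, else "B"."""
    ranked = sorted(all_scores, reverse=True)
    # Number of A grades on offer (at least one).
    a_count = max(1, round(len(ranked) * top_fraction))
    # Lowest score that still earns an A; tied scores share the grade.
    cutoff = ranked[a_count - 1]
    return "A" if score >= cutoff else "B"


# Hypothetical results for two exams; your 75 appears in both lists.
exam_one = [75, 70, 68, 65, 60, 58, 55, 52, 50, 45]
exam_two = [95, 90, 88, 85, 80, 75, 72, 70, 65, 60]

print("Exam one:", grade_for(75, exam_one))
print("Exam two:", grade_for(75, exam_two))
```

In the first exam, 75 is the best score in the room and earns an A. In the second, two classmates scored 90 or more, so the same 75 misses the cut.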

And Sometimes, Different Numbers May Mean The Same Thing

Let’s look at something more pertinent to your current concerns. Your blog flares, retweets, likes and shares.

Your numbers are subject to fluctuations based around national holidays, school holidays, bad weather, great weather, major conferences and industry announcements, among other things. So are those of your competitors!

Social Measurement tools (yes, I’m talking about Klout!) may penalise you for your August numbers being below your May numbers, but pretty much everyone else’s also went down, so your percentile (the only number that really counts) may not have changed at all.

The Percentile

Certainly, numbers are useful for internal measurement. But many of those that truly count are to do with where you stand in comparison to the marketplace. And because numbers fluctuate based upon so very many factors, looking at the fluctuation (or lack of it) in your numbers alone can seriously mislead.

There are two aspects to percentiles that are worth noting, and perhaps it’s appropriate to use Compliance terminology to describe them:

The false negative: As alluded to above, this is when your numbers and everyone else’s change together, meaning your percentile hasn’t shifted at all.

The false positive: This is nasty and needs to be guarded against. It is when your numbers stay the same, but those of your main competitors improve. It’s quite possible in an improving market, and unless you are keeping an eye on the broader picture, you could well miss the early warning signs.
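Both cases can be sketched with a simple percentile calculation in Python. The share counts are invented, and percentile_rank is a hypothetical helper using one common definition (the percentage of competitors whose number is strictly below yours). It is not how Klout or any other tool actually scores you.

```python
def percentile_rank(your_number, competitor_numbers):
    """Percentage of competitors whose number is strictly below yours."""
    below = sum(1 for n in competitor_numbers if n < your_number)
    return 100 * below / len(competitor_numbers)


# Hypothetical monthly share counts, invented for illustration.
may_you, may_others = 100, [40, 60, 80, 120, 150]

# False negative: everyone, including you, drops by a fifth in August.
aug_you, aug_others = 80, [32, 48, 64, 96, 120]

# False positive: you hold steady at 100 while the market improves.
flat_you, improving_others = 100, [50, 75, 105, 150, 190]

print("May:", percentile_rank(may_you, may_others))
print("August, everyone down:", percentile_rank(aug_you, aug_others))
print("You flat, market up:", percentile_rank(flat_you, improving_others))
```

In the August case your raw number fell but your percentile is the same as in May. In the last case your raw number never moved, yet your percentile dropped because a competitor overtook you.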

The Crux!

Our ability to tune out words from a conversation, or to glean from articles exactly what we sought to find, can and will be carried over by some into their analysis of Big Data.

Who will blame the Data when results mislead?

How many will translate selective hearing and selective reading into selective analysis?

How many people will commission reports seeking the short-term boost of vindication rather than the guidance of truth?

Ultimately, of course, whereas you surely can fool some of the people all of the time and all of the people some of the time, the only fool in the long run is the person who believes they can forge long term success based upon the smoke and mirrors created by purposely faulty, or at least blinkered, Data analysis.

The Takeaway is Very Simple

You have at your disposal the means to analyse, down to the minutest detail, the things you do that work and the things you do that need work. Truth can be a very hard friend, but it is folly to ignore it in any aspect of life. All the more so when numbers can quantify the folly. Take heed of the numbers. Learn from them. Seek the truth that is within them. And reap the benefits.


Andy Capaloff

Andy Capaloff is the COO of Curatti. Prior to moving into the world of Content Marketing, Social Media Management and the day-to-day running of a Digital Marketing company, Andy spent over 3 decades in various aspects of IT. It is here that he honed his writing and technical skills, and his ability to ask uncommon questions.