The Un-geeking of Data
This isn’t going to be easy! I’m going to attempt to explain two important factors in Data Analysis – the decomposition and normalization of Data – in such a way that people who are freaked out by this stuff can grasp and appreciate them.
While I don’t expect that many of you will ever delve into Data to this depth, consider this to be that little bit of knowledge that might make you dangerous, but will give you an appreciation of what goes on at the ‘science end’ of Big Data projects.
Not only might this allow you to speak to people who appear to (and actually do) speak a different language from you, but I will also show how these skills can be used in other parts of business.
What I’m going to ask you at the top of the article instead of the bottom, is to tell me how far you got into this before your eyes glazed over and what you might like me to take down another notch or expand upon. If you take a look at the links at the bottom of this article, you’ll see that this has until now been a largely academic subject, but I don’t think it belongs in that arena. So here is my attempt at de-nerding/un-geeking two elements of Data science.
An Example of Database Design
A simple example of how Data might be organised would be a patient record at a Hospital.
Among other things, patients have:
- names and addresses
- insurance information
- one or more doctors
- one or more medications.
This information isn’t duplicated for each patient. Instead, for example, each medication is a single entry in the meds table and may be referenced by many patients.
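To make that structure concrete, here is a minimal sketch using Python’s built-in sqlite3 module. Every table name, column name, and piece of patient data here is invented for illustration – it is not a real hospital schema – but it shows how one medication entry can be referenced by many patients without being duplicated.

```python
import sqlite3

# In-memory database for illustration; all names and data are invented.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Each kind of thing gets its own table, rather than one giant patient record.
cur.executescript("""
CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE medications (id INTEGER PRIMARY KEY, name TEXT);
-- A link table lets many patients reference the same medication entry.
CREATE TABLE patient_medications (
    patient_id INTEGER REFERENCES patients(id),
    medication_id INTEGER REFERENCES medications(id)
);
""")

cur.execute("INSERT INTO patients VALUES (1, 'Ann Smith', '12 Elm St')")
cur.execute("INSERT INTO patients VALUES (2, 'Bob Jones', '9 Oak Ave')")

# One single entry in the meds table...
cur.execute("INSERT INTO medications VALUES (1, 'Aspirin')")

# ...referenced by two different patients, with no duplication.
cur.execute("INSERT INTO patient_medications VALUES (1, 1)")
cur.execute("INSERT INTO patient_medications VALUES (2, 1)")

rows = cur.execute("""
SELECT p.name, m.name
FROM patients p
JOIN patient_medications pm ON pm.patient_id = p.id
JOIN medications m ON m.id = pm.medication_id
ORDER BY p.name
""").fetchall()
print(rows)
```

The point of the link table is exactly the non-duplication described above: ‘Aspirin’ exists once, no matter how many patients take it.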
Hopefully you get the highest level gist. Even more hopefully, you didn’t just doze off!
What is Decomposition?
Outside of Data, dictionaries offer several senses of the word:
- A biological process through which organic material is reduced to e.g. compost
- The act of taking something apart, e.g. for analysis
- The splitting (of e.g. a matrix, an atom, or a compound) into constituent parts
You’re certainly familiar with one highly unpleasant type of decomposition. Well, this type involves a body of Data, and it is a more controlled and less smelly process.
To paraphrase geek-speak for the regular Joe or Joanne, decomposition is taking something that is too large and complex to comprehend, and breaking it down into manageable pieces.
This is essential! Data is the lifeblood of marketing, but the amount that is generated by even a small business is huge. Trying to wrap your head around the entirety of it is futile, so don’t even try.
In a manner of speaking, the Data most small businesses use is already largely decomposed, as each source (Google, Clicky, …) presents its own information. This is where the concept of Normalization comes into play, as we first seek to bundle all Data sources together, before using it all to create the manageable, fathomable, comprehensible smaller chunks.
Decomposing Your Business Needs
You can decompose just about anything. Let’s not presume that the data that you have is organised in the most useful way for you. Let’s also not presume that most people’s businesses are organised in an optimal or scalable way.
To decompose your business needs, you would use the basic concepts of Functional Decomposition. You decompose – or, to say it in English, take apart – every aspect of your business, before recomposing it.
What do you do each day and at each step in the customer relationship process? I’d wager there are small tasks that you do as a matter of course, that wouldn’t even make it onto your list, as they have become second nature.
Basically, you are taking your business down to its smallest components in a manner you are unlikely to ever do without engaging in such an exercise.
In both a Data and functional sense, you will surely discover aspects of redundancy and inefficiency in your processes.
What is Normalization?
Sounds so grand, doesn’t it? But this is basically the removal of duplicate Data elements, followed by organising the remaining elements into meaningful, scalable buckets. If Decomposition is about breaking Data down into manageable ‘packets’, then Normalization is directly related to reconstituting the Data in the most efficient manner.
Oh, the countless hours of fun we could have discussing the 3rd Normal Form… or not, as the case may be.
Normalizing a Database entails ensuring that queries to it will return clean results. Perhaps you’ve been spoiled and seen few or no examples of database queries coming back with bad results, but that is only due to the work done behind the scenes before you ever have access to a database.
Currently, normalization is done by people, but one day, it will be done by machines. And when that day comes, your simple Google search will return only what is directly relevant to your query, as opposed to 2,150,000 results for ‘screen shot on a macbook’, most of which are entirely irrelevant.
Normalizing Your Business Needs
So if you can decompose just about anything, can the same be said for normalizing?
Well certainly the gist of it can be copied. After all, who doesn’t have any duplication of effort in their business endeavours?
Normalizing and decomposing may have a strict delineation with regard to Data, but they are surely two halves of a whole when it comes to business.
- List your tasks
- Remove duplicates
- Place them in buckets
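The three steps above can be sketched in a few lines of Python. The task names and bucket labels are invented examples, but the mechanics – list, de-duplicate, bucket – are the same whatever your business does.

```python
from collections import defaultdict

# Step 1: list your tasks (task, bucket). All entries are invented examples.
tasks = [
    ("email new leads", "sales"),
    ("post to Twitter", "marketing"),
    ("email new leads", "sales"),      # a duplicated effort
    ("invoice clients", "admin"),
]

# Step 2: remove duplicates.
unique_tasks = set(tasks)

# Step 3: place them in buckets.
buckets = defaultdict(list)
for task, bucket in sorted(unique_tasks):
    buckets[bucket].append(task)

print(dict(buckets))
```

The duplicated ‘email new leads’ entry collapses to one, and what remains is grouped by area – a miniature version of the documented, handover-ready process list discussed below.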
It’s All About Scalability
The benefits of this exercise go beyond the obvious. Surely, most if not all business owners want their business to be scalable. If, ultimately, you want to delegate aspects of your business, you now have your processes documented and ready for such a handover.
The History of un-geeking in IT
In the early days of Mainframe programming, everything was in code. And I’m not simply referring to programming languages. IBM would label something as simple as a Name and Address field with an 8-character field name that might have been A23G18ZB. It was unintelligible to look at, with the difficulty put there simply to make the not-very-difficult thing they were coding look like something only someone with an Information Technology degree could understand. They wanted this to appear very difficult, and that is a mindset that stuck with various types of programming through the decades. It is a mindset that is all too evident beyond IT, in the world of Social Media.
After every simplification cycle, new technology and new thinking re-establish levels of complication. We are in awe of the intellect that creates the new leaps forward, but aware that each leap is simply Step 1 in each cycle. Call it the re-geeking, if you will.
The second phase in each cycle is dissemination, as the leaders and messengers take the new information to those at the forefront.
At Curatti, we will attempt to establish that third step in each process – the un-geeking – the point at which knowledge is no longer the domain of the few who are ‘in the know’, but is presented so that all may ‘get it’.
Additional reading: Decomposition https://software.intel.com/en-us/articles/data-decomposition-sharing-the-love-and-the-data
Image attribution: bowie15 / 123RF (http://www.123rf.com/profile_bowie15)
The Decomposition Poster was originally designed in 1989 by Marc Rettig as a subscription premium for Database Programming and Design magazine. He now makes it available as a free downloadable PDF file.
Author: Andy Capaloff