
Big Data x 5NF = Really Big Data Errors

It is true that Big Data may offer huge opportunities for enterprises to gain hitherto unimagined insights. Alas, it is also true that it has the potential to tell enterprises really big lies!

This has nothing to do with the quality of the Big Data. These ‘lies’ can arise from 100% correct data. They arise from an anomaly in the structure of the data.

This huge risk is all down to the fact that the structures of data that you will get from external sources are highly likely to break Fifth Normal Form (5NF). In truth, it would be almost a fluke if such data did not! I call this propensity for merged Big Data sets to lie ‘5NF Syndrome’!

What is Fifth Normal Form?

A good definition for 5NF (that is understandable) is hard to find. The best way to explain it is by an example from real life.

A distribution company in the UK had a data table in its corporate distribution system that looked like Fig 1 below. This showed which manufacturers’ products the enterprise was sanctioned to distribute and to which retailers.

Figure 1 of Big Data and Fifth Normal Form (5NF) Lies

Over time, three regional divisions extracted parts of the data from the corporate distribution system for use in their local standalone systems in order to do some regional analysis. The extracted data looked like this in the three regional systems.

Figure 2 of Big Data and Fifth Normal Form (5NF) Lies

A regional manager, who had responsibility for the three regions, had some creative ideas for expanding his local distribution infrastructure. In order to ensure that his business case was sound, he needed to do some analysis that would enable him to know the overall form and spread of distribution services across these regions.

What easier way to do it than to run a report using the data in the three standalone regional systems? After all, they contained all of the data elements that he required.

All that was required was a simple SQL query joining the three.

New Insights

Before long his analysis was giving him lots of new insights. It suggested that the enterprise was actually distributing many more manufacturers’ products to many more retailers than they had previously imagined.

Just to make sure that this was not due to an error in the SQL, the code was checked and proved to be working correctly.

It seemed that the data in the three regional systems had actually been concealing valuable sales intelligence. Excited by these new insights, the regional manager built a compelling business case for expanding his distribution facilities and presented it to head office.

Due Diligence

Doing due diligence, head office staff checked the ‘exciting new insights’ against their own distribution analysis taken from the corporate distribution system – the system that had been the original source of the data for the three regional databases.

The distribution analysis figures did not match. The regional systems were showing much more activity than the corporate system. Something was wrong, but what?

Together IT and the business checked the source data against the data that had been originally extracted into the three regional systems and found that it matched.

The SQL query that produced the analysis was checked and double checked. It too was correct.

Many hundreds of man-hours were consumed in trying to solve this huge anomaly.

It’s All Academic

It was not until an external consultant was brought in, as a last-ditch effort, that the mystery was solved. He listened to the history, looked at the tables and asked, “Have you heard about Fifth Normal Form?” A few had heard of it but no one had any idea what it was, other than some esoteric data rule that might have some relevance in academia, but none in a commercial enterprise. How wrong they were!

Global Problem

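Violations of Fifth Normal Form (5NF) trip up innumerable data projects around the globe every year. So what exactly is 5NF? It deals with whether a table can be split into smaller tables and then rejoined without gaining or losing rows. The error arises when you ‘over normalize’ data tables, splitting them further than their structure allows, and then try to recombine them.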

In the distribution company this ‘over normalization’ occurred when the original three-column table (Fig 1) was split into three separate two-column tables (Fig 2).  Performing queries on the individual tables will not result in any errors as, in standalone mode, there is nothing wrong with their structure.

The violation of 5NF occurs when you try to combine the columns through a query that joins all three tables.  This is not just an academic error.  It is a fatal error that consistently produces false values as demonstrated in Fig 3 below.

Figure 3 of Big Data and Fifth Normal Form (5NF) Lies

The row highlighted by the red arrows did not exist in the original table in Fig 1. Yet every time that you run a query that joins these three separate tables this extra phantom row is created.

This is just one phantom row generated from just four rows in the original table. Imagine how many would be generated if the original table had contained 10,000 or 1,000,000 rows!
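The effect can be reproduced in a few lines. The sketch below is illustrative only: the figures from the original case are not reproduced here, so the table names (`distribution`, `man_prod`, `prod_ret`, `man_ret`), the column names and the four sample rows are all assumptions. It uses Python's built-in `sqlite3` module to split a three-column table into three two-column tables and then join them back together.

```python
import sqlite3

# Hypothetical sample data: names and values are assumptions, not the
# rows from the distribution company's real table.
ORIGINAL_ROWS = [
    ("M1", "P1", "R2"),
    ("M1", "P2", "R1"),
    ("M2", "P1", "R1"),
    ("M3", "P3", "R3"),
]


def rejoin_fragments(rows):
    """Split a three-column table into three two-column tables, then join them."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE distribution (manufacturer TEXT, product TEXT, retailer TEXT)"
    )
    con.executemany("INSERT INTO distribution VALUES (?, ?, ?)", rows)

    # The three extracts: each is a two-column projection of the original.
    con.execute(
        "CREATE TABLE man_prod AS SELECT DISTINCT manufacturer, product FROM distribution"
    )
    con.execute(
        "CREATE TABLE prod_ret AS SELECT DISTINCT product, retailer FROM distribution"
    )
    con.execute(
        "CREATE TABLE man_ret AS SELECT DISTINCT manufacturer, retailer FROM distribution"
    )

    # The 'simple SQL query joining the three'.
    joined = con.execute(
        """
        SELECT mp.manufacturer, mp.product, pr.retailer
        FROM man_prod AS mp
        JOIN prod_ret AS pr ON pr.product = mp.product
        JOIN man_ret AS mr ON mr.manufacturer = mp.manufacturer
                          AND mr.retailer = pr.retailer
        """
    ).fetchall()
    con.close()
    return set(joined)


joined_rows = rejoin_fragments(ORIGINAL_ROWS)
phantom_rows = joined_rows - set(ORIGINAL_ROWS)

print(len(ORIGINAL_ROWS), "original rows,", len(joined_rows), "rows after the join")
print("phantom rows:", sorted(phantom_rows))
```

With this sample data the join returns five rows from an original four. The extra row, `('M1', 'P1', 'R1')`, satisfies every join condition even though it was never in the source table, and the SQL itself is doing exactly what it was asked to do.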

Once fragmented in this way, any attempt to recombine the tables will always result in phantom rows being generated – and there is absolutely no way of ascertaining which rows these are.

What’s 5NF got to do with Big Data?

The fact is that there are probably billions of Big Data sets out there which, when queried on a standalone basis, will give totally accurate results. However, these same data sets, if joined together through a query, can produce spurious extra rows whenever their structures break 5NF. This means that the new ‘insights’ that Big Data is throwing up for your enterprise may well be the biggest lies that you could tell yourself!

When these structures exist, Big Data will always lie to you and you will have no way of telling which elements of data generated by the query are the lies!

The only thing you can do to prevent these errors is to test to see if the data structures violate 5NF – before joining the data!  How to do this is far too much to cover here.  I show one technique for checking this in my book IMM Data Structure Modeling.
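As a rough illustration of what such a test can look like, and not the technique from the book, here is a minimal sketch in Python. It assumes that you still have access to a trusted copy of the combined three-column data, which will often not be the case with externally sourced data sets. The function names and sample rows are hypothetical. The idea is to split the trusted data into its two-column pairs, rejoin them, and see whether anything comes back that was not there before.

```python
def project(rows, i, j):
    """Return the distinct (column i, column j) pairs of a set of 3-tuples."""
    return {(row[i], row[j]) for row in rows}


def rejoin(ab, bc, ac):
    """Natural join of the three two-column projections."""
    return {
        (a, b, c)
        for (a, b) in ab
        for (b2, c) in bc
        if b == b2 and (a, c) in ac
    }


def phantom_rows_after_split(rows):
    """Rows that appear after split-and-rejoin but were not in the original."""
    original = set(rows)
    rebuilt = rejoin(
        project(original, 0, 1),
        project(original, 1, 2),
        project(original, 0, 2),
    )
    return rebuilt - original


# Hypothetical sample data (manufacturer, product, retailer).
unsafe = [
    ("M1", "P1", "R2"),
    ("M1", "P2", "R1"),
    ("M2", "P1", "R1"),
    ("M3", "P3", "R3"),
]
# The same data plus the row that the join would otherwise invent.
safe = unsafe + [("M1", "P1", "R1")]

print(phantom_rows_after_split(unsafe))  # {('M1', 'P1', 'R1')}
print(phantom_rows_after_split(safe))    # set()
```

An empty result only shows that this particular sample survives the split. It does not prove that the structure will stay safe as new rows arrive, so a check like this is a warning light rather than a guarantee.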

One Response to “What Are You doing to Protect Your Big Data from 5NF Syndrome?”

  1. Milan Kucera

    Hi Jon,
    Again, and I think I am repeating myself, great topic. I will not respond to this article directly; I will just put a few ideas out in public:

    1. Value of information increases when it is integrated with other resources
    2. Value of information increases by increasing accuracy
    3. Value of information increases by its sharing and use

    Those are very good ideas that lead to a more effective information architecture. Organizations today face the buzzword “BIG DATA”. It is an important topic, but opposite it stands “information overload”. It is possible to find a huge number of articles focusing on the impact of information overload on the effectiveness and efficiency of business processes. I can summarize these in one word: “negative”.

    And immediately organizations face the following challenges:
    1. how to keep “big data” as accurate as possible
    2. how to ensure that business processes receive only relevant information and are kept free of information overload
    3. how far “big data” is shareable and actually used

    And a final question: is it cost effective to keep “big data”? I am not sure whether anybody has done research measuring the positive impact of having big data on revenue. Has anybody here done this type of research?

    Regards, MilanK.

