“Data is a precious thing and will last longer than the systems themselves.” So said
Tim Berners-Lee, the inventor of the World Wide Web. 'Precious', that is, provided the data is trustworthy and of assured, consistent quality. And customers readily concede that Data Quality forms the foundation of all their data management and analytics-driven initiatives.
But then why all the furore around Data Quality, and the trail of undertakings around it? What baffles customers more often than not is the sheer number of checkpoints at each and every stage of the data lifecycle. With the array of data management solutions that customers have within their system landscape, viz. Data Warehouses, Data Marts, Master Data Management solutions, Data Lakes and the like, there is understandable uncertainty and scepticism about the right approach to Data Quality.
And if one were to look at the expanse of the data lifecycle, quality issues can crop up at every juncture: from the source, to the ETL or middleware transformations, to the consolidated data warehouses and data lakes, until the issue finally reaches the end user or customer in some form of reporting, analytics or user screen, and then it's kaboom!
So amongst the variety of data and systems that exist within enterprises, is there any hard and fast rule on what, where and how to tackle the Data Quality demon? It is very much on most of our wish lists, but then, if wishes were horses... The sole purpose of a data quality program should be to ensure that sacrosanct data is made available for all applicable business processes, be the consumers internal or external.
Here is a list of key guidelines that can help steer your organization's Data Quality vision:
Categorize and Prioritize your Data:
Amongst the various types of data available, viz. Master data, Transactional/Operational data, Reference data and Analytical data, there can be a pressing urge to cleanse the data within the confines of the operational or analytical systems, since that is closest to where users access their data. But calling that a short-range solution would be an understatement: one is merely dealing with the problem as and when it surfaces, not addressing it at its core. What makes better sense is to look at the category of data that is used enterprise-wide, and that is none other than your master business entities: Customer, Product, Vendor, Employee, Asset, Location and so on. Cleansing, Enrichment, Match and Survivorship processes applied to the master data can then create the best version of the master record, providing a single, unified and consistent view of your key business entities.
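As a minimal sketch of how the Match and Survivorship steps might work on master data (the field names, match key and "newest non-empty value wins" survivorship rule below are hypothetical, for illustration only):

```python
from collections import defaultdict

def normalize(name):
    """Crude standardization: lowercase and strip punctuation/whitespace."""
    return "".join(ch for ch in name.lower() if ch.isalnum())

def match_and_survive(records):
    """Group records by a normalized match key, then survive the newest
    non-empty value per field into a single golden record per group."""
    groups = defaultdict(list)
    for rec in records:
        groups[normalize(rec["name"])].append(rec)

    golden = []
    for recs in groups.values():
        recs.sort(key=lambda r: r["updated"], reverse=True)  # newest first
        merged = {}
        for field in ("name", "email", "phone"):
            # survivorship rule: first (i.e. newest) non-empty value wins
            merged[field] = next((r[field] for r in recs if r[field]), "")
        golden.append(merged)
    return golden

records = [
    {"name": "ACME Corp.", "email": "", "phone": "555-0100", "updated": 2},
    {"name": "Acme Corp", "email": "sales@acme.example", "phone": "", "updated": 3},
]
print(match_and_survive(records))
```

Both rows collapse into one golden record that takes the email from the newer row and the phone from the older one. Real MDM tools use far richer matching (fuzzy, probabilistic) and configurable survivorship, but the shape of the process is the same.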
Apply the checks early on in the lifecycle:
Cleanse the data as close to the source as possible: that is a fundamental best practice, and of course a matter of garbage in, garbage out. It is always a better strategy to address data quality issues as close to the source, or at the source itself, since that can save a lot of effort and expense. And as much as you can cleanse and standardize the data in your source systems, you would rather put in checks prior to entry, so as to evade the need for cleansing altogether.
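A minimal sketch of what such pre-entry checks could look like (the fields, email pattern and reference list below are hypothetical examples of source-side quality gates):

```python
import re

def validate_customer(record):
    """Return a list of quality issues; an empty list means the record
    may enter the system. Run before insert, not after the fact."""
    issues = []
    if not record.get("name", "").strip():
        issues.append("name is required")
    email = record.get("email", "")
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email):
        issues.append(f"invalid email: {email!r}")
    if record.get("country", "") not in {"US", "GB", "IN"}:  # reference data check
        issues.append("country not in reference list")
    return issues

print(validate_customer({"name": "Jane Doe", "email": "jane@doe.example", "country": "GB"}))  # → []
print(validate_customer({"name": "", "email": "not-an-email", "country": "ZZ"}))
```

Rejecting the second record at entry is far cheaper than discovering and cleansing it downstream in a warehouse or report.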
Different Problems, Different Latencies:
Certain critical processes within one's organization may require real-time data quality checks, which are inevitable to avert fraudulent or duplicitous activity; any banking transaction is an example. Other processes have a lesser business impact. In both cases one may apply the same principles of data quality management, but one needs to distinguish the burning needs from the rest and approach each task accordingly.
Business inclusion at every stage:
The importance of business stakeholders' participation throughout the Data Quality journey cannot be overemphasized. Right from the onset of the DQ journey, a.k.a. the quality assessment, through cleansing and de-duplicating the data, a very high level of involvement is expected from the business side. And needless to say, business commitment and sponsorship for the Data Quality program largely determines its probability of success.
Establish a closed loop Remediation process:
This continuous, ongoing cycle of assessment, cleansing and organizing will ensure the data is fit for purpose and use at all times, rather than a one-off activity conducted in reaction to an error report or escalation.
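A minimal sketch of such a closed loop (assess, cleanse, re-assess until the data passes), with a hypothetical completeness rule and placeholder remediation:

```python
def assess(rows):
    """Profile step: return indices of rows failing a completeness check."""
    return [i for i, r in enumerate(rows) if not r["email"].strip()]

def cleanse(rows, bad, fallback="unknown@example.com"):
    """Remediation step: patch failing rows (here, with a placeholder)."""
    for i in bad:
        rows[i]["email"] = fallback
    return rows

def remediation_loop(rows, max_rounds=3):
    """Closed loop: keep assessing and cleansing until the data is fit
    for purpose, or escalate after too many rounds."""
    for _ in range(max_rounds):
        bad = assess(rows)
        if not bad:          # data passes: exit the loop
            return rows
        rows = cleanse(rows, bad)
    raise RuntimeError("data still failing after remediation rounds")

rows = [{"email": "a@b.example"}, {"email": "  "}]
print(remediation_loop(rows))
```

The point is the loop itself: remediation feeds back into assessment, so quality is maintained continuously rather than restored in one-off firefighting.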
Adopt Agile Sprints:
One could call the combination of Agile and DQ a match made in heaven. Adopting an agile approach in your Data Quality program can greatly reduce the latency that creeps in from delayed stakeholder feedback. An agile approach helps accelerate the entire process, since the business stakeholders can play the role of product owner, and since each sprint is focussed on a particular business area, it enables faster analysis and thus quicker results (read: value, in Agile terms).
Capturing vast amounts of data from disparate systems and attempting to analyse it so as to unlock its true value can prove to be quite an uphill task for analysts, since the process is not only manually cumbersome but also time-inefficient and error-prone. With a plethora of toolsets available for data profiling, cleansing and wrangling, it is imperative that businesses invest in the right kind of tool, one that enables them to deliver valuable insights in the most optimal manner.
A continuous focus on data quality is worth every penny of the investment: not only will it instil the business's confidence in its data, it will also help reap the full benefit of all the other enterprise solutions in place.