Data is increasingly valued as a company asset, so ensuring that it is of high quality is imperative. Increasingly, in the world of IoT, machines are the ones making decisions. These decisions either provide insights into what is most beneficial
for customers or reduce their service-relationship anxiety. For machine learning models to generate actionable insights, diverse, high-quality data must be available in real time.
In the 1960s, data was supposedly managed in silos, often physically, and the skill sets needed to derive insights from it were limited. Even so, those who invested in curating superior-quality information reaped better revenues.
Then came business intelligence, which can be considered a vintage capability today. Yet it remains an effective way of consuming data for reports and analytical models. Before data is loaded into a warehouse, its quality is assessed
in terms of contextual dimensions of quality, such as validity. Such models can be termed generation-1 data quality management models.
However, to formalize the management of quality, a dedicated function can be set up. Standardization of data quality can be emphasized through data governance. Data governance ensures that designated actors follow repeatable processes to complete data quality assessment,
root cause analysis, and issue management to recover from and resolve data issues. Policies and guidelines that define roles, responsibilities, and processes for ensuring accountability and ownership of data make active management of data quality
possible. Some important dimensions for assessing the quality of data are:
Completeness: Does the data meet your expectations of what is complete? This can be measured at the column, row, or group level, for example as a fill rate.
Consistency: Is the data structurally and semantically consistent, and does it conform to business policy?
Timeliness: Does the data arrive with a system or manual lag?
Validity: Is the data streamed in the designated format, and is it usable per the applicable standards?
Uniqueness: Does similar information already exist as an instance within the data structure or ecosystem?
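
To make these dimensions more concrete, here is a minimal Python sketch of how four of them (completeness, uniqueness, validity, and timeliness) might be scored as simple ratios over a batch of records. The sample records, the field names (id, email, updated_at), the email pattern, and the one-day freshness threshold are all assumptions made for illustration; they do not come from any particular tool or standard. Consistency is left out because it usually depends on business rules specific to each dataset.

```python
import re
from datetime import datetime, timedelta

# Hypothetical sample records; the field names are illustrative only.
RECORDS = [
    {"id": 1, "email": "a@example.com", "updated_at": datetime(2024, 1, 1, 12, 0)},
    {"id": 2, "email": None, "updated_at": datetime(2024, 1, 1, 11, 0)},
    {"id": 2, "email": "not-an-email", "updated_at": datetime(2023, 12, 30, 12, 0)},
    {"id": 4, "email": "d@example.com", "updated_at": datetime(2024, 1, 1, 11, 30)},
]

# Deliberately loose format check, assumed here only for the example.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def fill_rate(records, field):
    """Completeness: share of records in which the field is populated."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def uniqueness(records, field):
    """Uniqueness: number of distinct values divided by number of records."""
    if not records:
        return 0.0
    return len({r.get(field) for r in records}) / len(records)


def validity(records, field, pattern):
    """Validity: share of populated values that match the expected format."""
    values = [r[field] for r in records if r.get(field) not in (None, "")]
    if not values:
        return 0.0
    return sum(1 for v in values if pattern.match(v)) / len(values)


def timeliness(records, field, now, max_lag):
    """Timeliness: share of records updated within max_lag of now."""
    if not records:
        return 0.0
    fresh = sum(1 for r in records if now - r[field] <= max_lag)
    return fresh / len(records)


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)
    print("completeness(email):", fill_rate(RECORDS, "email"))
    print("uniqueness(id):", uniqueness(RECORDS, "id"))
    print("validity(email):", validity(RECORDS, "email", EMAIL_PATTERN))
    print("timeliness(updated_at):", timeliness(RECORDS, "updated_at", now, timedelta(days=1)))
```

Each function returns a score between 0 and 1, which makes it possible to compare the scores against thresholds agreed under data governance. In practice the thresholds and the rules themselves would be defined by the data owners, not hard-coded as they are here.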