In today's era of quick e-commerce, ordering anything is just a click away. Imagine ordering a crate of your favorite beer on a bright Sunday afternoon and having it delivered in just 20 minutes. Upon arrival, you find the crate is broken, but fortunately the cans inside are intact. You probably will not complain much. What if, besides the broken crate, a few cans had dents too? You would be agitated, but once you have your favorite beer chilled to perfection, you would probably be more forgiving of the broken crate and the dented cans.
Now consider a different scenario: the crate arrives in perfect condition with intact cans, but the beer itself tastes off and unpleasant. Which scenario would you prefer? Most likely the first, where the quality of the beer saved the day despite the broken crate and a few dents. In the second scenario, no amount of good packaging can compensate for the compromised quality of the beer.
Let's apply this analogy to data quality. The crate represents data storage, typically data warehouses and data marts; the cans are the tables and columns; and the beer is the data itself. You can tolerate a less-than-optimal data warehouse and table structure as long as the data is accurate and the business is not complaining. In the second scenario, you have the latest cloud data storage, and tables and columns have been replaced with data products, but the data itself is questionable. How would you react?
Common sense dictates that we should focus on fixing the beer and the data, as they are the real heroes of the story. Unfortunately, we often concentrate on fixing the crates and cans, the data warehouses and data structures. While addressing these aspects is important, it will not save us from embarrassment, fines, and penalties when a regulator comes knocking on our door, or when the business complains about lost revenue and operational inefficiencies.
This explains why many analysts estimate the cost of fixing bad data quality at a seemingly meagre USD 14 to 20 million per organization: that figure covers fixing the crates and cans, not the beer and the data. It raises a critical question: why would any organization invest effort in solving data quality issues when 14 million barely registers on the balance sheet, if at all? If we truly consider the impact of bad beer and bad data, we get dissatisfied customers and falling customer and business satisfaction, frequent returns and manual reconciliations, a dip in revenue, and, on top of it all, the FDA and the Fed knocking on our doors.
Why is data quality misunderstood?
Although quality can be somewhat nebulous, it is fundamentally significant. Data quality is a complex and context-dependent concept often misunderstood across business, technology, process, and data science domains, with each attributing different issues to it. Numerous studies have squarely blamed low AI adoption on poor data quality. Before seeking a solution, it is crucial to understand what everyone means by quality.
The discussion raises critical questions: whether data quality equates to data reliability, whether it unfairly bears the blame for broken processes or managerial conflicts, and whether tools or process fixes alone can resolve it. Ultimately, true data quality encompasses understanding the context, intent, and dimensions of data, suggesting it is the sum total of all data management domains rather than a single isolated aspect.
Diverse Perspectives on Data Quality
Each executive views data quality issues through their own unique lens.
To make data usable, we need to understand the context, intent, and dimensions of the data.
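To make this concrete, below is a minimal sketch in Python of how context and intent turn abstract quality dimensions into measurable checks. The orders table, column names, and dimension choices are all illustrative assumptions, not a prescribed implementation.

```python
# A minimal sketch: scoring a few common data quality dimensions
# (completeness, validity, uniqueness) on a hypothetical orders table.
import pandas as pd

def profile_quality(df: pd.DataFrame) -> dict:
    """Return a 0-1 score per dimension; the dimensions chosen reflect intent."""
    total = len(df)
    return {
        # Completeness: business intent says every order needs a customer email
        "completeness_email": df["email"].notna().mean(),
        # Validity: in this business context, order amounts must be non-negative
        "validity_amount": (df["amount"] >= 0).mean(),
        # Uniqueness: order_id is intended to be the primary key
        "uniqueness_order_id": df["order_id"].nunique() / total if total else 1.0,
    }

orders = pd.DataFrame({
    "order_id": [1, 2, 2, 4],
    "email": ["a@example.com", None, "c@example.com", "d@example.com"],
    "amount": [10.0, -5.0, 30.0, 25.0],
})
print(profile_quality(orders))  # each dimension scores 0.75 on this sample
```

The same table could score differently under a different intent; a marketing use case might not care about missing emails at all, which is exactly why context must come before measurement.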
Applying data governance principles:
Effective data quality management requires integrating all domains of data governance rather than treating data quality as an isolated issue.
Identify, Understand, and Catalog Your Core Data Assets: As explained in our analogy, it is important to identify the hero (in this case, the data) and clearly articulate the intent, context, and dimensions of quality to be monitored.
Beyond Paper Ownership: Ownership does not just mean assigning a name to a data asset. It involves a deep emotional attachment to the assets one owns and ensuring that they are always in the best condition.
Standards, Policies, Procedures: What gets measured gets monitored. Following this logic, it is important to define standards, policies, and procedures for your data. However, this does not mean copying them from various sources. They should be contextual to your business.
Guardrails and Controls: It is crucial to establish guardrails and controls that protect your data and enforce your standards and policies in the context of your business and regulations; a sketch of what this can look like follows this list.
Processes: Processes must cover the end-to-end data lifecycle to ensure data quality at every stage, and the workflows must support those processes.
Operating Model: Ultimately, it is people who will ensure the quality of data is protected through processes and technology. This is where an operating model comes into play. It is not a bureaucratic setup but a group of people who genuinely care about data.
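To illustrate how standards and guardrails can work together, here is a minimal sketch assuming a pandas DataFrame; the rule names, columns, and thresholds are illustrative, and real rules would come from your own business and regulatory context.

```python
# A minimal sketch: contextual standards as declarative rules, with a
# guardrail that blocks the pipeline when quality falls below the standard.
import pandas as pd

# Each rule records what is measured and why (business/regulatory context).
RULES = [
    {"column": "iban", "check": "not_null", "threshold": 1.00},        # regulatory: every payment needs an IBAN
    {"column": "amount", "check": "non_negative", "threshold": 0.99},  # business: negatives are handled as refunds
]

def enforce_guardrails(df: pd.DataFrame, rules: list) -> None:
    """Reject the load when a measured score falls below its agreed threshold."""
    failures = []
    for rule in rules:
        col = df[rule["column"]]
        if rule["check"] == "not_null":
            score = col.notna().mean()
        else:  # "non_negative"
            score = (col >= 0).mean()
        if score < rule["threshold"]:
            failures.append((rule["column"], rule["check"], round(float(score), 2)))
    if failures:
        raise ValueError(f"Guardrail tripped, load rejected: {failures}")

payments = pd.DataFrame({
    "iban": ["DE89370400440532013000", None],
    "amount": [100.0, 250.0],
})
enforce_guardrails(payments, RULES)  # raises: iban completeness is 0.5, below 1.00
```

The point is not the specific library but the shape: standards live as data, contextual to the business, and the guardrail makes them enforceable rather than aspirational.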
In a nutshell, we must love our data exactly the way we love our beer. Fixing the quality of the beer is of utmost importance, and for that we must find out where in the supply chain the quality went wrong: in sourcing ingredients, brewing, packaging, distribution, or retail. Effective management of this chain is crucial for breweries to meet demand, minimize costs, and maintain product quality. Similarly, to fix and maintain the quality of data, it is important to map the end-to-end data lifecycle and understand the changes and transformations happening to the data through lineage, traceability, auditability, and provenance.
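As a sketch of what the lineage side of this could look like in code, the hypothetical Dataset wrapper below appends a provenance record with every transformation, so a bad value can be traced back through the lifecycle. The class and step names are illustrative, not a reference design.

```python
# A minimal sketch: capturing lineage alongside transformations so every
# change to the data is traceable and auditable.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Dataset:
    name: str
    lineage: list = field(default_factory=list)

    def transform(self, step: str, source: str) -> "Dataset":
        """Record what happened, where the data came from, and when."""
        self.lineage.append({
            "step": step,
            "source": source,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        return self

orders = Dataset("orders_curated")
orders.transform("ingest", "crm_raw").transform("dedupe", "orders_staging")
print(orders.lineage)  # the audit trail: ingest from crm_raw, then dedupe from orders_staging
```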
Poor data quality is far more debilitating than a few million dollars on a balance sheet. Data quality must therefore be embraced as a way of life, enabling the organization to support its business strategy and regulatory compliance.