This content is provided by an external author without editing by Finextra. It expresses the views and opinions of the author.

BigData Lake for Financial Services - Need to stress on Platform Governance

As banks and insurance firms have already embraced data lakes to support their artificial intelligence (AI) and machine learning (ML) capabilities, it is important to ensure that the platform continues to deliver a return on investment.

If a data lake is not well maintained, it can turn into a data swamp in which data consumers struggle to find usable data. Most of these challenges can be addressed through active platform governance of the data lake.

A data lake, as a distributed file system, hosts authoritative copies of source data in a variety of formats: structured data, semi-structured data such as JSON and XML, and unstructured data such as images and audio.
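As an illustration of how a catalog might tag these broad categories at ingestion time, here is a minimal Python sketch. The extension-to-category mapping and the paths are hypothetical; a real platform would typically rely on catalog metadata or content inspection rather than file names alone.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to broad data category.
# Real platforms usually inspect content or catalog metadata instead.
FORMAT_CATEGORIES = {
    ".csv": "structured",
    ".parquet": "structured",
    ".json": "semi-structured",
    ".xml": "semi-structured",
    ".jpg": "unstructured",
    ".png": "unstructured",
    ".wav": "unstructured",
    ".mp3": "unstructured",
}


def classify_format(path: str) -> str:
    """Return the broad data category for a file path in the lake."""
    suffix = PurePosixPath(path).suffix.lower()  # case-insensitive match
    return FORMAT_CATEGORIES.get(suffix, "unknown")


for p in ["/raw/payments/txns.csv", "/raw/kyc/customer.JSON",
          "/raw/claims/photo.jpg", "/raw/misc/readme"]:
    print(p, "->", classify_format(p))
```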

Accumulating technical debt as business use cases are onboarded often leads to higher up-front costs during migration, as well as higher ongoing costs to maintain existing data.

A lack of trust in the data often leads consumers to load their own copies of data onto the data lake, even though the same data may already exist there. In addition, without self-service discovery capabilities, other consumers may not be able to find the right dataset.
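One way a platform team might surface such duplicate copies is to fingerprint dataset contents and group identical ones. The sketch below is a minimal illustration, assuming each dataset can be read as rows of tuples; the dataset names are hypothetical, and exact hashing only catches identical copies, not near-duplicates or the same data under a different schema.

```python
import hashlib
from collections import defaultdict


def fingerprint(rows):
    """Order-insensitive content hash of a dataset given as an iterable of tuples."""
    # Hash each row, sort the row hashes, then hash the sorted sequence so that
    # two copies holding the same rows in a different order still match.
    row_hashes = sorted(
        hashlib.sha256(repr(row).encode("utf-8")).hexdigest() for row in rows
    )
    return hashlib.sha256("".join(row_hashes).encode("utf-8")).hexdigest()


def find_duplicate_copies(datasets):
    """Group dataset names whose contents are identical."""
    groups = defaultdict(list)
    for name, rows in datasets.items():
        groups[fingerprint(rows)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical datasets: two teams hold the same customer rows.
datasets = {
    "risk_team/customers": [("C1", "Mumbai"), ("C2", "Pune")],
    "marketing/customers_copy": [("C2", "Pune"), ("C1", "Mumbai")],
    "finance/accounts": [("A9", "C1")],
}
print(find_duplicate_copies(datasets))
# [['marketing/customers_copy', 'risk_team/customers']]
```

The groups this returns are candidates for consolidation onto a single authoritative copy rather than proof of misuse; a review step would still be needed.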

The technology operating model of a data lake should focus on the following aspects of data management:

Data Cataloging: Once data has been ingested and pipelines have been built, knowledge of where the data comes from is often not retained. A catalog is needed to record what data exists in the lake, where it originates, and the business context in which it is used.
Data Reuse: Before ingesting data, it is always advisable to check through discovery whether the data is already available in the lake. If a data asset already exists, it should be reused.
Data Redundancy: Maintaining multiple copies of the same data for different use cases can drive up data management costs, including the cost of data quality and metadata management. Investment can instead be made in a business information model rather than in maintaining redundant data on the cluster. Note that physical replication of a data asset across multiple data nodes is a best practice configured for reliability and fault tolerance; this is different from data providers or consumers maintaining separate copies of the same asset.
Authoritative Copy Certification: Once the data lake has been active for some time and multiple copies of the same logical asset exist, it is advisable to identify the authoritative asset and certify it so that other consumers can provision from it.
Data Archival & Deletion: Coming at the end of the data life cycle, this step is often ignored. Defining the active period for each use case helps the data management team archive data that no longer needs to be maintained.
Data Quality: Data may not be of sufficient quality to produce reliable outcomes from artificial intelligence or machine learning models. The focus must be on profiling the data, understanding its characteristics, and monitoring quality through rules. Cleansing should be applied not only to the copies but also to the authoritative sources.
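To make cataloging, reuse and authoritative copy certification more concrete, the following is a minimal in-memory sketch. It is not a real catalog product's API: the class names, fields and dataset paths are hypothetical, and a production catalog would persist this metadata and capture lineage automatically.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class CatalogEntry:
    name: str                # physical dataset path in the lake (hypothetical)
    logical_asset: str       # business-level asset this copy represents
    source_system: str       # lineage: where the data came from
    business_terms: Set[str] = field(default_factory=set)
    certified: bool = False  # True for the authoritative copy


class DataCatalog:
    def __init__(self) -> None:
        self._entries: Dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def discover(self, term: str) -> List[CatalogEntry]:
        """Find datasets tagged with a business term; certified copies first."""
        hits = [e for e in self._entries.values() if term in e.business_terms]
        return sorted(hits, key=lambda e: (not e.certified, e.name))

    def certify(self, name: str) -> None:
        """Make `name` the single authoritative copy of its logical asset."""
        chosen = self._entries[name]
        for entry in self._entries.values():
            if entry.logical_asset == chosen.logical_asset:
                entry.certified = entry.name == name

    def find_reusable(self, logical_asset: str) -> Optional[CatalogEntry]:
        """Check before ingesting: return the certified copy if one exists."""
        for entry in self._entries.values():
            if entry.logical_asset == logical_asset and entry.certified:
                return entry
        return None


catalog = DataCatalog()
catalog.register(CatalogEntry("/lake/risk/customers", "customer_master", "CRM", {"customer"}))
catalog.register(CatalogEntry("/lake/marketing/customers_v2", "customer_master", "CRM", {"customer"}))
catalog.certify("/lake/risk/customers")

print([e.name for e in catalog.discover("customer")])
# ['/lake/risk/customers', '/lake/marketing/customers_v2']
reuse = catalog.find_reusable("customer_master")
print(reuse.name if reuse else "no certified copy; ingest new data")
# /lake/risk/customers
```

Ranking certified copies first in discovery nudges new consumers toward the authoritative asset instead of creating yet another copy.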
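Profiling and rule-based quality monitoring can be sketched in a few lines. This is a minimal illustration, assuming records are plain dictionaries; the rule names, the sample account records and the 95% threshold are hypothetical choices, not a prescribed standard.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def profile(records: List[Record], column: str) -> Dict[str, int]:
    """Basic profile of one column: row count, null count and distinct values."""
    values = [r.get(column) for r in records]
    non_null = [v for v in values if v is not None]
    return {"rows": len(values), "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null))}


# Hypothetical quality rules: each maps a rule name to a per-record predicate.
RULES: Dict[str, Callable[[Record], bool]] = {
    "customer_id_present": lambda r: bool(r.get("customer_id")),
    "balance_non_negative": lambda r: r.get("balance") is not None and r["balance"] >= 0,
}


def run_rules(records: List[Record], rules: Dict[str, Callable[[Record], bool]]) -> Dict[str, float]:
    """Return the pass rate (0.0 to 1.0) of each rule over the records."""
    if not records:
        return {name: 1.0 for name in rules}
    return {name: sum(1 for r in records if rule(r)) / len(records)
            for name, rule in rules.items()}


def failing_rules(pass_rates: Dict[str, float], threshold: float = 0.95) -> List[str]:
    """Rules whose pass rate falls below the (hypothetical) monitoring threshold."""
    return sorted(name for name, rate in pass_rates.items() if rate < threshold)


accounts = [
    {"customer_id": "C1", "balance": 1200.0},
    {"customer_id": "C2", "balance": -50.0},
    {"customer_id": None, "balance": 300.0},
    {"customer_id": "C4", "balance": None},
]
print(profile(accounts, "balance"))  # {'rows': 4, 'nulls': 1, 'distinct': 3}
rates = run_rules(accounts, RULES)
print(rates)  # {'customer_id_present': 0.75, 'balance_non_negative': 0.5}
print(failing_rules(rates))  # ['balance_non_negative', 'customer_id_present']
```

Running the same rules against both the authoritative source and its copies shows whether a problem originates upstream, which is where cleansing should happen.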
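A curated active period can also drive archival candidates automatically. The sketch below assumes, as a hypothetical convention, that the active period is measured from the last date a use case consumed the dataset; the dataset names and periods are illustrative only.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class UseCaseDataset:
    name: str
    last_used: date          # assumed: last date the use case consumed the data
    active_period_days: int  # curated active period agreed for the use case


def due_for_archival(datasets: List[UseCaseDataset], today: date) -> List[str]:
    """Return datasets whose curated active period has elapsed."""
    return [d.name for d in datasets
            if today - d.last_used > timedelta(days=d.active_period_days)]


datasets = [
    UseCaseDataset("/lake/campaigns/q1_2023_leads", date(2023, 4, 1), 180),
    UseCaseDataset("/lake/risk/daily_exposures", date(2024, 6, 29), 365),
]
print(due_for_archival(datasets, date(2024, 6, 30)))
# ['/lake/campaigns/q1_2023_leads']
```

In practice the list would feed a review by the data management team, since regulatory retention requirements may override the use-case period.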


Tejasvi Addagada
Data Governance Head, Fortune 500 financial service provider
Member since 02 Sep 2014
Location: Mumbai
