
An article relating to this blog post on Finextra:

Banks must resolve explainability and “black box” risk governance challenges to succeed with AI post

Data bias, “black box” risk, and lack of human oversight are the main governance issues for banks using AI, according to the Economist Intelligence Unit (EIU) report “Overseeing AI: Governing artifici...



But it worked in Harrogate!

I have nothing against Harrogate, or at least not consciously. Over recent months, however, there has been much discussion of unconscious bias in many walks of life, including a mention in this article of bias in the data used for algorithms. So let's consider the role that the preparation of data for regulatory uses might play and, where issues exist, what I have been able to do to mitigate their impact in major data programmes I have run. For example, what are the inbuilt assumptions of the input data models, and what happens if these precepts and conditions are stretched, strained, or even broken?

To look at just one example of many, take the common approach of removing words such as Limited from company name data before performing a fuzzy match. The premise is that these words add little to the uniqueness of a name and, if left in, may make a character-by-character match seem far better than it is. An extreme example would be matching 'A Limited' and 'B Limited'. If the word Limited is left in, the algorithm will likely match the two names, because most of the characters in name A match their counterparts in name B. In contrast, a human observer will immediately note that they are probably two totally different companies. So removing the word in this case is a sensible approach, and the Harrogate Limiteds will work okay. However, getting the full benefit of this technique in other countries depends on having a relevant list of words to remove, and that list differs from one geography to the next. Applying the technique equally across a global programme needs explicit effort and analysis. For example, I never thought I would have to learn the Vietnamese word for conglomerate! A similar approach is required for selecting relevant sets of abbreviations to expand or remove.
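To make the 'A Limited' and 'B Limited' example concrete, here is a minimal sketch in Python. It is an illustration only, not the matching engine used in any of the programmes described: it assumes the standard library's `difflib.SequenceMatcher` as a stand-in for whatever fuzzy matcher is in use, and the `LEGAL_FORM_WORDS` table, the country codes, and the function names are hypothetical and deliberately incomplete.

```python
from difflib import SequenceMatcher

# Hypothetical, incomplete lists of legal-form words per country code.
# A real programme would need a researched list for every geography.
LEGAL_FORM_WORDS = {
    "GB": {"limited", "ltd", "plc"},
    "DE": {"gmbh", "ag"},
}


def strip_legal_forms(name: str, country: str) -> str:
    """Lower-case a company name and drop that country's legal-form words."""
    words = name.lower().replace(".", "").split()
    stop_words = LEGAL_FORM_WORDS.get(country, set())
    return " ".join(word for word in words if word not in stop_words)


def similarity(a: str, b: str) -> float:
    """Character-based similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a, b).ratio()


# With 'Limited' left in, the shared word dominates the score.
raw_score = similarity("a limited", "b limited")

# With 'Limited' removed, only the distinguishing parts are compared.
clean_score = similarity(
    strip_legal_forms("A Limited", "GB"),
    strip_legal_forms("B Limited", "GB"),
)

print(f"raw: {raw_score:.2f}, cleaned: {clean_score:.2f}")
```

In this sketch the raw names score close to a match while the cleaned names do not match at all. Note also that `strip_legal_forms("A GmbH", "GB")` leaves the name untouched, which is the Harrogate problem in miniature: a word list built for one country does nothing for names from another.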

So issues of bias do exist, although much can be done to resolve them, provided you have the experience to recognise them and their mitigation is planned early in the process. Otherwise, an attempt simply to roll out globally a previously successful financial crime model developed in Europe will not only suffer from bias, but will probably also fail to identify the intended targets because of the large volume of data "noise".

 


John Cant

Managing Director

MPI Europe Ltd


This post is from a series of posts in the group:

Data Management and Governance

Anything that can be used to better manage and govern data.

