An article relating to this blog post on Finextra:
"Banks must resolve explainability and 'black box' risk governance challenges to succeed with AI"
Data bias, "black box" risk, and lack of human oversight are the main governance issues for banks using AI, according to the Economist Intelligence Unit (EIU) report "Overseeing AI: Governing artifici...
I have nothing against Harrogate, well not consciously at least. However, over recent months there has been much discussion about unconscious bias in many walks of life - including, in this article, the data used for algorithms. So let's consider the role that preparation of data for regulatory uses might play, where issues exist, and what I have been able to do to mitigate their impact in the major data programmes I have run. For example, what are the inbuilt assumptions of the input data models, and what impact does it have if those precepts and conditions are stretched, strained, or even broken?
To look at just one example of many, take the common approach of removing words such as "Limited" from company name data before performing a fuzzy match. The premise is that these words add little to the uniqueness of names and, if left in, can make a character-by-character match seem far better than it is. An extreme example is matching 'A Limited' against 'B Limited'. With the word "Limited" left in, the algorithm will likely declare a match, since most of the characters in one name match their counterparts in the other, whereas a human observer will immediately see that the two companies are probably entirely different. Removing the word is therefore a sensible approach, and the Harrogate Limiteds will work okay. However, getting the full benefit of these techniques in other countries depends on having a relevant list of words to remove, and that list differs between geographies. Applying the technique equally in a global programme needs explicit effort and analysis - for example, I never thought I would have to learn the Vietnamese word for conglomerate! A similar approach is required for selecting the relevant sets of abbreviations to expand or remove.
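To make the idea concrete, here is a minimal sketch in Python of stripping legal-entity words before a fuzzy match. The stop-word list and function names are my own illustrative assumptions, not part of any particular vendor tool, and a real programme would need a per-country list of terms as discussed above.

```python
import difflib
import re

# Hypothetical stop-word list for illustration only; a global programme
# needs a separate, analysed list per geography (Ltd, GmbH, SARL, ...).
LEGAL_TERMS = {"limited", "ltd", "plc", "inc", "gmbh"}

def strip_legal_terms(name: str) -> str:
    """Lower-case, tokenise, and drop common legal-entity words."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in LEGAL_TERMS)

def similarity(a: str, b: str) -> float:
    """Simple character-level similarity in [0, 1]."""
    return difflib.SequenceMatcher(None, a, b).ratio()

# Raw comparison: the shared word "Limited" dominates, so the score is high.
raw = similarity("a limited", "b limited")

# After stripping, only 'a' vs 'b' remain and the score collapses to zero.
stripped = similarity(strip_legal_terms("A Limited"),
                      strip_legal_terms("B Limited"))
```

On the extreme example above, the raw score comes out near 0.9 while the stripped score is zero - exactly the gap between what the naive algorithm sees and what the human observer sees.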
So issues of bias do exist, although much can be done to resolve them, so long as you have the experience to recognise that they exist and their mitigation is planned early in the process. Otherwise, an attempt to simply roll out globally a financial crime model previously successful in Europe will not only suffer from bias, but will probably also fail to identify the intended targets amid the large volume of data "noise".