
But it worked in Harrogate!

24 July 2020

John Cant

Managing Director

MPI Europe Ltd

I have nothing against Harrogate, well not consciously at least. However, over recent months there has been much discussion about unconscious bias in many walks of life, including a mention in this article of the data used for algorithms. So let's consider the role that the preparation of data for regulatory uses might play and, where issues exist, what I have been able to do to mitigate their impact in major data programmes I have run. For example, what assumptions are built into the input data models, and what happens when those precepts and conditions are stretched, strained, or even broken?

To look at just one example of many, take the common approach of removing words such as Limited from company name data before performing a fuzzy match. The premise is that these words add little to the uniqueness of a name and, if left in, may make a character-by-character match seem far better than it is. An extreme example would be matching 'A Limited' and 'B Limited'. If the word Limited is left in, the algorithm will likely match the two names, because most of the characters in name A match their counterparts in name B. In contrast, a human observer will immediately note that the two companies are probably totally different. So removing the word in this case is a sensible approach, and the Harrogate Limiteds will work okay. However, getting the full benefit of this technique in other countries depends on having a relevant list of words to remove, and that list differs from one geography to the next. Applying the technique equally across a global programme needs explicit effort and analysis - for example, I never thought I would have to learn the Vietnamese word for conglomerate! A similar approach is required for selecting relevant sets of abbreviations to expand or remove.
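The effect is easy to reproduce in a few lines. What follows is a minimal sketch rather than a description of any production matcher: it assumes Python's standard-library `difflib.SequenceMatcher` as a stand-in for whatever fuzzy-matching algorithm is actually used, and the `LEGAL_FORM_WORDS` table, the country codes and the helper names `strip_legal_forms` and `similarity` are all hypothetical, invented purely for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical and deliberately tiny word lists, keyed by country code.
# A real programme would need a researched list for every geography.
LEGAL_FORM_WORDS = {
    "GB": {"limited", "ltd", "plc"},
    "DE": {"gmbh", "ag"},
}


def strip_legal_forms(name, country):
    """Lower-case a company name and drop that country's legal-form words."""
    words_to_drop = LEGAL_FORM_WORDS.get(country, set())
    tokens = name.lower().replace(".", "").split()
    return " ".join(t for t in tokens if t not in words_to_drop)


def similarity(name_a, name_b, country=None):
    """Character-level similarity between 0 and 1.

    If a country is given, its legal-form words are removed first;
    otherwise the names are compared as they stand (lower-cased).
    """
    if country is None:
        a, b = name_a.lower(), name_b.lower()
    else:
        a = strip_legal_forms(name_a, country)
        b = strip_legal_forms(name_b, country)
    return SequenceMatcher(None, a, b).ratio()


if __name__ == "__main__":
    # Left in, the shared word dominates the score.
    print(similarity("A Limited", "B Limited"))
    # With the UK list applied, only 'a' and 'b' are compared.
    print(similarity("A Limited", "B Limited", country="GB"))
    # The UK list does nothing for German names...
    print(similarity("A GmbH", "B GmbH", country="GB"))
    # ...whereas the German list removes the shared word.
    print(similarity("A GmbH", "B GmbH", country="DE"))
```

In this sketch the UK word list separates 'A Limited' from 'B Limited', but it leaves 'A GmbH' and 'B GmbH' looking like a close match until the German list is applied. That is the Harrogate problem in miniature: the code is unchanged, and only the reference data decides whether the technique works in a given country.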

So issues of bias do exist, although much can be done to resolve them, so long as you have the experience to recognise them and you plan their mitigation early in the process. Otherwise, an attempt to simply roll out globally a fin crime model that was developed successfully in Europe will not only suffer from bias, but will also probably fail to identify the intended targets because of the large volume of data "noise".

External

This content is provided by an external author without editing by Finextra. It expresses the views and opinions of the author.
