Financial services companies offering credit need to assess the risk they take when granting a loan.
This mainly consists of determining the probability that the borrower will not repay the credit, and the
amount of money that will be lost in that case. This risk is usually expressed by, respectively, the Probability of Default (PD) and the Loss Given Default (LGD), i.e. the share of the exposure that is lost upon default (the complement of the Recovery Rate).
For both the PD and LGD parameters (often the product PD × LGD is used as well), financial services companies need to set
thresholds, i.e. up to which percentages they are willing to accept the credit. This depends on the risk appetite and business strategy of the financial services company. A higher threshold on PD and/or LGD means the institution also has to foresee
larger financial buffers because of the higher chance of losing money. Of course, most institutions will not have just one threshold, but multiple thresholds (different thresholds per product and customer segment), in order to implement a more fine-grained business policy.
On the other hand, there is the "art" of determining the PD and LGD in the best possible way. Both are predictions of the future, and no human or machine can predict the future without errors. Banks and other credit institutions therefore build complex
models (using rule engines and AI models fed with as much input data as possible, like personal/company data, financial data, collateral data, etc.) to assess these percentages as accurately as possible based on the insights obtained from historical data.
The better these models, the more money the institution can make, as there are fewer false positives and false negatives, i.e.:
- When the PD/LGD is underestimated, the institution is at risk and will lose money through too many defaulting credits
- When the PD/LGD is overestimated, the institution will lose money through too many missed opportunities (opportunity cost)
The PD/LGD is furthermore helpful to price the credit, i.e. to determine the right interest rate: so-called risk-based pricing. Such an adjustment of the credit price based on credit risk allows further optimizing the ratio between the financial risk
the institution takes and the financial benefits.
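As a rough illustration, the relationship between PD, LGD and a risk-based price can be sketched as follows. This is a simplified sketch: real pricing also factors in funding costs, capital charges and competition, and all figures below are invented.

```python
def expected_loss(pd_, lgd, ead):
    """Expected loss = PD x LGD x EAD (exposure at default)."""
    return pd_ * lgd * ead

def risk_based_rate(base_rate, pd_, lgd):
    """Illustrative risk-based pricing: add the expected annual loss
    per unit of exposure as a risk premium on top of a base rate."""
    return base_rate + pd_ * lgd

# Example: 2% PD, 40% LGD on a 100,000 loan
el = expected_loss(0.02, 0.40, 100_000)   # 800.0 expected loss
rate = risk_based_rate(0.03, 0.02, 0.40)  # 3.8% risk-adjusted rate
```

A riskier borrower (higher PD or LGD) thus automatically pays a higher rate, compensating the institution for the extra expected loss.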
Thanks to the rise of new technologies and the Fintech movement, these credit risk scoring models have evolved considerably
in recent years. Especially the rise of AI and the usage of alternative data sources create exciting new business opportunities, making it possible to offer loans to people (the so-called unbanked and underbanked) and businesses who were refused by traditional credit institutions.
However, a big difference remains between the credit scoring models for consumer loans and business loans.
Most consumer loans are already highly automated, allowing many credits to be decided upon and granted almost fully STP
(straight-through processed).
Business loans, on the other hand, have much more inherent complexity, as businesses can be very diverse and complex (with multiple subsidiaries, complex shareholder structures, etc.). As a result, the analysis and decision processes for these loans
remain highly specific and manual.
For consumer loans, financial institutions usually ask the client to provide the following data:
- Personal data, like name, civil status, number of children, address, phone number, etc.
- Professional data, like type of employment, employer name and address, employer sector, contract duration, etc.
- Financial data to get insights into all revenues and expenses of customers and their assets and liabilities
- Information about the need for the credit, i.e. for what will the money be used
- Information about the collateral of the credit, i.e. all details of the assets pledged as security
Financial institutions enrich this data with other public and private data they have about the customer, like credit history (i.e. any past credits that defaulted, the number of credits the customer already has, and the reimbursement track record
of all past credits), account transaction history (cross-bank via PSD2), etc.
Afterwards, a number of ratios, like the Available Income (AVI) and the Loan-to-Value ratio (LTV), are calculated as well.
All this info is then fed into the risk scoring model, which tries to predict the PD and LGD.
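A minimal sketch of this step, with invented weights and figures (a real PD model is fitted on historical default data, not hand-set as here):

```python
import math

def available_income(monthly_income, monthly_expenses, existing_repayments):
    """Monthly income left after expenses and existing credit repayments."""
    return monthly_income - monthly_expenses - existing_repayments

def loan_to_value(loan_amount, collateral_value):
    """LTV: loan amount relative to the value of the collateral."""
    return loan_amount / collateral_value

def pd_score(avi, ltv, ltv_weight=2.0, avi_weight=-0.002, bias=-2.0):
    """Toy logistic scoring function over the two ratios.
    The weights are purely illustrative."""
    z = bias + ltv_weight * ltv + avi_weight * avi
    return 1.0 / (1.0 + math.exp(-z))

avi = available_income(3500, 1800, 400)    # 1300 available per month
ltv = loan_to_value(160_000, 200_000)      # 0.8
print(round(pd_score(avi, ltv), 3))        # prints 0.047
```

A higher LTV or a lower available income pushes the predicted PD up, mirroring how these ratios are used in practice.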
These models are evolving rapidly to give better, more accurate results for a larger group of customers (i.e. not only traditional customers, but also smaller niche segments):
- The usage of AI to improve the credit risk models: just like risk analysts try to identify correlations between the data provided by customers and the full data sets of historical credits that defaulted, AI tries to model these correlations. The
power of AI, however, is that it can do this in a much more fine-grained way (also taking very small correlations into account), in a more automated manner (allowing the model to be continuously updated based on new situations/trends/historical data)
and much faster. Thanks to the large training data sets that banks have accumulated over the years, which are becoming increasingly well structured and of good quality, these AI models can become very well trained.
However, it is important to understand that AI is not a miracle solution: since it is still based on the correlations found in historical data (used as training data sets), it cannot predict rapidly changing trends either. Furthermore,
AI has the big disadvantage that financial institutions lose much of the explainability of and control over the model. This makes it difficult to explain to internal employees, customers and regulators why the model arrives at a specific PD/LGD value. For example,
it can be very difficult to prevent an AI model from discriminating based on race or sex, as merely omitting these attributes from the input is often not enough, because other attributes can act as proxies for them.
- The use of non-traditional data sets: where traditional models are based on the input data described above, a number of Fintechs (such as Uulala, Koyo, Lenddo, FriendlyScore, ZestFinance, CreditLadder) have come up with more innovative ways
to score loans, improving on traditional scoring models that tend to work very poorly (due to lack of data) or very unfavorably for specific customer segments (like freelancers, gig-economy workers, immigrants, etc.).
These Fintechs perform scoring based on new data sets, like social media data, telephone record data, shopping data, bank transaction data (collected via PSD2), etc. Based on this data, the risk scoring models try to model the behavior of the borrower
and predict the credit risk associated with the person.
While this innovation is excellent news for the customer segments rejected by traditional models, these models do raise important questions about data security and data privacy (i.e. lower-income persons having to give up privacy to get a loan), but also
about the accuracy of these models, given the lack of large historical data sets.
- The use of APIs and tools for easier and faster valuation of underlying assets or collateral. These tools allow estimating asset value, asset quality (the risk of the asset dropping in value) and asset liquidity (how easily the asset can be liquidated
upon default). Several providers, like Capilever, offer API services that estimate these parameters for different asset types. Furthermore, these tools streamline the inventorying and (re)valuation of customer assets that can potentially
be used as collateral.
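The proxy problem mentioned above can be made concrete with a small sketch: even when a sensitive attribute is excluded from the model inputs, a remaining feature may still encode it, which a simple correlation check can surface. All data below is invented:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

sensitive = [0, 0, 1, 1, 0, 1, 0, 1]          # e.g. encoded sex (NOT fed to the model)
feature   = [10, 12, 30, 28, 11, 31, 9, 29]   # candidate model input

# Flag the feature if it is strongly correlated with the sensitive attribute
if abs(pearson(sensitive, feature)) > 0.8:
    print("feature is a likely proxy; review before using it")
```

Real fairness audits go much further (conditional dependence, outcome tests), but even this crude check illustrates why dropping the sensitive column alone is insufficient.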
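The three collateral parameters described above (value, quality, liquidity) can be combined into a rough LGD estimate. The formula below is an illustrative simplification, not the actual method of any provider:

```python
def estimated_lgd(exposure, asset_value, quality_haircut, liquidity_discount):
    """Illustrative LGD estimate for a collateralized credit: the
    recoverable value is the asset value after a haircut for possible
    value decline (quality) and a discount for a forced sale (liquidity)."""
    recovery = asset_value * (1 - quality_haircut) * (1 - liquidity_discount)
    return max(0.0, 1 - recovery / exposure)

# 100k exposure backed by a 120k asset, 20% haircut, 10% liquidity discount
print(round(estimated_lgd(100_000, 120_000, 0.20, 0.10), 3))  # prints 0.136
```

When the discounted collateral value fully covers the exposure, the estimated LGD drops to zero, which is exactly why the quality and liquidity of the collateral matter as much as its nominal value.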
For business loans, too, a lot of change is possible. We indicated above that the analysis and decision process for those loans is still very manual. However, we see more and more Fintechs providing innovative offerings to automate these processes
for specific niche credits. A good example is invoice financing (also called invoice factoring), where unpaid invoices are pre-financed by a credit institution. This product is very well structured, and scoring can be done quite easily by analyzing the
historical invoice payments of the company.
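A minimal sketch of such an invoice-financing score, based purely on historical payment delays. All names and figures are invented; a real model uses many more signals:

```python
from statistics import mean

def invoice_score(days_late_history, write_off_rate):
    """Share of invoices paid on time (days late <= 0), penalized by
    the historical write-off rate. Purely illustrative."""
    on_time = sum(1 for d in days_late_history if d <= 0) / len(days_late_history)
    return max(0.0, on_time - write_off_rate), mean(days_late_history)

# Six historical invoices: negative = paid early, positive = paid late
score, avg_delay = invoice_score([-2, 0, 5, -1, 12, 0], write_off_rate=0.02)
```

Because the product is so well structured, even a simple behavioral score like this captures most of the relevant risk signal, which is what makes this niche suitable for automation.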
Unfortunately, a large part of business lending is still very manual. The big challenge in the coming years will therefore be to increase the STP rate of these loans, by:
- Feeding the models with input data of higher quality, in a more structured way and faster. For example, easy integration with ERP and accounting platforms gives faster access to more structured company data (compared to annual reports)
- Having more flexible models, which can cope with more diverse situations and even with unstructured data (typically via OCR and Natural Language Processing, which allow structuring this data and identifying patterns).
Apart from making the risk scoring process more STP, more accurate and more tailored to different customer segments, there are two other aspects where financial institutions can make a difference:
- Helping customers improve their credit score themselves. Instead of just providing a score (often just the final result), banks should provide tools and advice on how customers can improve their own score. This can be done by
improving their solvency, liquidity and trustworthiness, but also by giving them additional insights into their financial data and the option to provide additional collateral.
- Continuous reassessment or recalculation of the risk score during the life-cycle of the credit. This means a reassessment of the PD when changes occur in the personal, professional or financial situation of the borrower(s), but also
a reassessment of the LGD, by reviewing the value, quality and liquidity of the collateral.
Of course, in order to be profitable, this should be fully automated. When well implemented, it can help banks better monitor and manage their risks (e.g. by lowering thresholds for future loans, increasing/decreasing buffers, asking customers to provide
additional collateral, notifying customers of the identified risk, etc.).
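Such an automated reassessment step could look as follows; the field names, threshold and actions are invented for illustration:

```python
def reassess(credit, new_pd, new_lgd, alert_ratio=1.5):
    """Compare freshly calculated PD/LGD against the values at
    origination; flag the credit when the expected loss (PD x LGD)
    grew by more than alert_ratio. Illustrative sketch."""
    old_el = credit["pd"] * credit["lgd"]
    new_el = new_pd * new_lgd
    action = "review_collateral" if new_el > alert_ratio * old_el else "ok"
    return {"new_el": new_el, "action": action}

# PD jumped from 2% to 5% and LGD from 40% to 45% -> follow-up triggered
credit = {"pd": 0.02, "lgd": 0.40}
result = reassess(credit, new_pd=0.05, new_lgd=0.45)
```

In practice this would run as a scheduled batch or event-driven job over the whole portfolio, with the triggered actions (buffer adjustments, collateral requests, customer notifications) routed to the appropriate workflows.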
As the above demonstrates, there are a lot of interesting innovations in the field of credit risk scoring. In the current turbulent economic times, however, several of these new, innovative risk scoring models are seeing much higher default
rates than predicted, resulting in a number of Fintechs getting into financial difficulties. Time (and historical data) will tell which models provide the best fit.