Below we examine the most common mistakes in scorecard development and explain how to avoid them, so that scorecards can be used for more precise borrower rating and segmentation.
Scorecards provide a set of weights assigned to characteristics that indicate a customer’s creditworthiness. With scorecards, customers are evaluated and rated according to their estimated probability of repaying a loan.
Once customers are evaluated and scored, you can automate profitable decisions: approve the loan application, define optimal pricing, take advantage of up-sell opportunities, and so on.
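To make the idea of "weights assigned to characteristics" concrete, here is a minimal sketch of scoring one applicant and turning the score into a decision. The characteristics, bands, point values, and the cutoff are all invented for illustration; a real scorecard would derive them from portfolio data.

```python
# Hypothetical scorecard: points per attribute of each characteristic.
# All characteristics, bands, and point values are invented for illustration.
SCORECARD = {
    "age_band": {"18-25": 10, "26-40": 25, "41+": 35},
    "residential_status": {"owner": 30, "tenant": 15, "other": 5},
    "years_employed": {"<1": 5, "1-5": 20, "5+": 30},
}

CUTOFF = 60  # assumed approval threshold, not taken from any real scorecard


def score_applicant(applicant: dict) -> int:
    """Sum the points of the attribute the applicant falls into for each characteristic."""
    return sum(points[applicant[name]] for name, points in SCORECARD.items())


def decide(applicant: dict) -> str:
    """Approve when the total score reaches the cutoff, otherwise decline."""
    return "approve" if score_applicant(applicant) >= CUTOFF else "decline"


applicant = {"age_band": "26-40", "residential_status": "tenant", "years_employed": "1-5"}
print(score_applicant(applicant), decide(applicant))  # 60 approve
```

The same score could just as well drive pricing or up-sell rules; a single cutoff is simply the smallest example of an automated decision.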
If you're new to the subject, I'd recommend reading The Very Basics of Scorecards by Brendan Le Grange on his Credit Risk Strategy Blog: it is a clear and elegant introduction.
Mistakes in scorecard development can be made at each of the following stages: data sampling, statistical evaluation of borrowers’ characteristics, and formation of training and validation datasets.
1. Data sampling. Inaccuracies during data sampling are caused by selecting data samples that do not meet the requirements of representativeness and randomness.
Representativeness means that the indicators in the sample are as close as possible to the actual borrowers’ characteristics in the loan portfolio. This requirement is natural, because a scorecard reflects the specifics of the dataset used for its development.
Randomness implies that each loan application should be included in the working sample independently of the others.
There are two ways to prevent sampling mistakes: either by directly controlling data sampling procedures, or by evaluating the borrowers’ statistical characteristics.
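Both requirements can be sketched in a few lines. The example below draws a simple random sample and then compares category shares between the portfolio and the sample. The `employment` field, the synthetic portfolio, and the helper names are assumptions made for illustration only.

```python
import random
from collections import Counter


def category_shares(records, field):
    """Return the share of each category of `field` among `records`."""
    counts = Counter(r[field] for r in records)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def max_share_gap(portfolio, sample, field):
    """Largest absolute difference in category share between portfolio and sample."""
    p = category_shares(portfolio, field)
    s = category_shares(sample, field)
    return max(abs(p.get(c, 0.0) - s.get(c, 0.0)) for c in set(p) | set(s))


# Synthetic portfolio, invented for illustration: 70% employed, 30% self-employed.
portfolio = [{"employment": "employed"}] * 7000 + [{"employment": "self_employed"}] * 3000

rng = random.Random(42)  # fixed seed so the draw is reproducible
random_sample = rng.sample(portfolio, 2000)  # each record equally likely, no replacement
biased_sample = portfolio[:2000]  # "first N rows": contains only employed borrowers

print(max_share_gap(portfolio, random_sample, "employment"))  # small gap
print(max_share_gap(portfolio, biased_sample, "employment"))  # gap of about 0.3
```

A large gap for any characteristic suggests the sample is not representative of the portfolio. What counts as "large" is a judgment call that this sketch does not settle.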
2. Statistical evaluation of borrowers’ characteristics. When evaluating borrowers’ statistical characteristics, you should pay attention to indicators that are unnaturally distributed. For instance:
- Indicators dominated by a single prevailing borrower category;
- Characteristics pointing to an obvious gap between the selected groups of borrowers and the actual characteristics present in the credit portfolio.
Once you have discovered unequally or unnaturally distributed characteristics, you should change or adjust the procedure of forming the working sample and set different rules for assigning values to each characteristic.
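As a rough illustration of the first kind of check, the sketch below flags characteristics in which a single category accounts for almost all records. The 0.9 threshold, the function name, and the sample data are assumptions chosen for the example, not recommended values.

```python
from collections import Counter


def dominant_category(values, threshold=0.9):
    """Return (category, share) if one category reaches `threshold` of the values, else None.

    The 0.9 default is an arbitrary illustrative choice.
    """
    counts = Counter(values)
    category, n = counts.most_common(1)[0]
    share = n / len(values)
    return (category, share) if share >= threshold else None


# Synthetic working sample, invented for illustration.
sample = {
    "residential_status": ["owner"] * 95 + ["tenant"] * 5,
    "age_band": ["18-25"] * 30 + ["26-40"] * 40 + ["41+"] * 30,
}

for characteristic, values in sample.items():
    print(characteristic, dominant_category(values))
# residential_status ('owner', 0.95)
# age_band None
```

A flagged characteristic is a prompt to revisit how the working sample was formed or how values are assigned to that characteristic.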
3. Selection of training and validation datasets. The main criterion for selecting a training sample is that it should provide enough examples of both profitable and delinquent loans.
As a rule, a training dataset of 3,500-4,000 records is enough to train a scorecard successfully, provided that it contains “good” and “bad” loan cases in a 3:1 proportion. It is possible to train a scorecard with fewer records, but you should keep the 3:1 proportion of good to bad loans.
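One possible way to assemble such a training sample is sketched below: draw good and bad loans at random in the required proportion, then shuffle. The function name, its parameters, and the synthetic loan records are hypothetical; only the 3:1 proportion and the sample size come from the rule of thumb above.

```python
import random


def build_training_sample(good, bad, total=4000, good_per_bad=3, seed=0):
    """Draw `total` records at random with `good_per_bad` good loans per bad loan."""
    n_bad = total // (good_per_bad + 1)
    n_good = total - n_bad
    if n_good > len(good) or n_bad > len(bad):
        raise ValueError("not enough good or bad loans for the requested sample")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    training = rng.sample(good, n_good) + rng.sample(bad, n_bad)
    rng.shuffle(training)  # avoid all good loans preceding all bad ones
    return training


# Synthetic loan histories, invented for illustration.
good_loans = [{"id": i, "outcome": "good"} for i in range(20000)]
bad_loans = [{"id": i, "outcome": "bad"} for i in range(20000, 22000)]

training = build_training_sample(good_loans, bad_loans)
n_bad = sum(r["outcome"] == "bad" for r in training)
print(len(training), len(training) - n_bad, n_bad)  # 4000 3000 1000
```

With fewer records available, lowering `total` keeps the same proportion, for example `build_training_sample(good_loans, bad_loans, total=2000)`.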
This way, data mining techniques can be applied to improve borrower rating and segmentation in automated loan application processing. In upcoming articles, we will show more ways to apply data mining for decision automation.