In AML and fraud monitoring, expected behavior refers to the normal patterns of transactions for a given customer: their typical transfer amounts, transaction frequency, usual beneficiaries or destinations, and so on. Defining this baseline is critical because it enables compliance teams to distinguish routine activity from anomalies. Transactions that deviate significantly from a customer's expected behavior can be red flags for money laundering or fraud. At the same time, what's "unusual" for one customer might be completely ordinary for another, so a one-size-fits-all threshold will either miss risks or generate false alarms.
Regulators and industry standards increasingly demand a risk-based approach that incorporates customer-specific behavior norms. Global authorities such as the FATF and the EBA emphasize continuous monitoring of customer activity against an individualized risk baseline. The importance of tracking expected versus actual behavior isn't just theoretical; real enforcement cases underscore the need. For example, an FCA action against Santander UK revealed that a small business customer, which had told the bank to expect around £5,000 in monthly deposits, was in fact receiving millions per month, far beyond its stated profile. The bank's systems generated some alerts, but poor tuning and follow-up allowed roughly £298 million to flow through the account unchecked. In another case, NatWest was fined after a client's deposits (totaling £265 million) wildly exceeded expected volumes without timely intervention. These incidents show that failing to compare current behavior against a proper baseline can let obvious anomalies slip through, or, if expectations aren't factored in at all, flood analysts with noisy alerts. In short, knowing a customer's normal habits is key to catching the abnormal.
To define what’s “normal” for a customer, analysts use a few fundamental statistical measures. Each has its strengths and weaknesses in capturing typical behavior:
The average (mean) is the sum of all observed values divided by the count of values. It’s straightforward and often used as a quick baseline for transaction amounts or counts. For example, if a customer made 10 transfers totaling $5,000 last month, their average transfer amount is $500. A simple average is easy to compute and understand, which makes it a common reference point.
However, the mean can be misleading when there are outliers. One or two unusually large transactions will skew the average upward (or downward, if the outliers are very small). In other words, the mean is not a robust measure of central tendency; it is sensitive to extreme values. If a user usually sends $200-$500 per transfer but made one $10,000 transfer, the average might shoot up into the thousands, no longer reflecting the typical range. This is a major drawback of relying solely on the mean: it may not represent the true normal when the data distribution is skewed or contains anomalies.
On the pro side, the average does capture overall trends and is useful if the data doesn’t contain extreme outliers. In many cases, compliance teams might start with an average transaction size or average count per period as a baseline, then adjust for its weaknesses using the measures below.
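As a quick illustration, here is a minimal sketch (hypothetical amounts, using Python's standard statistics module) of how a single large transfer drags the mean away from a customer's typical range.

```python
from statistics import mean

# Hypothetical transfer amounts for one customer; a single $10,000 outlier
# pulls the mean far above the customer's typical $200-$500 range.
typical_transfers = [220, 250, 300, 280, 240, 500, 310, 260, 230, 410]
print(mean(typical_transfers))              # 300.0 - a reasonable baseline

with_outlier = typical_transfers + [10_000]
print(mean(with_outlier))                   # ~1181.8 - no longer "typical"
```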
The median is the middle value of a data set when sorted from low to high (or the average of the two middle values if there’s an even number of points). By definition, half of the transactions are above the median and half below. This makes the median a robust indicator of the “typical” value, especially for skewed distributions. Unlike the mean, the median isn’t dragged up or down by a few extreme outliers. In a dataset of transfers that are mostly around $300 but with one or two huge transfers, the median will likely stay near $300, reflecting what’s normal for the majority of transactions.
For AML compliance monitoring, using the median transaction amount can be very useful.
Pros: It provides a realistic center of gravity for a customer's behavior, filtering out sporadic spikes. If a customer's median transfer is $250, that gives a good sense of their usual transaction size even if they occasionally send a $5,000 wire.
Cons: The median doesn't convey anything about variability or range by itself; it won't tell you whether the customer sometimes makes $1,000 or $5,000 transfers unless you also look at other metrics. Also, if the activity volume is low (say, a customer has only a few transactions), the median is less informative; in the extreme case of a single transaction, the median, the mean, and that value are all the same.
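To make the contrast concrete, here is a small sketch (same hypothetical amounts as above) comparing the median and the mean when one large wire is present.

```python
from statistics import mean, median

# Mostly $200-$500 transfers plus one large $10,000 wire (hypothetical data).
transfers = [220, 250, 300, 280, 240, 500, 310, 260, 230, 410, 10_000]
print(median(transfers))   # 280 - still reflects the customer's usual size
print(mean(transfers))     # ~1181.8 - dragged up by the single outlier
```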
While the mean and median gauge the center of the behavior, the standard deviation (std dev) measures the spread of the data: how much variation there is around the average. In transaction monitoring, standard deviation is useful for understanding how volatile or consistent a customer's behavior is. If most of a user's transactions fall in a tight band (e.g. $200-$300), the standard deviation will be relatively small; if their transaction amounts swing wildly between $50 and $5,000, the std dev will be large.
The key use of standard deviation is to set an adaptive threshold for anomaly detection. Statistically, for many distributions (assuming a normal bell-curve approximation), about 95% of observations lie within ±2 standard deviations of the mean, and ~99.7% lie within ±3 standard deviations. That means anything beyond ~2 or 3 std dev from the average is quite rare and can be considered unusual. For example, if a customer's average transfer is $500 with a standard deviation of $100, then an $800 transaction is 3 std dev above the mean: an extreme outlier in context, since fewer than 0.3% of points would naturally fall beyond 3σ in a normal distribution. In practice, such a transaction would be flagged for review because it's far outside the expected range.
Pros: Standard deviation gives a concrete way to quantify how far off a current transaction is from the norm. Using rules like “flag if amount is more than 2σ above the average” adapts to each customer’s variability (what’s high for a usually steady customer might be normal for a volatile one).
Cons: Std dev assumes a somewhat symmetric distribution around the mean; it may not be as meaningful if the distribution is highly skewed (where the median is more relevant) or if there are frequent outliers (which themselves inflate the std dev). Also, calculating a meaningful std dev requires a decent history of data points; it's less reliable for a new customer with little transaction history.
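A minimal sketch of a standard-deviation check, assuming a simple list of historical amounts per customer (the is_outlier helper and the 3-sigma cut-off are illustrative choices, not a prescribed rule):

```python
from statistics import mean, stdev

def is_outlier(amount, history, k=3):
    """Flag an amount more than k standard deviations from this customer's mean."""
    if len(history) < 10:                # too little history for a meaningful std dev;
        return False                     # new customers need other rules
    mu, sigma = mean(history), stdev(history)
    return abs(amount - mu) > k * sigma

# Hypothetical customer: transfers cluster tightly around $500.
history = [480, 520, 510, 490, 500, 470, 530, 505, 495, 500]
print(is_outlier(800, history))   # True  - far beyond 3 sigma for this customer
print(is_outlier(540, history))   # False - within normal variation
```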
In summary, mean, median, and standard deviation together help sketch a customer’s normal behavior profile. The mean gives an overall average, the median provides a robust typical value, and the standard deviation sets a gauge for what counts as a significant deviation from the norm. Next, we’ll see how these metrics are applied in real monitoring scenarios.
Defining a baseline means establishing an expected range for each user’s behavior and then continuously comparing new events against that personal benchmark. Compliance analysts often create a profile like: “This user usually sends $200-$500 per transfer to 2-3 known beneficiaries per week.” With such a baseline, the system can then automatically flag transactions that fall outside this expected pattern.
In practice, this might involve rules such as: flag if a transaction amount exceeds X times the user's average (mean), or if it falls outside 2 standard deviations of their normal amount range. Similarly, frequency-based checks can be used. For example, if a user typically makes around 5 transactions a week (with a certain std dev), suddenly making 20 in a day would be an anomaly. These techniques essentially enable dynamic thresholds that adjust to each customer. Rather than a static rule like "alert on any transfer above $10,000," the threshold for each user can be proportional to their own historical behavior (e.g., "alert if the amount is more than 3× this user's 90-day average"). According to industry best practices, layering in such user-specific behavior signals helps surface risks that wouldn't be visible under one-size-fits-all rules. By comparing each transaction to the customer's own baseline (their median amount, average frequency, usual payees, and so on), institutions can stay responsive as behavior shifts over time.
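A minimal sketch of what such per-user dynamic thresholds could look like in code (function names, data shapes, and cut-offs are illustrative assumptions, not a production rule set):

```python
from statistics import mean, stdev

def amount_alert(amount, amount_history, multiple=3):
    """Alert if the amount exceeds `multiple` times this user's historical average."""
    return bool(amount_history) and amount > multiple * mean(amount_history)

def frequency_alert(recent_count, weekly_counts, k=2):
    """Alert if a recent burst of activity (e.g. a single day's count) exceeds
    the user's weekly mean by more than k standard deviations."""
    if len(weekly_counts) < 10:
        return False
    mu, sigma = mean(weekly_counts), stdev(weekly_counts)
    return recent_count > mu + k * sigma

amounts = [200, 350, 500, 280, 420, 300]        # hypothetical history, ~$342 average
weekly  = [5, 4, 6, 5, 5, 7, 4, 5, 6, 5]        # roughly 5 transactions per week

print(amount_alert(1_200, amounts))             # True: more than 3x the user's average
print(frequency_alert(20, weekly))              # True: 20 in a day dwarfs the weekly norm
```

The design point is that the thresholds are derived from each customer's own history rather than hard-coded, so the same rule is strict for a steady customer and lenient for a naturally volatile one.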
Let's consider two examples of how these baseline metrics help in detection:
Example 1 (amount anomaly): A customer who typically sends $200-$500 per transfer suddenly wires $20,000. Measured against their median amount and standard deviation, the transaction sits far outside the expected range and is flagged for review.
Example 2 (frequency anomaly): A customer who averages around five transactions a week makes 15 in a single day. Against their normal frequency baseline, that burst of activity is a clear deviation worth investigating.
In both examples, the key is context. A $20,000 wire or 15 transactions in a day might not be alarming for some high-net-worth or business customers at all. What makes it suspicious is that it’s out-of-character for that specific user. By establishing each user’s typical behavior (through metrics like median amounts, average frequencies, etc.) and monitoring deviations, banks and fintechs dramatically improve detection accuracy. They can catch truly suspicious outliers while ignoring irrelevant noise. This balance reduces false positives (alerts for routine activity that just looks large in a generic sense) and false negatives (missed true threats).
In the fight against financial crime, one of the most powerful questions a compliance team can ask is, "Is this behavior expected for this customer?" Establishing a baseline for each user, whether it's their typical transaction size, normal login pattern, or usual transaction count, is essential to answering that question. We've seen that different metrics serve different purposes: the median often gives a truer picture of typical transaction value in the presence of outliers, the mean can indicate overall trends, and the standard deviation provides a quantitative yardstick for what's significantly outside the norm. No single metric covers it all; a combination is often ideal. For example, using the median for transaction amounts (to handle skewed distributions) paired with the standard deviation (to gauge variability) can establish a robust expected range. Compliance leads and risk analysts should choose the right tool for each behavior pattern, e.g. median or percentile-based thresholds for transaction amounts, versus standard deviation or velocity measures for frequency of actions.
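As one possible way to combine these ideas, the sketch below builds an expected range from the median and a 95th-percentile cap rather than the mean (the expected_range helper and the 95% cut-off are illustrative assumptions):

```python
from statistics import median, quantiles

def expected_range(history):
    """Return (typical value, upper bound): the median and the 95th percentile."""
    p95 = quantiles(history, n=100, method="inclusive")[94]
    return median(history), p95

# Hypothetical history of transfer amounts for one customer.
history = [220, 250, 300, 280, 240, 500, 310, 260, 230, 410]
typical, upper = expected_range(history)
print(typical, upper)        # 270 and ~459.5 for this data
print(20_000 > upper)        # True: a $20,000 wire sits far outside the expected range
```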
It’s also important to remember that expected behavior isn’t static. A customer’s habits can evolve over time (gradually or suddenly), and what was abnormal last year might be routine this year, or vice versa. This is why continuous monitoring and dynamic baselining are so crucial. By constantly recalibrating what “normal” looks like for each user, you ensure that your detection logic stays relevant and effective. A system grounded in expected behavior will catch the truly suspicious anomalies, those needle-in-haystack deviations that signal risk, while gracefully handling the day-to-day fluctuations that reflect genuine customer activity.
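One simple way to keep the baseline current is to compute it over a rolling window so that older behavior gradually ages out. A minimal sketch using pandas (hypothetical data; the 90-day window is an illustrative choice):

```python
import pandas as pd

# Hypothetical weekly transfer amounts for one customer, indexed by date.
amounts = pd.Series(
    [250, 300, 280, 260, 320, 900, 310, 290],
    index=pd.date_range("2024-01-01", periods=8, freq="W"),
)

# Recompute the "expected" level over a rolling 90-day window so the baseline
# adapts as the customer's behavior shifts.
rolling_median = amounts.rolling("90D").median()
rolling_std = amounts.rolling("90D").std()
print(rolling_median.iloc[-1], rolling_std.iloc[-1])
```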