How to Avoid GenAI Hallucinations

Last year, “hallucinations” produced by generative artificial intelligence (GenAI) were in the spotlight in the courtroom and all over the news. Bloomberg News reported that “Goldman Sachs Group Inc., Citigroup Inc., JPMorgan Chase & Co. and other Wall Street firms are warning investors about new risks from the increasing use of artificial intelligence, including software hallucinations, employee-morale issues, use by cybercriminals and the impact of changing laws globally.”

GenAI hallucinations are indeed problematic. For example, researchers at Stanford University last year found that general-purpose GenAI tools like ChatGPT have an error rate as high as 82% when used for legal purposes. GenAI tools purpose-built for legal applications do better, producing hallucinations 17% of the time, according to a different Stanford study.

Regardless of the hallucination rate, the problem is exacerbated, in any industry, by the human consuming the GenAI output: they may not notice the hallucination, or may not bother to validate the output, and instead act directly upon it.

Why Do GenAI Models Hallucinate?

Factors that can lead to GenAI hallucinations include:

  • The type, quality, quantity, and breadth of the pre-training data. Most large language models (LLMs) are ‘universal models’ packed with data and factoids that are irrelevant to the specific problems to which the LLM is applied.
  • Low pre-training data coverage for the key tokens and topics in the prompt. LLM technology represents words or groups of words as tokens and uses sequences of these tokens, and statistics about them, to produce an answer. If statistical coverage is insufficient, the LLM may make inferences based on noise rather than on clean signals supported by strong statistics in training.
  • Lack of self-restraint at inference time. Most LLMs do not consider whether there is sufficient statistical coverage to form a response; they simply assume the response is statistically sound, and do not flag cases where coverage is too low to adequately support an answer.
  • Lack of understanding that retrieval-augmented generation (RAG) can increase the rate of hallucination by biasing the statistics of tokens already learned by the foundational model during its original pre-training. RAG can make those statistics locally unreliable in unnatural ways, driving up hallucinations and bias.

Detecting hallucinations is difficult because LLM algorithms are not interpretable and provide no visibility to justify their responses. Even if a RAG context was supposedly referenced in a response, you may find that it was not actually used. Without knowing the correct answer, haphazardly relying on bad or biased statistics in an LLM to get a possible answer is high risk.
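One crude, imperfect signal of the coverage problem described above is the spread of the model's next-token distribution. The sketch below is a minimal illustration, assuming the Hugging Face transformers library and the small open GPT-2 model; the prompt is a placeholder. It is not a hallucination detector or a trust score, only a hint that the model may be inferring from weak statistics.

```python
# Rough illustration: inspect the entropy of the next-token distribution.
# High entropy (a flat distribution) suggests weak statistical support for
# the continuation; it does NOT prove or disprove a hallucination.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The statute of limitations for this claim is"  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]   # distribution over the next token

probs = torch.softmax(logits, dim=-1)
entropy = -(probs * probs.clamp_min(1e-12).log()).sum().item()

print(f"Next-token entropy: {entropy:.2f} nats")
# A high value here is a prompt for caution: route the output to review or
# to an additional guardrail model rather than acting on it directly.
```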

How Can You Reduce GenAI Hallucinations?

Many organizations are already customizing pre-trained LLMs for their purposes using fine-tuning techniques such as Low-Rank Adaptation (LoRA). To reduce hallucinations, one needs to specify the domain and task data used to build the language model; a model trained on data relevant to the use case will hallucinate less.
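As a rough sketch of what LoRA-style customization looks like in practice, the example below assumes the Hugging Face transformers and peft libraries; the base model name and the training data are placeholders for your own domain- and task-relevant choices.

```python
# Minimal LoRA setup sketch: attach low-rank adapters to a small base model
# so that only a fraction of its weights are updated during fine-tuning.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder base model

lora_config = LoraConfig(
    r=8,                       # rank of the low-rank update matrices
    lora_alpha=16,             # scaling factor applied to the update
    lora_dropout=0.05,
    target_modules=["c_attn"], # attention projection to adapt (model-specific)
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights train

# Train `model` on domain- and task-relevant text (e.g., sanctioned risk
# documentation) with a standard causal-LM training loop or the Trainer API.
```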

There is also a need for additional models to monitor and minimize the harm created by hallucinations. Enterprise policy should prioritize the process for how the output of these tools will be used in a business context, and leverage a risk-based strategy to decide when and when not to use outputs, and how to set risk tolerance based on use case. Specially designed GenAI trust scores reflect the probability that the prompts and answers align with sanctioned answers: a high trust score means little risk of hallucination; a low trust score means high risk. With a trust score you can set your risk tolerance and control the amount of hallucination and harm to your business while still benefiting from the power of generative AI techniques.
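The computation of a trust score is not specified here, but the sketch below shows how a risk-based policy could consume one: a probability-like score gates whether an answer is used automatically, routed to human review, or rejected. The function, thresholds, and labels are hypothetical and would be tuned per use case.

```python
# Hypothetical risk-based gating on a GenAI trust score (illustrative only).
from dataclasses import dataclass

@dataclass
class GatedAnswer:
    text: str
    trust_score: float   # probability-like score in [0, 1]
    action: str          # "auto_use", "human_review", or "reject"

def gate(answer: str, trust_score: float,
         auto_threshold: float = 0.90,
         review_threshold: float = 0.60) -> GatedAnswer:
    """Route a GenAI answer according to the use case's risk tolerance."""
    if trust_score >= auto_threshold:
        action = "auto_use"        # low hallucination risk: use directly
    elif trust_score >= review_threshold:
        action = "human_review"    # medium risk: route to an analyst
    else:
        action = "reject"          # high risk: do not act on this output
    return GatedAnswer(answer, trust_score, action)

# Example: a customer-facing credit decision would warrant a higher
# auto_threshold than an internal research summary.
print(gate("Draft response...", trust_score=0.72).action)  # -> human_review
```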

Using Focused Language Models to Fight Hallucinations

The best approach to using GenAI responsibly in financial services starts with the concept of focused language models (FLMs). FLMs are small language models (SLMs) built on an expertly designed training data set at both the domain and task level — in other words, data from the context in which the final model will be used, such as risk management decisions in financial services. The result is superior accuracy, enhanced trust in the output, and efficiency in production, since smaller models incur lower inference latency and cost.

FLM is a new concept that puts data science back into GenAI, in a way that meets responsible AI principles. A fine level of specificity ensures that appropriately high-quality, high-relevance data is chosen; the model can then be further trained (‘task tuning’) to ensure it is correctly focused on the specific business objective at hand and that its outputs are operationalized in a business process.

The FLM approach is distinctly different from commercially available LLMs and SLMs, which offer no control over the data used to build the model. For enterprises, this control of the pre-training and task-training data is crucial for preventing hallucinations and harm; complete control of the training data is a necessary first step in the responsible AI use of these transformer models.

A focused language model enables GenAI to be used responsibly because:

  • It affords transparency into, and control over, the appropriate, high-quality data on which the core domain-focused language model is built.
  • On top of industry domain-focused language models, users can create task-specific focused language models with tight vocabulary and training contexts for the business objective at hand.
  • Due to the transparency and control of the data, the resulting FLM can be accompanied by a trust score with every response, allowing risk-based operationalization of GenAI.

Curious to learn more? Join me at the 7th AI in Financial Services conference in London on the 9th of September, where I will be discussing this topic in my presentation.


This content is provided by an external author without editing by Finextra. It expresses the views and opinions of the author.
