Fair Isaac Corporation


Worried about Gen AI Hallucinations? Using Focused Language Models is an Imaginative—and Proven—Solution

Last year, "hallucinations" produced by generative artificial intelligence (Generative AI [GenAI]) were in the spotlight in court, in court again, and certainly, all over the news. More recently, Bloomberg News said that in their 2024 annual reports, "Goldman Sachs Group Inc., Citigroup Inc., JPMorgan Chase & Co. and other Wall Street firms are warning investors about new risks from the increasing use of artificial intelligence, including software hallucinations, employee-morale issues, use by cybercriminals and the impact of changing laws globally."

Meanwhile, Michael Barr, who recently departed as the U.S. Federal Reserve's vice chair for supervision, foreshadowed these concerns in extemporaneous remarks he made in February at the Council on Foreign Relations. There he said that competitive pressure around incorporating generative artificial intelligence could heighten risks in financial services. Competitive pressure "may push all institutions, including regulated institutions, to take a more aggressive approach to genAI adoption," increasing governance, alignment, and financial risks around AI, Barr said.

I couldn't agree more. That's why we at FICO have always advocated for operationalizing GenAI responsibly, using solutions like focused language models (FLMs) and focused task models to thwart hallucinations before they occur. In this blog I'll provide more background on GenAI hallucinations and talk about focused language models, FICO's GenAI solution for helping to ensure that the "golden age of AI" remains bright.

Hallucinations Are No Illusion

GenAI hallucinations are indeed problematic. For example, researchers at Stanford University last year found that general-purpose GenAI tools like ChatGPT have an error rate as high as 82% when used for legal purposes. GenAI tools purpose-built for legal applications are better, producing hallucinations 17% of the time, according to a different Stanford study, but they still shouldn't be used without close, time-consuming scrutiny.

Regardless of the hallucination rate, the problem is further exacerbated, in any industry, by the human consuming the GenAI output: they may not notice the hallucination or validate the output, instead acting directly upon it.

The Fuel That Stokes the Fire

Factors that can lead to GenAI hallucinations include:

  • The type, quality, quantity, and breadth of data used for pre-training.
  • Low pre-training data coverage for the key tokens and topics in a prompt. Coverage refers to the statistics the model has learned for words, or groups of words, that appear in a prompt or are used in an answer. If coverage is insufficient, the LLM may make inferences based on noise rather than on clear signals supported by strong coverage.
  • Lack of self-restraint at inference time, where the LLM does not prohibit low-coverage examples from shaping its responses. Most LLMs do not check whether there is sufficient coverage to support an answer; they simply assume the response is statistically sound. Ideally, when coverage is low, the LLM should indicate that it doesn't have enough information to provide a reliable response (a minimal sketch of such a check follows this list).
  • Lack of understanding that retrieval-augmented generation (RAG) can increase the rate of hallucination by desensitizing or destabilizing relationships learned by the foundational model during its original pre-training. RAG can over-emphasize and change statistics locally in the prompt in unnatural ways.
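
To make the self-restraint point concrete, here is a minimal sketch of a coverage check that abstains rather than guesses. The token-frequency table, thresholds, and generate() callable are illustrative assumptions for this post, not part of any FICO or vendor API.

```python
# Minimal sketch (illustrative only): abstain when prompt coverage is low.
# Assumes a hypothetical token-frequency table built from the pre-training
# corpus and a hypothetical generate() callable.

from collections import Counter

MIN_COVERAGE = 0.85       # fraction of prompt tokens that must be well covered
MIN_TOKEN_COUNT = 50      # pre-training occurrences needed to call a token "covered"

def coverage(prompt_tokens, pretraining_counts: Counter) -> float:
    """Share of prompt tokens seen often enough during pre-training."""
    if not prompt_tokens:
        return 0.0
    covered = sum(1 for t in prompt_tokens if pretraining_counts[t] >= MIN_TOKEN_COUNT)
    return covered / len(prompt_tokens)

def answer_or_abstain(prompt: str, pretraining_counts: Counter, generate) -> str:
    tokens = prompt.lower().split()
    if coverage(tokens, pretraining_counts) < MIN_COVERAGE:
        # The self-restraint step most LLMs lack: decline instead of guessing.
        return "I don't have enough information to provide a reliable response."
    return generate(prompt)
```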

Hallucinations Are Hard to See

Detecting hallucinations is difficult because LLM algorithms are usually not interpretable and do not provide visibility to justify their responses. Even if a retrieval-augmented generation (RAG) context is cited in a response, human inspection may reveal that it was not actually used to form the answer.
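
As a rough, illustrative stand-in for that human inspection (not FICO's detection method), a simple lexical-overlap check can flag response sentences that share little vocabulary with the retrieved context:

```python
# Minimal sketch: flag response sentences with little word overlap against the
# retrieved RAG context, a crude proxy for "context was cited but not used."

import re

def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def unsupported_sentences(response: str, context: str, min_overlap: float = 0.5):
    """Return (sentence, overlap) pairs whose overlap with the context is below min_overlap."""
    context_words = _words(context)
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", response):
        words = _words(sentence)
        if not words:
            continue
        overlap = len(words & context_words) / len(words)
        if overlap < min_overlap:
            flagged.append((sentence, round(overlap, 2)))
    return flagged

# Anything returned here deserves the close, time-consuming scrutiny described
# above before the output is acted upon.
```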

As I explained to journalist John Edwards for InformationWeek:

The best way to minimize hallucinations is by building your own pre-trained fundamental generative AI model, advises Scott Zoldi, chief analytics officer at analytics software company FICO. He notes via e-mail that many organizations are now already using, or planning to use, this approach utilizing focused-domain and task-based models. "By doing so, one can have critical control of the data used in pre-training, where most hallucinations arise, and constrain the use of context augmentation to ensure that such use doesn't increase hallucinations but reinforces relationships already in the pre-training."

Outside of building your own focused generative models, one needs to minimize harm created by hallucinations, Zoldi says. "[Enterprise] policy should prioritize the process for how the output of these tools will be used in a business context and then validate everything," he suggests.

FLMs Are Focused on Delivering Accurate Answers

FICO's approach to using Generative AI responsibly starts with the concept of small language models (SLMs), which, as the name suggests, are smaller and less complex than LLMs. SLMs are designed to efficiently perform specific language tasks and are built with fewer parameters and often smaller training data sets. Like LLMs, SLMs are available from multiple providers and come with many of the same challenges as LLMs, although often at reduced risk.

My approach to achieving Responsible GenAI concentrates SLM applications further into a "focused language model" (FLM), a new concept in SLM development that is built around a smaller but very deliberate data store specific to a very narrow domain or task. This fine level of specificity ensures that appropriately high-quality, high-relevance data is chosen; later, you can painstakingly tune the model ("task tuning") to further ensure it is correctly focused on the task at hand.
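
For illustration only, the sketch below shows what task tuning a small causal language model on a curated, narrow-domain corpus might look like using the open-source Hugging Face libraries; the base model, file name, and hyperparameters are assumptions, and this is not FICO's FLM build process.

```python
# Minimal sketch (assumptions: Hugging Face transformers/datasets, a small causal
# LM, and a vetted domain corpus in "domain_corpus.txt").

from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

BASE = "EleutherAI/pythia-160m"   # a small model stands in for a focused language model
tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE)

# The deliberate, narrow-domain data store: only curated, task-relevant text.
corpus = load_dataset("text", data_files={"train": "domain_corpus.txt"})["train"]
tokenized = corpus.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                       batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="flm-task-tuned", num_train_epochs=3,
                           per_device_train_batch_size=8, learning_rate=5e-5),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()   # "task tuning": narrow the model onto the curated domain/task data
```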

The FLM approach is distinctly different from commercially available LLMs and SLMs, which offer no control over the data used to build the model, a capability that is crucial for preventing hallucinations and harm. A focused language model enables GenAI to be used responsibly because:

  • It affords transparency and control of appropriate and high-quality data on which a core domain-focused language model is built.
  • On top of industry domain-focused language models, users can create task-specific focused language models with tight vocabulary and training contexts for the task at hand.
  • Further, due to the transparency and control of the data, the resulting FLM can be accompanied by a trust score with every response, allowing risk-based operationalization of Generative AI; trust scores measure how responses align with the FLM's domain and/or task knowledge anchors (truths), as sketched after this list.
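
Here is a minimal sketch of what risk-based operationalization with a trust score could look like; the similarity function, knowledge anchors, and thresholds are hypothetical placeholders rather than FICO's actual scoring method.

```python
# Minimal sketch of routing GenAI responses by a trust score (illustrative only;
# the anchor set, scoring function, and thresholds are assumptions).

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ScoredResponse:
    text: str
    trust: float          # 0.0 (no anchor support) to 1.0 (fully anchored)
    action: str           # how the business process should treat the response

def route_by_trust(response: str,
                   anchors: List[str],
                   similarity: Callable[[str, str], float],
                   auto_use: float = 0.9,
                   needs_review: float = 0.6) -> ScoredResponse:
    """Score a response against domain/task knowledge anchors and route by risk."""
    trust = max(similarity(response, a) for a in anchors) if anchors else 0.0
    if trust >= auto_use:
        action = "use"                 # high alignment with known truths
    elif trust >= needs_review:
        action = "human_review"        # plausible but requires validation
    else:
        action = "reject"              # treat as a likely hallucination
    return ScoredResponse(response, round(trust, 3), action)
```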

If you want to learn more about how focused language models and trust scores work, and the immense business benefit they can deliver, come to my talk on the FICO World main stage on Thursday, May 8. It's part of the morning General Session; I can't wait to provide proof of just how powerful FLMs are.

See you soon in Hollywood, Florida!
