Last year, "hallucinations" produced by generative artificial intelligence (Generative AI [GenAI]) were in the spotlight in court, in court again, and certainly, all over the news. More recently, Bloomberg News said that in their 2024 annual reports, "Goldman Sachs Group Inc., Citigroup Inc., JPMorgan Chase & Co. and other Wall Street firms are warning investors about new risks from the increasing use of artificial intelligence, including software hallucinations, employee-morale issues, use by cybercriminals and the impact of changing laws globally."
Meanwhile, Michael Barr, who recently departed as the U.S. Federal Reserve's vice chair for supervision, foreshadowed these concerns in extemporaneous remarks he made in February at the Council on Foreign Relations. There he said that competitive pressure around incorporating generative artificial intelligence could heighten risks in financial services. That pressure "may push all institutions, including regulated institutions, to take a more aggressive approach to genAI adoption," increasing governance, alignment, and financial risks around AI, Barr said.
I couldn't agree more. That's why we at FICO have always advocated for operationalizing GenAI responsibly, using solutions like focused language models (FLMs) and focused task models to thwart hallucinations before they occur. In this blog I'll provide more background on GenAI hallucinations and talk about focused language models, FICO's GenAI approach to helping ensure that the "golden age of AI" remains bright.
Hallucinations Are No Illusion
GenAI hallucinations are indeed problematic. For example, researchers at Stanford University last year found that general-purpose GenAI tools like ChatGPT have an error rate as high as 82% when used for legal purposes. GenAI tools purpose-built for legal applications fare better, producing hallucinations 17% of the time according to a different Stanford study, but they still shouldn't be used without close, time-consuming scrutiny.
Regardless of the hallucination rate, the problem is further exacerbated, in any industry, by the humans consuming GenAI output: they may not notice a hallucination or validate the output, and instead act on it directly.
The Fuel That Stokes the Fire
Factors that can lead to GenAI hallucinations include:
Hallucinations Are Hard to See
Detecting hallucinations is difficult because LLM algorithms are usually not interpretable and do not provide the visibility needed to justify their responses. Even if a Retrieval Augmented Generation (RAG) context was referenced in the response, human inspection may reveal that it was not actually used to produce that response.
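One partial safeguard is to check whether a response is actually grounded in the retrieved context before anyone relies on it. Below is a deliberately naive sketch of that idea, not FICO's method: it flags response sentences with little lexical overlap against the retrieved RAG chunks, and the tokenizer, overlap threshold, and example text are all illustrative assumptions.

# Naive grounding check (illustrative only): flag response sentences whose
# token overlap with every retrieved RAG chunk is too low for comfort.
import re

def _tokens(text: str) -> set[str]:
    """Lowercase word tokens, ignoring very short stop-like words."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3}

def unsupported_sentences(response: str, context_chunks: list[str],
                          min_overlap: float = 0.5) -> list[str]:
    """Return response sentences whose overlap with every retrieved chunk
    falls below min_overlap -- candidates for human review."""
    chunk_tokens = [_tokens(c) for c in context_chunks]
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", response.strip()):
        sent_tokens = _tokens(sentence)
        if not sent_tokens:
            continue
        best = max((len(sent_tokens & ct) / len(sent_tokens) for ct in chunk_tokens),
                   default=0.0)
        if best < min_overlap:
            flagged.append(sentence)
    return flagged

# Example: the second sentence is not supported by the retrieved context.
context = ["The 2023 annual report lists total revenue of $4.2 billion."]
answer = ("Total revenue in 2023 was $4.2 billion. "
          "The company also won a landmark patent case that year.")
print(unsupported_sentences(answer, context))

A check like this won't catch paraphrased or subtly wrong claims, which is why human review of anything flagged, and of anything high-stakes, remains essential.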
As I explained to journalist John Edwards for InformationWeek:
The best way to minimize hallucinations is by building your own pre-trained fundamental generative AI model, advises Scott Zoldi, chief analytics officer at analytics software company FICO. He notes via e-mail that many organizations are now already using, or planning to use, this approach, utilizing focused-domain and task-based models. "By doing so, one can have critical control of the data used in pre-training, where most hallucinations arise, and constrain the use of context augmentation to ensure that such use doesn't increase hallucinations but reinforces relationships already in the pre-training."
Outside of building your own focused generative models, one needs to minimize the harm created by hallucinations, Zoldi says. "[Enterprise] policy should prioritize the process for how the output of these tools will be used in a business context and then validate everything," he suggests.
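In practice, "validate everything" can be enforced with a simple gate that attaches validation metadata to every GenAI response so that nothing is acted on directly. The sketch below is hypothetical, not a FICO API; the check functions, field names, and thresholds are placeholder assumptions standing in for whatever an enterprise policy actually requires.

# Hypothetical "validate everything" gate: GenAI output carries validation
# metadata, and nothing is safe to act on without checks plus human sign-off.
from dataclasses import dataclass
from typing import Callable

@dataclass
class GatedOutput:
    text: str
    passed_checks: bool
    human_approved: bool

    @property
    def safe_to_act_on(self) -> bool:
        return self.passed_checks and self.human_approved

def gate(response: str,
         checks: list[Callable[[str], bool]],
         human_approved: bool = False) -> GatedOutput:
    """Run every automated check over the response before it reaches a workflow."""
    return GatedOutput(response, all(check(response) for check in checks), human_approved)

# Example checks (placeholders): length sanity and a required citation marker.
checks = [lambda r: len(r) < 2000, lambda r: "[source:" in r]
result = gate("Projected losses are down 3% [source: Q4 filing].", checks)
print(result.safe_to_act_on)   # False until a human signs off

The point of this design is that the unsafe path is closed by default: downstream code sees a GatedOutput rather than raw text, and must consult safe_to_act_on before doing anything with it.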
FLMs Are Focused on Delivering Accurate Answers
FICO's approach to using Generative AI responsibly starts with the concept of small language models (SLMs), which, as the name suggests, are smaller and less complex than LLMs. SLMs are designed to efficiently perform specific language tasks and are built with fewer parameters and often smaller training data sets. Like LLMs, SLMs are available from multiple providers and come with many of the same challenges, although often at reduced risk.
My approach to achieving Responsible GenAI concentrates SLM applications further into a "focused language model" (FLM), a new concept in SLM development that is built around a smaller but very deliberate data store specific to a very narrow domain or task. That fine level of specificity ensures that appropriately high-quality, high-relevance data is chosen; later, you can painstakingly tune the model ("task tuning") to further ensure it's correctly focused on the task at hand.
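To make the curation idea concrete, here is a minimal, hypothetical sketch of how documents might be admitted into an FLM's narrow-domain data store before any pre-training or task tuning takes place. The domain terms, scoring rule, and threshold are illustrative assumptions, not FICO's pipeline.

# Illustrative curation step for an FLM data store: keep only documents that
# clearly belong to one narrow domain (hypothetical payments-fraud terms below).
DOMAIN_TERMS = {"chargeback", "card-not-present", "acquirer", "issuer",
                "authorization", "interchange"}

def domain_score(doc: str) -> float:
    """Fraction of the domain terms that appear in the document."""
    text = doc.lower()
    return sum(term in text for term in DOMAIN_TERMS) / len(DOMAIN_TERMS)

def curate(corpus: list[str], threshold: float = 0.5) -> list[str]:
    """Admit a document into the FLM data store only if it is clearly on-domain."""
    return [doc for doc in corpus if domain_score(doc) >= threshold]

raw_corpus = [
    "The issuer declined the card-not-present authorization; the acquirer flagged a chargeback.",
    "Our cafeteria menu now includes a vegan option on Thursdays.",
]
focused_store = curate(raw_corpus)
print(len(focused_store))   # 1 -- only the on-domain document survives curation

Real curation would of course go far beyond keyword matching, with provenance, quality, and licensing checks, for example, but the principle is the same: control what goes into the model before worrying about what comes out.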
The FLM approach is distinctly different from that of commercially available LLMs and SLMs, which offer no control over the data used to build the model; that control is crucial for preventing hallucinations and harm. A focused language model enables GenAI to be used responsibly because:
If you want to learn more about how focused language models and trust scores work, and the immense business benefit they can deliver, come to my talk on the FICO World main stage on Thursday, May 8. It's part of the morning General Session; I can't wait to provide proof of just how powerful FLMs are.
See you soon in Hollywood, Florida!