Patronus AI launches new open-source ‘hallucination detection’ model

Patronus AI has released Lynx, a hallucination detection model designed to address hallucinations in large language models (LLMs). According to the company, Lynx represents a breakthrough in the field by enabling real-time hallucination detection without the need for manual annotation.

Patronus AI also open-sourced HaluBench, a new benchmark sourced from real-world domains, to comprehensively assess the faithfulness of LLM responses.

“Since the release of ChatGPT in November 2022, the proliferation of LLMs has revolutionized text generation and knowledge-intensive tasks like question answering. However, hallucinations, where models produce coherent but inaccurate responses, remains a critical challenge and poses significant risks for enterprises,” said Anand Kannappan, Co-founder and CEO at Patronus AI. “We address this challenge head-on with Lynx, a groundbreaking open source model capable of real-time hallucination detection. Today, we not only introduce the most powerful LLM-as-a-judge with Lynx, we also introduce HaluBench, a novel 15k sample benchmark that LLM developers can use to measure the hallucination rate of their fine-tuned LLMs in domain-specific scenarios.”

According to the company, Lynx is the first model to beat GPT-4 on hallucination detection tasks. Lynx (70B) achieved the highest accuracy at detecting hallucinations among all LLMs used as judges, which the company says makes it the largest and most powerful open source hallucination detection model to date. It outperformed OpenAI’s GPT models and Anthropic’s Claude 3 models at a fraction of the size.

Lynx and HaluBench also support real-world domains like finance and medicine, which previous datasets and models did not include, making them more applicable to real-world problems.

Lynx and HaluBench are now publicly available on Hugging Face, the open source AI platform.
