Face the Facts!
Evaluating RAG-based Pipelines for Professional Fact-Checking
Natural Language Processing and Generation systems have recently shown the potential to complement and streamline the costly and time-consuming job of professional fact-checkers. In this work, we lift several constraints of current state-of-the-art pipelines for automated fact-checking based on the Retrieval-Augmented Generation (RAG) paradigm. Our goal is to benchmark, following professional fact-checking practices, RAG-based methods for the generation of verdicts – i.e., short texts discussing the veracity of a claim – evaluating them on stylistically complex claims and heterogeneous, yet reliable, knowledge bases. Our findings show a complex landscape. For example, LLM-based retrievers outperform other retrieval techniques, though they still struggle with heterogeneous knowledge bases. Larger models excel in verdict faithfulness, while smaller models provide better context adherence. Human evaluations favour zero-shot and one-shot approaches for informativeness, and fine-tuned models for emotional alignment.
For more information: https://aclanthology.org/2025.inlg-main.50.pdf