Bots into the Fediverse: This dataset contains anonymized features for bot detection on Mastodon (Fediverse). It was created for the accompanying paper and consists of accounts labeled as bot or non-bot, collected from publicly accessible content via the Mastodon Application …
Scientific Publications
Multilingual vs Crosslingual Retrieval of Fact-Checked Claims: A Tale of Two Approaches
Retrieval of previously fact-checked claims is a well-established task, whose automation can assist professional fact-checkers in the initial steps of information verification. Previous works have mostly tackled the …
Comparing Specialised Small and General Large Language Models on Text Classification: 100 Labelled Samples to Achieve Break-Even Performance
When solving NLP tasks with limited labelled data, researchers typically either use a general large language model without further update, or use …
Face the Facts! Evaluating RAG-based Pipelines for Professional Fact-Checking
Natural Language Processing and Generation systems have recently shown the potential to complement and streamline the costly and time-consuming job of professional fact-checkers. In this work, we lift several constraints of …
A Survey on Automatic Credibility Assessment Using Textual Credibility Signals in the Era of Large Language Models
In the age of social media and generative AI, the ability to automatically assess the credibility of online content has become increasingly critical, …
MuLTa-Telegram: A Fine-Grained Italian and Polish Dataset for Hate Speech and Target Detection
This paper introduces the MuLTa-Telegram dataset, a Multi-Lingual and multi-Target dataset specifically developed to detect hate speech on Telegram, an understudied yet influential platform in which …
Generative AI and the Threat to Thinking
Information security is concerned with maintaining the integrity of the information ecosystem. The proliferation of content created using generative artificial intelligence can overwhelm the ability of people to process information. Consideration of a …
Autonomation, Not Automation: Activities and Needs of European Fact-checkers as a Basis for Designing Human-Centered AI Systems
To mitigate the negative effects of false information more effectively, the development of Artificial Intelligence (AI) systems to assist fact-checkers is needed. Nevertheless, …
MultiSocial: Multilingual Benchmark of Machine-Generated Text Detection of Social-Media Texts
Recent LLMs are able to generate high-quality multilingual texts, indistinguishable for humans from authentic human-written ones. Research in machine-generated text detection is however mostly focused on the English language and …
EuroVerdict: A multilingual dataset for verdict generation against misinformation
Misinformation is a global issue that shapes public discourse, influencing opinions and decision-making across various domains. While automated fact-checking (AFC) has become essential in combating misinformation, most work in multilingual settings …