Mapping claims to evidence in ageing research: an automated NLP pipeline for claim-level literature synthesis

Arnas Stučinskas; Audronė Jakaitienė

doi:10.15388/LMITT.2026.28

Is Part of	Lietuvos magistrantų informatikos ir IT tyrimai: konferencijos darbai, 6 May 2026, Vilnius. Vilnius : Vilniaus universiteto leidykla, 2026, pp. 273–280. eISSN 2783-784X
Keywords [eng]	natural language processing ; evidence extraction ; large language models ; biomedical text mining ; longevity research ; evidence synthesis
Abstract [eng]	Longevity research spans tens of thousands of clinical and observational publications, yet no systematic, claim-level, quality-graded synthesis of the human literature exists. We present an end-to-end natural language processing pipeline that retrieves, screens, structures, normalises, validates, and quality-grades evidence claims from PubMed at scale. The pipeline uses a local large language model (LLM) for relevance screening and record splitting, and frontier LLMs for structured extraction, entity filtering, taxonomy normalisation, polarity correction, claim validation, and hallmark mapping. Applied to 108,431 retrieved records, the pipeline produced a final dataset of 2,987 quality-graded claims from 1,797 publications, merged into 2,641 factor–outcome claim pairs. The results reveal a broad but shallow evidence landscape: exercise and physical training account for 33.8% of the final corpus, only 1.1% of claims target direct survival or longevity outcomes, and 91.9% of claim pairs are supported by a single study. The main contribution is a modular, updatable NLP system for large-scale claim-level evidence synthesis, together with a public database available at longevityevidence.org.
Published	Vilnius : Vilniaus universiteto leidykla
Type	Conference paper
Language	English
Publication date	2026
License	Creative Commons
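The abstract describes the pipeline as a chain of stages (retrieval, screening, record splitting, structured extraction, normalisation, validation, quality grading). A local LLM handles screening and record splitting, and frontier LLMs handle the later steps. The sketch below shows one way such a modular chain could be composed. It is not the authors' implementation: the `Record` type, the stage factories `make_screen` and `make_extract`, and the lambda stand-ins for LLM calls are all hypothetical, and no model is actually queried.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Record:
    """Hypothetical unit of work: one PubMed record and the claims extracted from it."""
    pmid: str
    text: str
    claims: list = field(default_factory=list)


# A stage takes the surviving records and returns the records passed to the next stage.
Stage = Callable[[list], list]


def run_pipeline(records, stages):
    """Apply each named stage in order, logging how many records remain after it."""
    for name, stage in stages:
        records = stage(records)
        print(f"{name}: {len(records)} record(s) remaining")
    return records


def make_screen(is_relevant: Callable[[str], bool]) -> Stage:
    """Relevance screening; the abstract assigns this step to a local LLM, here any predicate."""
    def screen(records):
        return [r for r in records if is_relevant(r.text)]
    return screen


def make_extract(extract_claims: Callable[[str], list]) -> Stage:
    """Structured extraction; stands in for a frontier-LLM call that returns claim dicts."""
    def extract(records):
        for r in records:
            r.claims = extract_claims(r.text)
        # Drop records from which nothing could be extracted.
        return [r for r in records if r.claims]
    return extract


# Usage with stand-in "models" (a keyword test and a fixed extractor) instead of real LLMs.
# The PMIDs and texts are invented for illustration.
records = [
    Record("1", "Resistance training improved grip strength in older adults."),
    Record("2", "A protocol for culturing yeast cells."),
]
stages = [
    ("screen", make_screen(lambda text: "older adults" in text)),
    ("extract", make_extract(lambda text: [
        {"factor": "resistance training", "outcome": "grip strength", "polarity": "+"}
    ])),
]
final = run_pipeline(records, stages)
# prints:
# screen: 1 record(s) remaining
# extract: 1 record(s) remaining
```

Because each stage sees only a list of records and the model is injected as a plain callable, replacing the keyword stand-in with a call to a local or hosted LLM would change one argument rather than the pipeline itself. This is one plausible reading of the "modular, updatable" design claimed in the abstract, not a description of the published code.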

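The abstract reports 2,987 claims from 1,797 publications merged into 2,641 factor–outcome claim pairs, with 91.9% of pairs supported by a single study. The following is a minimal sketch of that kind of aggregation. It assumes that pairs are keyed on the normalised factor and outcome alone and that support is counted as distinct publications (PMIDs). The merge key the authors actually used is not stated in the abstract, the function names are hypothetical, and the claim rows are invented for illustration.

```python
from collections import defaultdict

# Invented claim rows: (PMID, normalised factor, normalised outcome). Not from the dataset.
claims = [
    ("101", "aerobic exercise", "VO2max"),
    ("102", "aerobic exercise", "VO2max"),
    ("101", "aerobic exercise", "VO2max"),  # second claim from the same study
    ("103", "caloric restriction", "body weight"),
    ("104", "metformin", "all-cause mortality"),
]


def merge_claim_pairs(rows):
    """Group claims by (factor, outcome) and collect the distinct studies behind each pair."""
    studies = defaultdict(set)
    for pmid, factor, outcome in rows:
        studies[(factor, outcome)].add(pmid)
    return dict(studies)


def single_study_share(pairs):
    """Fraction of factor-outcome pairs supported by exactly one publication."""
    if not pairs:
        return 0.0
    single = sum(1 for pmids in pairs.values() if len(pmids) == 1)
    return single / len(pairs)


pairs = merge_claim_pairs(claims)
print(len(claims), "claims ->", len(pairs), "pairs")           # 5 claims -> 3 pairs
print(f"single-study share: {single_study_share(pairs):.1%}")  # single-study share: 66.7%
```

Counting distinct PMIDs rather than raw claims matters here. Two claims extracted from the same study about the same pair still leave that pair with single-study support, so the number of claims can exceed the number of pairs without any pair gaining independent replication.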