Title Mapping claims to evidence in ageing research: an automated NLP pipeline for claim-level literature synthesis
Authors Stučinskas, Arnas ; Jakaitienė, Audronė
DOI 10.15388/LMITT.2026.28
Is Part of Lietuvos magistrantų informatikos ir IT tyrimai: konferencijos darbai, 2026 m. gegužės 6 d. Vilnius. Vilnius : Vilniaus universiteto leidykla. 2026, p. 273-280. eISSN 2783-784X
Keywords [eng] natural language processing ; evidence extraction ; large language models ; biomedical text mining ; longevity research ; evidence synthesis
Abstract [eng] Longevity research spans tens of thousands of clinical and observational publications, yet no systematic, claim-level, quality-graded synthesis of the human literature exists. We present an end-to-end natural language processing pipeline that retrieves, screens, structures, normalises, validates, and quality-grades evidence claims from PubMed at scale. The pipeline uses a local large language model (LLM) for relevance screening and record splitting, and frontier LLMs for structured extraction, entity filtering, taxonomy normalisation, polarity correction, claim validation, and hallmark mapping. Applied to 108,431 retrieved records, the pipeline produced a final dataset of 2,987 quality-graded claims from 1,797 publications, merged into 2,641 factor–outcome claim pairs. The results reveal a broad but shallow evidence landscape: exercise and physical training account for 33.8% of the final corpus, only 1.1% of claims target direct survival or longevity outcomes, and 91.9% of claim pairs are supported by a single study. The main contribution is a modular, updatable NLP system for large-scale claim-level evidence synthesis, together with a public database available at longevityevidence.org.
Published Vilnius : Vilniaus universiteto leidykla
Type Conference paper
Language English
Publication date 2026