Title Open data provenance and reproducibility: a case study from publishing CMS open data
Authors Simko, Tibor ; de Bittencourt, Heitor Pascoal ; Carrera, Edgar ; Delgado Lopez, Diyaselis ; Lange, Clemens ; Lassila-Perini, Kati ; Lintuluoto, Adelina ; Lloret Iglesias, Lara ; McCauley, Thomas ; Okraska, Jan ; Prelipcean, Daniel ; Savaniakas, Mantas
DOI 10.1051/epjconf/202024508014
Is Part of 24th International Conference on Computing in High Energy and Nuclear Physics (CHEP 2019), University Adelaide, Adelaide, Australia, November 04-08, 2019. EDP Sciences. 2020, art. no. 08014, p. [1-8]
Abstract [eng] In this paper we present the latest CMS open data release published on the CERN Open Data portal. Samples of collision and simulated datasets were released together with detailed information about the data provenance. The associated data production chains cover the necessary computing environments, the configuration files and the computational procedures used in each data production step. We describe data curation techniques used to obtain and publish the data provenance information and we study the possibility of reproducing parts of the released data using the publicly available information. The present work demonstrates the usefulness of releasing selected samples of raw and primary data in order to fully ensure the completeness of information about the data production chain for the attention of general data scientists and other non-specialists interested in using particle physics data for education or research purposes.
Published EDP Sciences
Type Conference paper
Language English
Publication date 2020
CC license CC license description