Title On the generation of synthetic invoices for training machine learning models
Authors Gricius, Rolandas ; Belovas, Igoris
DOI 10.1109/ACCESS.2025.3555155
Is Part of IEEE Access. Piscataway, NJ : IEEE. 2025, vol. 13, p. 62798-62806. eISSN 2169-3536
Keywords [eng] machine learning ; entity recognition ; financial documents ; dataset generation
Abstract [eng] Generating synthetic financial documents is currently a particularly pressing problem. Extending recent research on the topic, we present an enhanced tool for invoice generation. The primary motivation is the need for invoice corpora for machine learning in accounting automation. The tool produces synthetic invoices with randomized layouts and contents. As content fields are generated, annotations for supervised machine learning are saved along with each generated invoice, which removes the need for labor-intensive manual annotation. Content and layout diversity is evaluated and compared to empirical and synthetic invoice corpora using the Self-BLEU, Alignment, and Overlap metrics. We have statistically validated the stability of the modeling, which is consistent and reproducible. The final assessment is that the diversity of the generated invoices is on par with that of real-world invoices and, by most metrics, exceeds that of earlier synthetic corpora.
Published Piscataway, NJ : IEEE
Type Journal article
Language English
Publication date 2025