Abstract [eng] |
We develop a fully automated, modular workflow for data-driven discovery of the governing partial differential equations of reaction–diffusion systems. The pipeline combines \emph{Sparse Identification of Nonlinear Dynamics} (SINDy), exhaustive hyper-parameter sweeps, and a two-stage \emph{term-stability pruning} strategy that removes rarely selected or chronically over-fitting features before Pareto-optimal model selection. The framework is benchmarked on four canonical one-dimensional PDEs (the heat, Fisher–KPP, Allen–Cahn and Gray–Scott equations) using synthetic data on meshes ranging from \(16\times16\) to \(256\times256\) points. On noise-free data, every equation is recovered with \(F_1\!\approx\!1.0\) and relative coefficient errors below \(5\,\%\). Experiments with additive Gaussian noise expose the main vulnerability of finite-difference derivatives: identification deteriorates beyond \(\sim1\,\%\) noise for single-species systems and beyond \(\sim0.1\,\%\) for multi-species systems. Nevertheless, the proposed pruning cuts the candidate library by roughly \(50\,\%\) while retaining all true terms, lowers residuals by up to \(70\,\%\), and enables correct Gray–Scott identification on a \(64\times64\) mesh, four times coarser than previously reported. All code is released as open source and supports drop-in optimisers and libraries. The study clarifies when classical (strong-form) SINDy is reliable, quantifies its noise limits, and provides practical guidelines on library design and hyper-parameter tuning. Future work should integrate weak-form and Bayesian sparsification techniques and explore automated library construction to extend the approach to high-dimensional, experimentally noisy data. |
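To make the strong-form SINDy step concrete, the following is a minimal sketch, not the released code. It assumes a noise-free, periodic heat-equation solution, a hypothetical five-term candidate library, finite-difference derivatives, and a hand-written sequentially thresholded least-squares routine (`stlsq`). The closing threshold sweep only counts how often each term is selected. It is a loose illustration of the idea behind term stability, not the two-stage pruning strategy itself. All names, grid sizes, and thresholds are assumptions chosen for illustration.

```python
import numpy as np

def stlsq(theta, target, threshold, n_iter=10):
    """Sequentially thresholded least squares (illustrative implementation)."""
    xi = np.linalg.lstsq(theta, target, rcond=None)[0]
    for _ in range(n_iter):
        keep = np.abs(xi) >= threshold
        xi[~keep] = 0.0                      # drop small coefficients
        if not keep.any():
            break
        # refit using only the surviving library terms
        xi[keep] = np.linalg.lstsq(theta[:, keep], target, rcond=None)[0]
    return xi

# Synthetic, noise-free data for u_t = D u_xx on a periodic domain (assumed setup).
D = 0.1
x = np.linspace(0.0, 2.0 * np.pi, 128, endpoint=False)
t = np.linspace(0.0, 1.0, 201)
dx, dt = x[1] - x[0], t[1] - t[0]
X, T = np.meshgrid(x, t, indexing="ij")
# Two Fourier modes, so that u and u_xx are not collinear.
u = np.exp(-D * T) * np.sin(X) + 0.5 * np.exp(-4.0 * D * T) * np.sin(2.0 * X)

# Finite-difference derivatives: central in time, periodic central in space.
u_t = np.gradient(u, dt, axis=1)
u_x = (np.roll(u, -1, axis=0) - np.roll(u, 1, axis=0)) / (2.0 * dx)
u_xx = (np.roll(u, -1, axis=0) - 2.0 * u + np.roll(u, 1, axis=0)) / dx**2

# Discard the first and last time slices, where np.gradient is only first order.
interior = np.s_[:, 1:-1]
names = ["u", "u_x", "u_xx", "u^2", "u*u_x"]  # hypothetical candidate library
columns = [u, u_x, u_xx, u**2, u * u_x]
theta = np.column_stack([c[interior].ravel() for c in columns])
target = u_t[interior].ravel()

# Single fit at one threshold.
xi = stlsq(theta, target, threshold=0.01)
print(dict(zip(names, np.round(xi, 4))))

# Selection frequency of each term across a small threshold sweep.
thresholds = [0.005, 0.01, 0.02, 0.05]
freq = np.mean([stlsq(theta, target, th) != 0.0 for th in thresholds], axis=0)
print(dict(zip(names, freq)))
```

On this clean data only the `u_xx` term should survive, with a coefficient close to `D`. Adding Gaussian noise to `u` before differentiating is a simple way to observe the finite-difference sensitivity described above.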