We propose a framework that uses modern machine learning to construct stochastic closure models. Our approach combines conditional diffusion models with the Fourier Neural Operator (FNO) to capture the missing dynamics in complex systems governed by partial differential equations (PDEs).
Complex dynamical systems, from turbulent atmospheric flows to ocean circulation and materials under stress, span an enormous range of spatial and temporal scales. Directly resolving every scale in numerical simulation is computationally prohibitive, and will remain so for the foreseeable future. Classical closure models address this challenge by parameterizing the effects of unresolved (subgrid) scales on the resolved ones. However, traditional closure models are typically deterministic and local in character; these assumptions break down precisely in the most challenging regimes, where resolved and unresolved scales interact in a highly nonlinear, non-local, and non-Markovian manner.
This work introduces a data-driven framework for constructing stochastic and non-local closure models by marrying two powerful and complementary advances in machine learning: score-based conditional diffusion models and Fourier Neural Operators (FNOs). The resulting framework can learn the full conditional probability distribution of the unknown closure term, naturally capturing uncertainty, while remaining resolution-invariant and capable of modeling long-range spatial interactions. To make the approach practical in numerical simulations, we also demonstrate an accelerated sampling strategy that reduces the inference cost by up to two orders of magnitude.
Consider a dynamical system whose general governing equation is:
\[\frac{\partial v}{\partial t} = M(v),\]where $v$ represents the full system state and $M$ encapsulates the dynamics derived from first principles, potentially involving highly nonlinear mappings, differential operators, and integral operators. In practice, numerically resolving every detail of $v$ is infeasible, motivating the use of a reduced-order system:
\[\frac{\partial V}{\partial t} = \widetilde{M}(V),\]where $V := \mathcal{K}(v)$ is a reduced-order representation of $v$, and $\widetilde{M}(V)$ approximates the reduced dynamics $\mathcal{K} \circ M(v)$. The inevitable mismatch between $\widetilde{M}(V)$ and the true dynamics gives rise to a closure term $U$, so that
\[\frac{\partial V}{\partial t} = \widetilde{M}(V) + U,\]where $U$ corrects for the influence of unresolved scales. When the system lacks a clear scale separation, $U$ can depend on the current state $V$ in a nonlinear, non-local fashion and may exhibit memory effects due to non-Markovian dynamics. These features render classical deterministic, local closures insufficient, and motivate a stochastic, data-driven approach.
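As a concrete (and deliberately simplified) illustration, the reduction operator $\mathcal{K}$ can be taken as spectral truncation onto the lowest Fourier modes. The function name, grid sizes, and random stand-in state below are all hypothetical, not the paper's setup:

```python
import numpy as np

def spectral_truncate(v, n_coarse):
    """A hypothetical reduction operator K: keep only the lowest
    n_coarse x n_coarse Fourier modes of a periodic 2-D field."""
    n = v.shape[0]
    vh = np.fft.fftshift(np.fft.fft2(v))
    c, h = n // 2, n_coarse // 2
    vh_c = vh[c - h:c + h, c - h:c + h]
    # rescale so function values are preserved on the coarse grid
    return np.real(np.fft.ifft2(np.fft.ifftshift(vh_c))) * (n_coarse / n) ** 2

rng = np.random.default_rng(0)
v = rng.standard_normal((256, 256))   # stand-in for a fine-grid state
V = spectral_truncate(v, 64)          # reduced-order representation V = K(v)
# Given solvers M and M_tilde, closure training data would be assembled as
# U = spectral_truncate(M(v), 64) - M_tilde(V).
```

With ground-truth trajectories of $v$, evaluating this residual along the trajectory produces the dataset of closure samples $U$ that the diffusion model is trained on.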
At the heart of this framework lies the variance-exploding stochastic differential equation (VE-SDE)
\[dU = \sqrt{\frac{d[\sigma^2(t)]}{dt}}\, dW,\]
where $\sigma(t)$ is a noise schedule that grows from nearly zero to a large value, and $W$ is a standard Wiener process. After running this process until time $T$, the data is fully corrupted into near-Gaussian noise with variance $\sigma^2(T) \gg 1$.
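Because the forward process is Gaussian, corrupting a sample needs no SDE integration at all. A minimal sketch, assuming a geometric noise schedule (one common VE choice; the cutoffs `sigma_min` and `sigma_max` are illustrative):

```python
import numpy as np

sigma_min, sigma_max = 0.01, 50.0   # illustrative cutoffs, not the paper's values

def sigma(t):
    """Geometric noise schedule on t in [0, 1]."""
    return sigma_min * (sigma_max / sigma_min) ** t

def perturb(U0, t, rng):
    """Sample from the transition kernel p_0t(. | U_0) = N(U_0, sigma(t)^2 I)."""
    return U0 + sigma(t) * rng.standard_normal(U0.shape)

rng = np.random.default_rng(0)
U0 = rng.standard_normal((64, 64))   # stand-in for a clean closure field
UT = perturb(U0, 1.0, rng)           # at t = T the signal is swamped by noise
```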
Generation then proceeds by reversing this diffusion. The exact reverse SDE is
\[dU = -\frac{d[\sigma^2(t)]}{dt}\, \nabla_U \log p_t(U)\, dt + \sqrt{\frac{d[\sigma^2(t)]}{dt}}\, d\bar{W},\]where $\nabla_U \log p_t(U)$ is the score function — the gradient of the log density at noise level $t$ — and $\bar{W}$ is a reverse-time Wiener process. Starting from Gaussian noise and integrating this reverse SDE produces samples from $p_\text{data}$.
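The reverse SDE can be integrated with a simple Euler-Maruyama scheme. The sketch below substitutes a toy Gaussian target, whose score is known in closed form, for the learned network $s_\theta$; the geometric schedule and all constants are assumptions for illustration only:

```python
import numpy as np

sigma_min, sigma_max = 0.01, 50.0
s_data = 1.0  # std of the toy Gaussian "data" distribution

def sigma(t):
    return sigma_min * (sigma_max / sigma_min) ** t

def g2(t):
    """d[sigma^2(t)]/dt for the geometric schedule."""
    return 2.0 * np.log(sigma_max / sigma_min) * sigma(t) ** 2

def score(U, t):
    # Exact score of p_t when p_data = N(0, s_data^2 I); stands in for s_theta.
    return -U / (s_data ** 2 + sigma(t) ** 2)

def reverse_sde_sample(shape, n_steps=500, rng=None):
    """Euler-Maruyama integration of the reverse VE-SDE from t = 1 down to t = 0."""
    rng = rng or np.random.default_rng(0)
    ts = np.linspace(1.0, 0.0, n_steps + 1)
    U = sigma(1.0) * rng.standard_normal(shape)   # start from wide Gaussian noise
    for t0, t1 in zip(ts[:-1], ts[1:]):
        dt = t1 - t0                              # negative: time runs backward
        U = (U - g2(t0) * score(U, t0) * dt
             + np.sqrt(g2(t0) * abs(dt)) * rng.standard_normal(shape))
    return U
```

With the exact score, the final samples recover the unit-variance target, which is the behavior the learned sampler is trained to reproduce for $p(U)$.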
The score function is approximated by a neural network $s_\theta(U, t)$ trained with the denoising score matching objective:
\[\mathcal{L}(\theta) = \mathbb{E}_{t, U_0, \epsilon}\!\left[\lambda(t)\,\bigl\lVert s_\theta(U_t, t) - \nabla_{U_t}\log p_{0t}(U_t \mid U_0)\bigr\rVert^2\right],\]where $\epsilon \sim \mathcal{N}(0, I)$, $U_t = U_0 + \sigma(t)\epsilon$, and $\lambda(t)$ is a weighting function. Because the transition kernel gradient simplifies to $\nabla_{U_t}\log p_{0t}(U_t\mid U_0) = -(U_t - U_0)/\sigma^2(t)$, training reduces to predicting the noise direction from a corrupted sample.
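The simplification of the transition-kernel gradient can be checked numerically; `dsm_loss` below is a hypothetical single-sample version of the objective, not the paper's training code:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_t = 2.5
U0 = rng.standard_normal((8, 8))
eps = rng.standard_normal((8, 8))
Ut = U0 + sigma_t * eps

# The Gaussian transition kernel's score reduces to the scaled noise direction:
target = -(Ut - U0) / sigma_t ** 2

def dsm_loss(s_pred, Ut, U0, sigma_t, lam=1.0):
    """Monte-Carlo estimate of the weighted DSM objective for one sample."""
    return lam * np.mean((s_pred + (Ut - U0) / sigma_t ** 2) ** 2)
```

A perfect score prediction, i.e. `s_pred = -eps / sigma_t`, drives this loss to zero.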
Closure terms do not live in a vacuum: the closure $U$ at a given simulation time typically depends on the resolved state $V$ and its recent history. We collect all such side information into a conditioning vector $\mathrm{y}$ (e.g., the current vorticity field, or sparse observations of $U$) and train a conditional score network $s_\theta(U_t, t, \mathrm{y})$.
The conditional training objective is
\[\mathcal{L}(\theta) = \mathbb{E}_{t, U_0, \epsilon}\!\left[\lambda(t)\,\bigl\lVert s_\theta(U_t, t, \mathrm{y}) + \frac{\epsilon}{\sigma(t)}\bigr\rVert^2\right].\]An important implementation insight is that the diffusion only needs to run up to a finite terminal time $T < \infty$ such that $\sigma(T) \gg \sigma_\text{data}$, rather than $T \to \infty$. Smaller $T$ shortens the reverse trajectory and thus speeds up sampling, which is essential for online use inside a simulation loop.
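To make the speed-up intuition concrete, here is a back-of-the-envelope count of reverse steps, under the illustrative assumption that each step reduces the noise level by a fixed ratio (all constants are hypothetical):

```python
import numpy as np

def n_steps_geometric(sigma_max, sigma_min=0.01, ratio_per_step=2.0):
    """Reverse steps needed if each step halves the noise level (illustrative)."""
    return int(np.ceil(np.log(sigma_max / sigma_min) / np.log(ratio_per_step)))

# A low-variance closure distribution needs a much smaller sigma(T), and hence
# a logarithmically shorter reverse trajectory, than image-scale data:
n_image = n_steps_geometric(sigma_max=80.0)
n_closure = n_steps_geometric(sigma_max=1.0)
```

The trajectory length shrinks only logarithmically in $\sigma_\text{max}$, but combined with an adaptive step schedule this is exactly what enables few-step sampling inside a simulation loop.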
The score network $s_\theta$ must map continuous spatiotemporal fields, not fixed-size images, to score fields of the same form. This motivates replacing conventional CNNs with a Fourier Neural Operator (FNO), whose layers take the form
\[v_{l+1}(x) = \sigma\!\left(W v_l(x) + \mathcal{F}^{-1}\!\bigl(R_\phi \cdot \mathcal{F}(v_l)\bigr)(x)\right),\]where $W$ is a pointwise linear operator, $R_\phi$ are learnable complex weight matrices acting on the low-frequency Fourier modes, and $\sigma$ is a pointwise nonlinearity (overloading the noise-schedule symbol). The key advantages for closure modeling are:
Resolution invariance. Because FNO operates in frequency space, a model trained on a $64\times64$ grid can be directly evaluated at $128\times128$ or $256\times256$ without retraining — critical for multi-resolution production simulations.
Non-local receptive field. A single Fourier layer already has a global receptive field, capturing long-range correlations that local convolutional filters miss. This is essential when closure depends on the entire vorticity pattern rather than a small patch.
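A minimal one-dimensional sketch of a Fourier layer, using random stand-in weights, also demonstrates resolution invariance: the same spectral weights apply unchanged at 64 and 128 grid points. (The real architecture is 2-D with channel mixing; everything here is illustrative.)

```python
import numpy as np

def fourier_layer(v, R, W, modes=8):
    """One toy FNO layer on a periodic 1-D field v of shape (n_grid,):
    spectral convolution on low modes + pointwise linear path + nonlinearity."""
    n = v.shape[0]
    vh = np.fft.rfft(v)
    out_h = np.zeros_like(vh)
    out_h[:modes] = R * vh[:modes]            # weights act on low modes only
    spectral = np.fft.irfft(out_h, n=n)
    return np.maximum(W * v + spectral, 0.0)  # ReLU nonlinearity

rng = np.random.default_rng(0)
R = rng.standard_normal(8) + 1j * rng.standard_normal(8)  # complex spectral weights
W = 0.5                                                   # pointwise linear weight

# Resolution invariance: the SAME weights evaluate at any grid size, and the
# outputs agree at coincident grid points for a band-limited input.
x64 = np.linspace(0, 2 * np.pi, 64, endpoint=False)
x128 = np.linspace(0, 2 * np.pi, 128, endpoint=False)
y64 = fourier_layer(np.sin(x64), R, W)
y128 = fourier_layer(np.sin(x128), R, W)
```

Because `rfft`/`irfft` normalization cancels, the spectral path computes the same continuum function regardless of grid size, which is the mechanism behind the train-at-$64$, test-at-$256$ results later in the document.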
In practice, the architecture uses three parallel input pipelines: (i) the noisy closure sample $U_t$, (ii) the conditioning vorticity field $V$, and (iii) sparse observations of the true closure term. Each is passed through a separate FNO encoder before being fused with the embedding of the noise level $t$.
Vanilla diffusion samplers require hundreds of steps, making them too slow for per-timestep use inside a simulation loop. This work introduces two complementary accelerations:
Adaptive time-step schedule. Rather than uniform steps, the reverse SDE uses the noise-level schedule
\[\sigma_i = \left(\sigma_\text{max}^{1/\rho} + \frac{i}{N-1}\left(\sigma_\text{min}^{1/\rho} - \sigma_\text{max}^{1/\rho}\right)\right)^\rho, \quad i = 0, 1, \ldots, N-1,\]with $\rho = 7$. This geometric-like spacing concentrates steps near $\sigma_\text{min}$, where the score changes most rapidly, yielding better accuracy per step.
Reduced terminal time $T$. Because the closure distribution $p(U)$ has a much smaller variance than a standard image dataset, the forward process reaches “pure noise” much sooner. Exploiting this with a smaller $T$ (and correspondingly smaller $\sigma_\text{max}$) reduces the number of reverse steps needed by roughly an order of magnitude, enabling sub-10-step sampling without quality loss.
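The adaptive noise-level schedule translates directly into code. The default cutoffs below are common for this schedule family and are an assumption, not necessarily the paper's values:

```python
import numpy as np

def noise_schedule(n_steps, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Noise levels sigma_i per the interpolation formula above
    (i = 0 is the largest level; defaults are illustrative)."""
    i = np.arange(n_steps)
    return (sigma_max ** (1 / rho)
            + i / (n_steps - 1) * (sigma_min ** (1 / rho) - sigma_max ** (1 / rho))
            ) ** rho

sig = noise_schedule(10)
# the sequence decreases from sigma_max to sigma_min, with levels bunching
# together at the small-sigma end where the score varies fastest
```

Shrinking `sigma_max` to match a low-variance closure distribution then shortens this ladder without changing its shape.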
[Interactive animation: a slider sweeps the reverse diffusion from $t = T$, where the field is pure Gaussian noise, to $t = 0$, where structured vortex patterns reminiscent of a 2-D turbulent flow gradually emerge.]
The framework is validated on the 2-D Navier–Stokes equations in vorticity form:
\[\frac{\partial \omega}{\partial t} + u \cdot \nabla\omega = \nu\Delta\omega + f,\]where $\omega = \nabla \times u$ is the vorticity, $\nu = 10^{-3}$ is the kinematic viscosity, and $f$ is a fixed external forcing. The computational domain is $[0, 2\pi]^2$ with periodic boundary conditions. A fine-grid ($256\times256$) pseudo-spectral solver provides ground-truth trajectories; coarse-grid ($64\times64$) models must compensate via stochastic closure terms.
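As a sanity check of the spectral machinery, the vorticity $\omega = \nabla \times u$ can be computed from a velocity field by FFT differentiation on the periodic domain. This is a toy verification, not the paper's pseudo-spectral solver; the Taylor-Green-like test field is illustrative:

```python
import numpy as np

def vorticity(u, v):
    """omega = dv/dx - du/dy on a periodic [0, 2*pi]^2 grid, via FFT."""
    n = u.shape[0]
    k = np.fft.fftfreq(n, d=1.0 / n)   # integer wavenumbers for a 2*pi domain
    kx = k[None, :]                    # x varies along axis 1
    ky = k[:, None]                    # y varies along axis 0
    dv_dx = np.real(np.fft.ifft2(1j * kx * np.fft.fft2(v)))
    du_dy = np.real(np.fft.ifft2(1j * ky * np.fft.fft2(u)))
    return dv_dx - du_dy

# Taylor-Green-like test field: u = cos(x) sin(y), v = -sin(x) cos(y)
n = 64
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
X, Y = np.meshgrid(x, x)               # X varies along axis 1, Y along axis 0
u = np.cos(X) * np.sin(Y)
v = -np.sin(X) * np.cos(Y)
omega = vorticity(u, v)                # analytically: -2 cos(x) cos(y)
```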
Two physically distinct closure scenarios are studied:
Scenario G (stochastic diffusion). The closure term $G$ represents a spatially varying, stochastic viscous-diffusion correction. Because it is linear in $\omega$ and spatially local, a relatively simple conditioning strategy works well: conditioning on the vorticity field alone yields a Frobenius-norm relative error of $\sim 30\%$, while adding sparse observations of $G$ (just $5^2 = 25$ grid points) reduces this to 8.2%.
Scenario H (nonlinear convection). The convection closure $H$ is nonlinear and non-local: it depends on global vorticity patterns, which makes it significantly harder to model. The best conditional model (vorticity + sparse $H$ observations) achieves a relative error of 12.7%, compared to $\sim 45\%$ without any closure. The FNO's global receptive field is critical here: purely local architectures fail to capture the long-range nature of $H$.
A defining feature of FNO-based models is that they can be queried at any resolution after training at $64\times64$. The table below summarizes the Frobenius-norm relative errors $D_\text{Fro}$ across resolutions for Scenario G:
| Training res. | Test res. | $D_\text{Fro}$ |
|---|---|---|
| $64\times64$ | $64\times64$ | 8.2% |
| $64\times64$ | $128\times128$ | 9.1% |
| $64\times64$ | $256\times256$ | 9.8% |
Performance degrades gracefully — less than 2 percentage points across a 4× increase in linear resolution — confirming that no retraining is needed when the simulation grid is refined.
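Taking $D_\text{Fro}$ to be the standard relative Frobenius-norm error (an assumption based on its name), the metric can be sketched as:

```python
import numpy as np

def d_fro(U_gen, U_true):
    """Assumed metric: ||U_gen - U_true||_F / ||U_true||_F."""
    return np.linalg.norm(U_gen - U_true) / np.linalg.norm(U_true)

rng = np.random.default_rng(0)
U_true = rng.standard_normal((64, 64))
U_gen = U_true + 0.1 * rng.standard_normal((64, 64))
err = d_fro(U_gen, U_true)   # roughly 0.1 for 10% additive noise
```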
The energy spectra of generated closure fields likewise match the ground truth across all tested resolutions: the expected $k^{-3}$ enstrophy cascade is preserved, and no spurious energy pile-up appears at the grid cutoff frequency.
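A radially binned energy spectrum, as used for such diagnostics, can be sketched as follows; the shell-binning convention is an assumption, and the white-noise input is only a placeholder for a vorticity field:

```python
import numpy as np

def energy_spectrum(field):
    """Radially binned power spectrum E(k) of a periodic 2-D field."""
    n = field.shape[0]
    fh = np.fft.fft2(field) / n ** 2          # normalized Fourier coefficients
    power = np.abs(fh) ** 2
    k = np.fft.fftfreq(n, d=1.0 / n)          # integer wavenumbers
    kmag = np.sqrt(k[None, :] ** 2 + k[:, None] ** 2)
    kbins = np.arange(0.5, n // 2, 1.0)       # shells [k - 0.5, k + 0.5)
    E = np.array([power[(kmag >= lo) & (kmag < lo + 1.0)].sum() for lo in kbins])
    return kbins + 0.5, E

rng = np.random.default_rng(0)
k, E = energy_spectrum(rng.standard_normal((64, 64)))
```

Plotting $E(k)$ on log-log axes for generated versus ground-truth closures is how slope preservation (e.g. the $k^{-3}$ range) and spurious pile-up at the grid cutoff would be checked.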