The False Promise of Zero-Shot Super-Resolution in Machine-Learned Operators

University of Chicago; Lawrence Berkeley National Laboratory; University of Stuttgart; Dartmouth College; International Computer Science Institute; University of California, Berkeley
Evaluating multi-resolution inference using different training strategies.

Assessing multi-resolution inference. Column 1: Expected prediction for Darcy flow at varying resolutions. Columns 2-6: Sample predictions for Darcy flow at varying test resolutions. Column 7: Average mean squared error test loss at each resolution (lower is better). Zero-shot methods: CNO, FNO, Physics-Informed, and CROP are all zero-shot methods, meaning each model was trained at a single resolution (16) and evaluated at resolutions 16, 32, 64, and 128. Data-driven method: Multi-resolution training (proposed). Note that multi-resolution training consistently outperforms the zero-shot methods.

Abstract

A core challenge in scientific machine learning, and in scientific computing more generally, is modeling continuous phenomena that (in practice) are represented discretely. Machine-learned operators (MLOs) have been introduced as a means to achieve this modeling goal, as this class of architecture can perform inference at arbitrary resolution. In this work, we evaluate whether this architectural innovation is sufficient to perform “zero-shot super-resolution,” namely to enable a model to serve inference on higher-resolution data than that on which it was originally trained. We comprehensively evaluate both zero-shot sub-resolution and super-resolution (i.e., multi-resolution) inference in MLOs. We decouple multi-resolution inference into two key behaviors: 1) extrapolating to varying frequency information; and 2) interpolating across varying resolutions. We empirically demonstrate that MLOs fail at both of these tasks in a zero-shot manner. Consequently, we find that MLOs are unable to perform accurate inference at resolutions different from those on which they were trained; instead, they are brittle and susceptible to aliasing. To address these failure modes, we propose a simple, computationally efficient, and data-driven multi-resolution training protocol that overcomes aliasing and provides robust multi-resolution generalization.

What are Machine-Learned Operators?

Modeling physical systems governed by partial differential equations (PDEs) is critical to many computational science workflows. Central to this problem formulation is the fact that continuous physical systems must be sampled and, therefore, modeled discretely. For discrete models to be useful in representing phenomena at different scales, scientists must be able to use them accurately at different resolutions. For example, when modeling fluid flow, scientists often use adaptive mesh refinement, a technique that increases simulation resolution in areas that require high accuracy (e.g., regions of turbulence) and coarsens the resolution in less critical regions.

Machine-learned operators (MLOs) are a class of data-driven machine learning (ML) models that parameterize the solution operator for a family of PDEs and can be used to perform inference at arbitrary discretizations. Although querying an MLO at an arbitrary discretization is computationally inexpensive, it is not obvious that it can be done accurately.
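To make the resolution-agnostic design concrete, here is a minimal sketch (our own illustration, not the authors' implementation) of the core building block of a Fourier neural operator: a spectral convolution whose learned weights act on a fixed number of low-frequency Fourier modes, so the same parameters can be applied to inputs of any grid size. The names channels and n_modes are ours.

import torch

class SpectralConv1d(torch.nn.Module):
    # Minimal 1D Fourier layer: learned weights act on a fixed number of
    # low-frequency modes, so the same parameters apply at any resolution.
    def __init__(self, channels, n_modes):
        super().__init__()
        self.n_modes = n_modes
        self.weights = torch.nn.Parameter(
            torch.randn(channels, channels, n_modes, dtype=torch.cfloat) / channels)

    def forward(self, x):              # x: (batch, channels, n_points)
        x_hat = torch.fft.rfft(x)      # to the frequency domain
        out_hat = torch.zeros_like(x_hat)
        m = min(self.n_modes, x_hat.shape[-1])
        # Transform only the lowest m modes; all higher modes are dropped.
        out_hat[..., :m] = torch.einsum(
            "bim,iom->bom", x_hat[..., :m], self.weights[..., :m])
        return torch.fft.irfft(out_hat, n=x.shape[-1])  # back to the input grid

Because the weights live in frequency space and are truncated to n_modes, this layer accepts a 16-point or a 128-point grid with no architectural change; whether its predictions remain accurate across grids is precisely the question studied here.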

Challenge: Discretely Modeling Continuous Systems

The fundamental challenge in discretely representing continuous systems lies in the choice of sampling rate. The Whittaker–Nyquist–Shannon sampling theorem establishes that, given a sampling rate r, the largest resolvable frequency is r/2. Thus, ML models are trained on discrete representations in which only some of the frequencies are fully resolved. Predicting frequencies greater than r/2 consequently becomes an out-of-distribution task. Aligning these discrete models’ predictions with the underlying continuous system is a non-trivial open problem.
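A small NumPy example illustrates the theorem: on a grid sampled at rate r = 8, a 6 Hz sine is indistinguishable from a 2 Hz sine, because 6 Hz exceeds the Nyquist frequency r/2 = 4 Hz.

import numpy as np

r = 8                                      # sampling rate; Nyquist = r/2 = 4 Hz
t = np.arange(0, 1, 1 / r)                 # one second of samples
high = np.sin(2 * np.pi * 6 * t)           # 6 Hz: above Nyquist, not resolvable
alias = np.sin(2 * np.pi * (6 - r) * t)    # folds to -2 Hz, a sign-flipped 2 Hz sine
print(np.allclose(high, alias))            # True: identical on this grid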

In this work, we assess the ability of trained MLOs to generalize beyond their training resolution. We demonstrate that MLOs struggle to perform accurate inference at resolutions higher or lower than the resolution on which they were trained, and instead exhibit a form of aliasing. In an ML context, when inferring at different discretizations of a given signal, aliasing manifests as a divergence between the energy spectrum of the model prediction and that of the expected output. Aliasing indicates a model’s failure to fit the underlying continuous system.
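This spectral divergence can be measured directly. Below is a sketch of one way to compute a radially averaged 2D energy spectrum (our own construction; the paper's exact binning may differ); comparing the spectra of predictions and labels exposes aliasing.

import numpy as np

def energy_spectrum_2d(field):
    # Radially averaged energy spectrum of a square 2D field.
    n = field.shape[0]
    f_hat = np.fft.fftshift(np.fft.fft2(field)) / n**2
    energy = np.abs(f_hat) ** 2
    # Integer radial wavenumber of every mode, measured from the center.
    kx, ky = np.meshgrid(np.arange(n) - n // 2, np.arange(n) - n // 2)
    k = np.round(np.sqrt(kx**2 + ky**2)).astype(int)
    # Total energy within each wavenumber shell, up to the Nyquist frequency.
    return np.bincount(k.ravel(), weights=energy.ravel())[: n // 2]

A prediction whose spectrum tracks the label spectrum up to the training Nyquist frequency but diverges beyond it exhibits exactly the aliasing reported below.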

For example, in the figure below, we train a Fourier neural operator (FNO) at resolution 16 (indicated by *) and evaluate it at resolutions 16, 32, 64, and 128. Top Row: Sample predictions for Darcy flow; notice the striation artifacts at resolution 128. Middle Row: Average test-set 2D energy spectra of labels and model predictions. Bottom Row: Average residual spectrum normalized by the label spectrum.

Aliasing.

Breaking Down Multi-Resolution Inference

Signal Processing Overview.

We define multi-resolution inference as the ability to perform inference at multiple resolutions (e.g., sub-resolution and super-resolution). The zero-shot multi-resolution task takes an ML model trained on data at one resolution (subfigure (a)) and tests it on data at a different resolution (subfigure (d)). Zero-shot multi-resolution inference raises two important questions with respect to the generalization abilities of trained models:

  1. Resolution Interpolation (subfigure (b)). How do models behave when the frequency information in the data remains fixed, but the sampling rate changes from training to inference? Can the model interpolate the fully resolved signal to varying resolutions?
  2. Information Extrapolation (subfigure (c)). How do models behave when the resolution remains fixed, but the number of fully resolved frequency components changes from training to inference? For super-resolution, this asks whether the model can extrapolate beyond the frequencies in its training data and model higher-frequency information.
We find, via experiment, that MLOs are able neither to interpolate nor to extrapolate in a zero-shot manner, and instead are susceptible to aliasing (subfigure (e)).
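The two settings can be constructed explicitly. The sketch below (our illustration, in 1D for clarity) builds band-limited signals so that either the sampling rate changes while the frequency content is fixed (interpolation), or the frequency content changes while the sampling rate is fixed (extrapolation).

import numpy as np

def band_limited_signal(n_points, max_freq, seed=0):
    # Random 1D signal containing only integer frequencies 1..max_freq.
    rng = np.random.default_rng(seed)
    x = np.linspace(0, 1, n_points, endpoint=False)
    amps = rng.standard_normal(max_freq)
    return sum(a * np.sin(2 * np.pi * (f + 1) * x) for f, a in enumerate(amps))

# (b) Resolution interpolation: fixed frequency content (modes 1..7, fully
#     resolved at resolution 16), sampled at the training rate and a finer rate.
train_res = band_limited_signal(n_points=16, max_freq=7)
finer_res = band_limited_signal(n_points=128, max_freq=7)

# (c) Information extrapolation: fixed resolution (128 points), but richer
#     frequency content than the training data ever resolved.
richer = band_limited_signal(n_points=128, max_freq=60)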

Zero-shot Multi-Resolution or Aliasing?

We study whether FNOs are capable of spatial multi-resolution inference: simultaneously changing the sampling rate and the frequency information. We train an FNO on a Navier-Stokes dataset at resolution 255 (indicated by *) and evaluate it at resolutions 510, 255, and 128. Results are shown in the figure below. Top Row: Ground truth, followed by predictions at resolution 510 (super-resolution), resolution 255 (the training resolution), and resolution 128 (sub-resolution). Bottom Row: Average energy spectra over the test data.

Aliasing rather than zero-shot super and sub resolution.

We observe a failure to generalize: the model predictions exhibit high-frequency artifacts in multi-resolution settings. We therefore conclude that FNOs are not capable of consistent zero-shot super- or sub-resolution.

Physical Optimization Constraints

We study whether physics-informed optimization constraints can enable zero-shot multi-resolution inference. We optimize each set of model parameters θ with a dual optimization objective L(θ) = (1 − w) · ℓ_data(θ) + w · ℓ_phys(θ), where ℓ_data is the original data-driven loss (mean squared error) and ℓ_phys is an additional physics-informed loss that explicitly enforces that the governing PDE is satisfied. First, we find that the purely data-driven loss (w = 0) always outperforms any training objective that includes a physics constraint.
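In code, the dual objective has the following generic structure (a minimal sketch; pde_residual stands in for the equation-specific residual operator and is assumed given):

import torch

def combined_loss(model, inputs, labels, w, pde_residual):
    # L(theta) = (1 - w) * l_data(theta) + w * l_phys(theta)
    pred = model(inputs)
    l_data = torch.mean((pred - labels) ** 2)              # mean squared error
    l_phys = torch.mean(pde_residual(pred, inputs) ** 2)   # PDE violation penalty
    return (1 - w) * l_data + w * l_phys

# w = 0 recovers purely data-driven training; w = 1 trains on physics alone.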

PINN weighting coefficients.

To further illustrate this, we use the smallest non-zero weight, w = 0.1, to train a model at resolution 256 (indicated by *) and compare physics-informed optimization against solely data-driven optimization. In the figure below, we find that the predicted spectra from models optimized with a physics loss generally diverge more substantially across test resolutions than those from models optimized with only a data loss. Models optimized with physics constraints even fail to accurately fit their training distribution (subfigure (c)), and they fail to generalize to both super- and sub-resolution data (subfigures (a, b, d)).

PINN vs Data Driven optimization.

We conclude that physics-informed constraints do not reliably enable multi-resolution generalization.

Band-limited Learning

We study two approaches that propose learning band-limited representations of data, convolutional neural operators (CNO) and Cross-Resolution Operator-Learning (CROP), and assess whether they can enable zero-shot multi-resolution inference. We train CNO and CROP models on resolution-16 data (indicated by *) and evaluate them at resolutions 16, 32, 64, and 128. We visualize the average test 2D energy spectra of model predictions and ground truth in the figure below. The predicted spectra from both CROP and CNO diverge from the ground truth beyond frequency 8, the Nyquist frequency of the training data.

Band-limited learning.

More broadly, we note that band-limiting a model’s training data and predictive capacity runs counter to the goal of multi-resolution inference, in which a broad range of frequencies must be modeled accurately. Band-limiting a model’s predictive capacity may enable accurate fixed-resolution representations, but it guarantees that high-frequency information is not predicted accurately (or at all). We conclude that band-limited learning limits a model’s utility for multi-resolution inference.
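To see why, consider a hard band-limit implemented as a spectral cut-off (a simplified stand-in for the smoother filters used by CNO and CROP; the function name and cut-off are ours):

import numpy as np

def band_limit(field, max_freq):
    # Zero all Fourier modes above max_freq: a hard low-pass filter.
    n = field.shape[0]
    f_hat = np.fft.fft2(field)
    k = np.fft.fftfreq(n, d=1.0 / n)   # integer wavenumbers
    kx, ky = np.meshgrid(k, k)
    mask = np.sqrt(kx**2 + ky**2) <= max_freq
    return np.real(np.fft.ifft2(f_hat * mask))

# A model band-limited to max_freq = 8 (the Nyquist frequency of resolution-16
# data) can never emit energy above frequency 8, at any test resolution.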

Proposed: Multi-Resolution Training

We hypothesize that models struggle to perform zero-shot multi-resolution inference because data representing a physical system at varying resolutions is sufficiently out-of-distribution relative to the model’s fixed-resolution training data. To remedy this, we propose a data-driven solution: multi-resolution training (i.e., training on more than one resolution).
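A minimal sketch of such a protocol follows (names and data plumbing are ours, not a prescribed API): each step draws a resolution bucket according to the dataset composition and trains on a batch at that grid size, which a resolution-agnostic MLO accepts without modification.

import random
from itertools import cycle
import torch

def train_multi_resolution(model, optimizer, datasets, ratios, n_steps):
    # datasets: dict mapping resolution -> iterable of (input, label) batches.
    # ratios:   dict mapping resolution -> share of training drawn at that size.
    resolutions = list(datasets)
    weights = [ratios[res] for res in resolutions]
    loaders = {res: cycle(datasets[res]) for res in resolutions}
    for _ in range(n_steps):
        # Sample a resolution bucket, e.g. mostly cheap low-resolution data
        # with a small share of high-resolution data.
        res = random.choices(resolutions, weights=weights, k=1)[0]
        inputs, labels = next(loaders[res])
        loss = torch.mean((model(inputs) - labels) ** 2)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()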

We train an FNO on multi-resolution data; the results are shown in the figure below. The bottom row shows the test loss across different resolutions. The middle row shows the ratio of training data within each resolution bucket. The top row shows the average number of pixels in a training-set data sample; a lower number of pixels enables faster data generation and model training.

Multi-Resolution Training.

We first study what happens when two resolutions are included in the training data. In subfigures (a-f), we observe that for pair-wise training, test performance on the two training resolutions is generally better, but there are no consistent gains on the two non-training resolutions. This indicates that models perform best on the data resolutions on which they are trained.

To improve multi-resolution capabilities, we investigate the impact of including data from all resolutions. We first assess an equal number of samples per resolution. In subfigure (g), test performance improves across all resolutions, which confirms that multi-resolution training benefits multi-resolution inference. Next, we ask: Can we improve the computational efficiency of multi-resolution training? For this, the training dataset must be composed primarily of low-resolution data, as it is both the cheapest to generate and the cheapest to train over. We compose two additional multi-resolution datasets and observe in subfigures (h, i) that models remain competitive across test resolutions, even as we decrease the amount of high-resolution data.

Conclusion

For MLOs to be as versatile as numerical-methods-based approaches for modeling PDEs, they must be able to perform accurate multi-resolution inference. To better understand an MLO’s abilities, we break down the task of multi-resolution inference by assessing a trained model’s ability both to extrapolate to higher/lower frequency information in data and to interpolate across varying data resolutions. We find that models trained on low-resolution data and used for inference on high-resolution data can neither extrapolate nor interpolate; more generally, they fail to achieve accurate multi-resolution inference. Changing the resolution of data at inference time is akin to out-of-distribution inference: models have not learned how to generalize in such settings. We document that models used in a zero-shot multi-resolution setting are prone to aliasing. We study the utility of two existing solutions, physics-informed constraints and band-limited learning, and find that neither enables accurate multi-resolution inference.

Based on these results, we introduce a simple, intuitive, and principled approach to enable accurate multi-resolution inference: multi-resolution training. We show that models perform best at resolutions on which they have been trained, and we demonstrate that the benefits of multi-resolution training can be achieved in a computationally efficient manner via datasets composed mostly of low-resolution data with small amounts of high-resolution data. This enables accurate multi-resolution learning, with the added benefit of low data-generation and model-training costs. A promising future direction is the automated selection of multi-resolution training data using strategies such as active learning.

Further details about all experiments and figures discussed in this blog can be found in the main paper. If you have any questions, feel free to email the first author for clarification.

BibTeX


@article{sakarvadia2025false,
  title={The False Promise of Zero-Shot Super-Resolution in Machine-Learned Operators},
  author={Mansi Sakarvadia and Kareem Hegazy and Amin Totounferoush and Kyle Chard and Yaoqing Yang and Ian Foster and Michael W. Mahoney},
  year={2025},
  eprint={2510.06646},
  url={https://arxiv.org/abs/2510.06646},
}