We propose a novel training-free method for inpainting with off-the-shelf text-to-image models. While guidance-based methods in theory allow generic models to be used for inverse problems such as inpainting, in practice their effectiveness is limited, necessitating specialized inpainting-specific models. In this work, we argue that the missing ingredient for training-free inpainting is the optimization (guidance) of the initial seed noise. We propose to optimize the initial seed noise to approximately match the unmasked parts of the data, requiring only a few tens of optimization steps. We then apply conventional training-free inpainting methods on top of the optimized initial seed noise. Critically, we introduce two core ideas to make this optimization practical: (i) to avoid the costly unrolling required to relate the initial noise to the generated outcome, we use a linear approximation; and (ii) to stabilize the optimization, we optimize the initial seed noise in the spectral domain. We demonstrate the effectiveness of our method on various inpainting tasks, outperforming the state of the art.
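To make idea (i) concrete, below is a minimal, purely illustrative PyTorch sketch of one plausible reading of the linear approximation: evaluate the noise predictor once and detach its output, so the one-step x₀ estimate becomes linear in the seed noise and gradients never unroll through the sampling chain. The names `eps_model`, `alpha_bar_t`, `y`, and `mask` are assumptions for illustration, not part of any released code.

```python
import math
import torch

def linearized_x0(eps_model, x_T, t, alpha_bar_t):
    """One-step x0 estimate, linearized in x_T (a sketch, not the exact method).

    Detaching the epsilon prediction treats the network as locally constant,
    so gradients flow only through the linear term and never unroll through
    the multi-step sampler.
    """
    eps = eps_model(x_T, t).detach()  # treat the prediction as locally constant
    return (x_T - math.sqrt(1.0 - alpha_bar_t) * eps) / math.sqrt(alpha_bar_t)

def masked_loss(x0_hat, y, mask):
    """L2 loss on observed pixels only; mask == 1 marks the known (unmasked) region."""
    return ((mask * (x0_hat - y)) ** 2).mean()
```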
This interactive slider visualizes the optimization of the initial seed noise in the spectral domain: as optimization progresses, the image increasingly matches the observed regions.
We compare optimizing the initial noise in the spatial domain against our proposed spectral-domain optimization. Spatial optimization is unreliable and often fails to converge, whereas our spectral approach (optimizing in the FFT domain) converges stably to a better solution. Below we show the loss curves and corresponding x₀ reconstructions that illustrate this difference.
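As a rough sketch of this spectral parameterization (idea (ii)), the hypothetical snippet below optimizes the complex FFT coefficients of the seed noise with Adam, reusing the `linearized_x0` and `masked_loss` helpers sketched earlier; the latent shape, learning rate, and step count are illustrative assumptions, as are `eps_model`, `y`, `mask`, `t_T`, and `alpha_bar_T`.

```python
import torch

# Spectral-domain parameterization: optimize the complex FFT coefficients
# of the seed noise rather than its raw spatial values. (Adam supports
# complex parameters, treating them as pairs of reals.)
z_hat = torch.fft.fft2(torch.randn(1, 4, 64, 64)).requires_grad_()
opt = torch.optim.Adam([z_hat], lr=0.1)

for _ in range(50):  # "a few tens" of optimization steps
    x_T = torch.fft.ifft2(z_hat).real                  # back to the spatial domain
    x0_hat = linearized_x0(eps_model, x_T, t_T, alpha_bar_T)
    loss = masked_loss(x0_hat, y, mask)                # match observed pixels only
    opt.zero_grad()
    loss.backward()
    opt.step()

# Hand the optimized seed noise off to a standard training-free inpainting sampler.
seed_noise = torch.fft.ifft2(z_hat).real.detach()
```

One plausible intuition for the stability gain is that a step on a single spectral coefficient perturbs all pixels at once, whereas spatial updates act pixel by pixel, so the two parameterizations condition the same loss very differently.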
For additional qualitative comparisons, please refer to the Supplementary Materials.
Our initial noise optimization can also be applied to text-to-video models for video inpainting. Below, we compare our method applied to Wan2.1 [1] against ProPainter [2], a video inpainting baseline. To compare the results, move your cursor left and right.
To facilitate reproduction, we list the prompts used in all experiments; the same prompts are provided to every method.