Generative AI

Self-Refining Video Sampling

Sangwon Jang, Taekyung Ki, Jaehyeong Jo, Saining Xie, Jaehong Yoon, Sung Ju Hwang
Published: January 26, 2026
Authors: 6
Word count: 14,529
Code: included

Enhancing video generation with self-refinement.

Abstract

Modern video generators still struggle with complex physical dynamics, often falling short of physical realism. Existing approaches address this with external verifiers or additional training on augmented data, which is computationally expensive and still limited in capturing fine-grained motion. In this work, we present self-refining video sampling, a simple method that uses a pre-trained video generator, trained on large-scale datasets, as its own self-refiner. By interpreting the generator as a denoising autoencoder, we enable iterative inner-loop refinement at inference time without any external verifier or additional training. We further introduce an uncertainty-aware refinement strategy that selectively refines regions based on self-consistency, preventing artifacts caused by over-refinement. Experiments on state-of-the-art video generators demonstrate significant improvements in motion coherence and physics alignment, achieving over 70% human preference over both the default and guidance-based samplers.
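
To make the inner loop concrete, the following is a minimal sketch in Python/PyTorch, assuming a rectified-flow-style generator whose network predicts a velocity field. The function name self_refine, the partial-noising level t_refine, and the Euler sampler are illustrative assumptions, not the paper's actual interface: the generated video is re-noised to an intermediate time and then denoised back by the same pre-trained model, and this noise-then-denoise cycle is repeated.

    import torch

    def self_refine(video, velocity_model, num_iters=3, t_refine=0.4, num_steps=10):
        """Re-noise a generated video to an intermediate time t_refine, then
        denoise it back with the same pre-trained model, repeating num_iters
        times. Convention (assumed): x_t = (1 - t) * x0 + t * eps, with the
        model predicting the velocity dx/dt."""
        x = video
        for _ in range(num_iters):
            # Forward (noising) step: interpolate toward Gaussian noise.
            eps = torch.randn_like(x)
            x_t = (1.0 - t_refine) * x + t_refine * eps
            # Inner denoising loop: simple Euler integration from t_refine to 0.
            t = t_refine
            dt = t_refine / num_steps
            for _ in range(num_steps):
                v = velocity_model(x_t, t)  # model-predicted velocity field
                x_t = x_t - dt * v
                t -= dt
            x = x_t
        return x

    # Toy usage: a dummy model predicting zero velocity stands in for a real
    # video generator; tensor layout (batch, frames, channels, height, width).
    video = torch.randn(1, 16, 3, 64, 64)
    refined = self_refine(video, lambda x, t: torch.zeros_like(x))

Under this convention, integrating the predicted velocity from t_refine back to 0 recovers a refined sample, and each outer iteration gives the model another chance to correct implausible motion.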

Key Takeaways

  1. Self-refining approach improves video generation quality.
  2. Iterative noising and denoising enhance motion coherence.
  3. Uncertainty-aware strategy preserves visual integrity (a sketch follows this list).
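
A minimal sketch of this uncertainty-aware step, under the assumption that self-consistency is measured as agreement across several independent refinement runs; the function selective_refine, the variance threshold tau, and the per-pixel masking rule are hypothetical, not the paper's exact formulation:

    import torch

    def selective_refine(video, refine_fn, num_samples=4, tau=0.05):
        """Uncertainty-aware refinement via self-consistency (sketch).
        Run the refinement loop several times with independent noise,
        measure per-pixel variance across the runs, and keep refined
        content only where the runs agree; elsewhere retain the original
        video to avoid over-refinement artifacts."""
        samples = torch.stack([refine_fn(video) for _ in range(num_samples)])
        variance = samples.var(dim=0)            # per-pixel disagreement
        mask = (variance < tau).float()          # 1 = self-consistent region
        refined = samples.mean(dim=0)
        return mask * refined + (1.0 - mask) * video

Combined with the refinement loop sketched above, refine_fn could be, e.g., lambda v: self_refine(v, model); regions where independent runs disagree are treated as uncertain and left untouched.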

Limitations

  • Relies on the generator's inherent capabilities.

  • Increased computational cost with more iterations.

Keywords

video generators, denoising autoencoder, iterative inner-loop refinement, self-refining video sampling, uncertainty-aware refinement, motion coherence, physics alignment
