Generative AI

Helios: Real Real-Time Long Video Generation Model

Shenghai Yuan, Yuanyang Yin, Zongjian Li, Xinwei Huang, Xiao Yang, Li Yuan
Published: March 4, 2026
Authors: 6
Word Count: 18,281

14B model generates minute-scale videos at 19.5 FPS without sacrificing quality or requiring specialized acceleration techniques.

Abstract

We introduce Helios, the first 14B video generation model that runs at 19.5 FPS on a single NVIDIA H100 GPU and supports minute-scale generation while matching the quality of a strong baseline. We make breakthroughs along three key dimensions: (1) robustness to long-video drifting without commonly used anti-drifting heuristics such as self-forcing, error-banks, or keyframe sampling; (2) real-time generation without standard acceleration techniques such as KV-cache, sparse/linear attention, or quantization; and (3) training without parallelism or sharding frameworks, enabling image-diffusion-scale batch sizes while fitting up to four 14B models within 80 GB of GPU memory. Specifically, Helios is a 14B autoregressive diffusion model with a unified input representation that natively supports T2V, I2V, and V2V tasks. To mitigate drifting in long-video generation, we characterize typical failure modes and propose simple yet effective training strategies that explicitly simulate drifting during training, while eliminating repetitive motion at its source. For efficiency, we heavily compress the historical and noisy context and reduce the number of sampling steps, yielding computational costs comparable to -- or lower than -- those of 1.3B video generative models. Moreover, we introduce infrastructure-level optimizations that accelerate both inference and training while reducing memory consumption. Extensive experiments demonstrate that Helios consistently outperforms prior methods on both short- and long-video generation. We plan to release the code, base model, and distilled model to support further development by the community.
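The abstract's anti-drifting idea — "explicitly simulate drifting during training" — can be illustrated with a toy sketch: corrupt the model's own history context with random-strength noise so it learns to recover from the accumulated error it will see at inference time. This is a minimal illustration with hypothetical helper names and numpy only, not the authors' implementation:

```python
import numpy as np

def simulate_drift(history, rng, max_strength=0.5):
    """Corrupt clean history frames with random-strength noise,
    mimicking the degraded context an autoregressive model
    conditions on at inference time (toy stand-in)."""
    strength = rng.uniform(0.0, max_strength)
    noise = rng.standard_normal(history.shape)
    return (1.0 - strength) * history + strength * noise

def make_training_pair(clean_clip, context_len, rng):
    """One toy training example: condition on drift-corrupted
    history, predict the clean continuation."""
    history, target = clean_clip[:context_len], clean_clip[context_len:]
    corrupted = simulate_drift(history, rng)
    return corrupted, target  # model is trained on (corrupted -> target)

rng = np.random.default_rng(0)
clip = rng.standard_normal((16, 8, 8))  # 16 frames of 8x8 latents
ctx, tgt = make_training_pair(clip, context_len=8, rng=rng)
print(ctx.shape, tgt.shape)  # (8, 8, 8) (8, 8, 8)
```

The key point is that the target is always the clean continuation, so the model learns a corrective mapping rather than propagating its own errors.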

Key Takeaways

  1. Helios generates minute-long videos at 19.5 FPS on a single GPU using a 14B model while matching the quality of a strong baseline.

  2. The model eliminates drifting through unified history injection and guidance attention, without requiring self-forcing or error-banks.

  3. Real-time performance is achieved through multi-term memory patchification and hierarchical distillation, reducing the number of sampling steps from 50 to 3.
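The step-count reduction in the third takeaway can be sketched with a generic iterative sampler: a distilled model traverses the same noise-to-clean trajectory in far fewer denoising calls, so wall-clock cost scales linearly with the step count. The toy linear "denoiser" and function names below are illustrative assumptions, not the paper's distillation procedure:

```python
import numpy as np

def toy_denoiser(x, t):
    """Stand-in for a learned denoiser: pulls the sample toward
    the 'clean' target (zero here) as the noise level t shrinks."""
    return x * (1.0 - t)

def sample(x_init, num_steps):
    """Generic iterative refinement over decreasing noise levels.
    Each step costs one model call, so compute is linear in num_steps."""
    x, calls = x_init, 0
    for i in range(num_steps, 0, -1):
        x = toy_denoiser(x, t=i / num_steps)
        calls += 1
    return x, calls

x0 = np.ones(4)
_, base_calls = sample(x0, num_steps=50)  # teacher-style schedule
_, fast_calls = sample(x0, num_steps=3)   # distilled schedule
print(base_calls, fast_calls)  # 50 3
```

With everything else held fixed, cutting steps from 50 to 3 removes roughly 94% of the per-frame denoising compute, which is the main lever behind the reported real-time throughput.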

Limitations

  • Method requires high-end hardware like NVIDIA H100 GPU for optimal performance at stated speeds.

  • Approach still relies on autoregressive generation which may limit theoretical maximum coherence in extremely long sequences.

Keywords

autoregressive diffusion model, video generation, long-video drifting, self-forcing, error-banks, keyframe sampling, KV-cache, sparse attention, linear attention, quantization, unified input representation, T2V, I2V, V2V, training strategies, drifting simulation, sampling steps, infrastructure-level optimizations, memory consumption
