
Show, Don't Tell: Morphing Latent Reasoning into Image Generation

Harold Haodong Chen, Xinxiang Yin, Wen-Jie Shu, Hongfei Zhang, Zixin Zhang, Chenfei Liao, Litao Guo, Qifeng Chen, Ying-Cong Chen
Published: February 2, 2026
Authors: 9
Word Count: 11,235
Code: Includes code

Adaptive latent reasoning for more faithful and efficient text-to-image generation.

Abstract

Text-to-image (T2I) generation has achieved remarkable progress, yet existing methods often lack the ability to dynamically reason and refine during generation--a hallmark of human creativity. Current reasoning-augmented paradigms mostly rely on explicit thought processes, where intermediate reasoning is decoded into discrete text at fixed steps with frequent image decoding and re-encoding, leading to inefficiencies, information loss, and cognitive mismatches. To bridge this gap, we introduce LatentMorph, a novel framework that seamlessly integrates implicit latent reasoning into the T2I generation process. At its core, LatentMorph introduces four lightweight components: (i) a condenser for summarizing intermediate generation states into compact visual memory, (ii) a translator for converting latent thoughts into actionable guidance, (iii) a shaper for dynamically steering next image token predictions, and (iv) an RL-trained invoker for adaptively determining when to invoke reasoning. By performing reasoning entirely in continuous latent spaces, LatentMorph avoids the bottlenecks of explicit reasoning and enables more adaptive self-refinement. Extensive experiments demonstrate that LatentMorph (I) enhances the base model Janus-Pro by 16% on GenEval and 25% on T2I-CompBench; (II) outperforms explicit paradigms (e.g., TwiG) by 15% and 11% on abstract reasoning tasks such as WISE and IPV-Txt; (III) reduces inference time by 44% and token consumption by 51%; and (IV) exhibits 71% cognitive alignment with human intuition on reasoning invocation.
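
The abstract describes four lightweight modules wrapped around an autoregressive image-token backbone, with all reasoning kept in a continuous latent space. The sketch below is a minimal, hypothetical PyTorch rendering of that loop; the class names (Condenser, Translator, Shaper, Invoker), the gated-residual steering, and all tensor shapes are illustrative assumptions rather than the paper's released implementation.

```python
# Minimal conceptual sketch of a LatentMorph-style inference step.
# All module designs and shapes are assumptions for illustration only.
import torch
import torch.nn as nn


class Condenser(nn.Module):
    """Summarizes intermediate generation states into a compact visual memory."""
    def __init__(self, d_model: int, n_slots: int = 8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_slots, d_model))
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)

    def forward(self, states: torch.Tensor) -> torch.Tensor:
        # states: (B, T, D) hidden states of the image tokens generated so far
        q = self.queries.unsqueeze(0).expand(states.size(0), -1, -1)
        memory, _ = self.attn(q, states, states)            # (B, n_slots, D)
        return memory


class Translator(nn.Module):
    """Converts latent thoughts (visual memory) into an actionable guidance vector."""
    def __init__(self, d_model: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(),
                                 nn.Linear(d_model, d_model))

    def forward(self, memory: torch.Tensor) -> torch.Tensor:
        return self.mlp(memory).mean(dim=1)                 # (B, D)


class Shaper(nn.Module):
    """Steers the next image-token prediction with the guidance vector."""
    def __init__(self, d_model: int):
        super().__init__()
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, hidden: torch.Tensor, guidance: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate(torch.cat([hidden, guidance], dim=-1)))
        return hidden + g * guidance                        # gated residual steering


class Invoker(nn.Module):
    """Policy (RL-trained in the paper) deciding when to invoke latent reasoning."""
    def __init__(self, d_model: int):
        super().__init__()
        self.policy = nn.Linear(d_model, 1)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.policy(hidden))           # invocation probability


def generate_step(backbone_hidden, past_states, condenser, translator, shaper, invoker,
                  threshold: float = 0.5):
    """One autoregressive step with optional latent reasoning (batch size 1 assumed)."""
    if invoker(backbone_hidden).item() > threshold:         # reason only when needed
        memory = condenser(past_states)                     # compact visual memory
        guidance = translator(memory)                       # latent thought -> guidance
        backbone_hidden = shaper(backbone_hidden, guidance) # steer next-token prediction
    return backbone_hidden                                  # fed to the base model's head
```

In this reading, the invoker is what makes the reasoning adaptive: latent thinking runs only when its predicted probability crosses a threshold, and the gated residual lets the shaper fall back to the base model's unmodified prediction whenever the guidance is uninformative.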

Key Takeaways

  1. LatentMorph integrates adaptive reasoning into image generation.

  2. Enhances fidelity, abstract reasoning, and efficiency.

  3. Mimics human cognitive rhythms in creative processes.

Limitations

  • Relies on the quality of the base model and the RL-trained invoker.

  • Potentially high computational requirements for training.

Keywords

text-to-image generation, latent reasoning, visual memory, latent thoughts, actionable guidance, image token predictions, reinforcement learning, cognitive alignment, Janus-Pro, GenEval, T2I-CompBench, WISE, IPV-Txt, TwiG
