Latest Generative AI Research Papers

Research on AI systems that create new content, including image generation, text-to-image, video synthesis, and creative AI applications.

76 Papers

Memory-V2V: Augmenting Video-to-Video Diffusion Models with Memory

Dohun Lee, Chun-Hao Paul Huang, Xuelin Chen +3 more

Recent foundational video-to-video diffusion models have achieved impressive results in editing user-provided videos by modifying appearance, motion, or camera movement. However, real-world video editing is often an iterative process, where users refine results across multiple rounds of interaction....

video-to-video diffusion models, cross-consistency, multi-turn video editing, memory augmentation, retrieval, +5 more
Jan 22, 2026 | 18

Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders

Shengbang Tong, Boyang Zheng, Ziteng Wang +7 more

Representation Autoencoders (RAEs) have shown distinct advantages in diffusion modeling on ImageNet by training in high-dimensional semantic latent spaces. In this work, we investigate whether this framework can scale to large-scale, freeform text-to-image (T2I) generation. We first scale RAE decode...

representation autoencoders, diffusion modeling, semantic latent spaces, text-to-image generation, frozen representation encoder, +8 more
Jan 22, 2026 | 50

A Mechanistic View on Video Generation as World Models: State and Dynamics

Luozhou Wang, Zhifei Chen, Yihua Du +11 more

Large-scale video generation models have demonstrated emergent physical coherence, positioning them as potential world models. However, a gap remains between contemporary "stateless" video architectures and classic state-centric world model theories. This work bridges this gap by proposing a novel t...

video generation models, world models, state construction, dynamics modeling, implicit paradigms, +11 more
Jan 22, 2026 | 8

360Anything: Geometry-Free Lifting of Images and Videos to 360°

Ziyi Wu, Daniel Watson, Andrea Tagliasacchi +3 more

Lifting perspective images and videos to 360° panoramas enables immersive 3D world generation. Existing approaches often rely on explicit geometric alignment between the perspective and the equirectangular projection (ERP) space. Yet, this requires known camera metadata, obscuring the application to...

diffusion transformers, perspective-to-equirectangular mapping, token sequences, zero-padding, VAE encoder, +5 more
Jan 22, 2026 | 7

HyperAlign: Hypernetwork for Efficient Test-Time Alignment of Diffusion Models

Xin Xie, Jiaxian Guo, Dong Gong

Diffusion models achieve state-of-the-art performance but often fail to generate outputs that align with human preferences and intentions, resulting in images with poor aesthetic quality and semantic inconsistencies. Existing alignment methods present a difficult trade-off: fine-tuning approaches su...

diffusion models, hypernetwork, low-rank adaptation, denoising trajectory, reward-conditioned alignment, +4 more
Jan 22, 2026 | 5

OmniTransfer: All-in-one Framework for Spatio-temporal Video Transfer

Pengze Zhang, Yanze Wu, Mengtian Li +8 more

Videos convey richer information than images or text, capturing both spatial and temporal dynamics. However, most existing video customization methods rely on reference images or task-specific temporal priors, failing to fully exploit the rich spatio-temporal information inherent in videos, thereby ...

video customization, spatio-temporal video transfer, multi-view information, temporal cues, temporal alignment, +4 more
Jan 20, 2026 | 34

FrankenMotion: Part-level Human Motion Generation and Composition

Chuqiao Li, Xianghui Xie, Yong Cao +2 more

Human motion generation from text prompts has made remarkable progress in recent years. However, existing methods primarily rely on either sequence-level or action-level descriptions due to the absence of fine-grained, part-level motion annotations. This limits their controllability over individual ...

diffusion-based, part-aware motion generation, large language models, temporally-aware part-level text annotations, atomic motion annotations, +2 more
Jan 15, 2026 | 15

M^4olGen: Multi-Agent, Multi-Stage Molecular Generation under Precise Multi-Property Constraints

Yizhan Li, Florence Cloutier, Sifan Wu +5 more

Generating molecules that satisfy precise numeric constraints over multiple physicochemical properties is critical and challenging. Although large language models (LLMs) are expressive, they struggle with precise multi-objective control and numeric reasoning without external structure and feedback. ...

large language models, multi-agent reasoner, fragment-level edits, retrieval-augmented generation, Group Relative Policy Optimization, +8 more
Jan 15, 2026 | 16

RigMo: Unifying Rig and Motion Learning for Generative Animation

Hao Zhang, Jiahao Luo, Bohui Wan +7 more

Despite significant progress in 4D generation, rig and motion, the core structural and dynamic components of animation, are typically modeled as separate problems. Existing pipelines rely on ground-truth skeletons and skinning weights for motion generation and treat auto-rigging as an independent pro...

generative framework, rig latent, motion latent, SE(3) transformations, auto-rigging, +5 more
Jan 10, 2026 | 7

VIBE: Visual Instruction Based Editor

Grigorii Alekseenko, Aleksandr Gordeev, Irina Tolstykh +7 more

Instruction-based image editing is among the fastest developing areas in generative AI. Over the past year, the field has reached a new level, with dozens of open-source models released alongside highly capable commercial systems. However, only a limited number of open-source approaches currently ac...

diffusion models, Qwen3-VL, Sana1.5, instruction-based image editing, image generation, +5 more
Jan 5, 2026 | 58