Latest Generative AI Research Papers

Research on AI systems that create new content, including image generation, text-to-image, video synthesis, and creative AI applications.

14 Papers

Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders

Shengbang Tong, Boyang Zheng, Ziteng Wang +7 more

Representation Autoencoders (RAEs) have shown distinct advantages in diffusion modeling on ImageNet by training in high-dimensional semantic latent spaces. In this work, we investigate whether this framework can scale to large-scale, freeform text-to-image (T2I) generation. We first scale RAE decode...

representation autoencoders, diffusion modeling, semantic latent spaces, text-to-image generation, frozen representation encoder +8 more
Jan 22, 2026

360Anything: Geometry-Free Lifting of Images and Videos to 360°

Ziyi Wu, Daniel Watson, Andrea Tagliasacchi +3 more

Lifting perspective images and videos to 360° panoramas enables immersive 3D world generation. Existing approaches often rely on explicit geometric alignment between the perspective and the equirectangular projection (ERP) space. Yet, this requires known camera metadata, obscuring the application to...

diffusion transformers, perspective-to-equirectangular mapping, token sequences, zero-padding, VAE encoder +5 more
Jan 22, 2026

ActionMesh: Animated 3D Mesh Generation with Temporal 3D Diffusion

Remy Sabathier, David Novotny, Niloy J. Mitra +1 more

Generating animated 3D objects is at the heart of many applications, yet most advanced works are difficult to apply in practice because of their limited setup, long runtime, or limited quality. We introduce ActionMesh, a generative model that predicts production-ready 3D meshes...

3D diffusion models, temporal axis, latent sequences, temporal 3D autoencoder, reference shape +3 more
Jan 22, 2026

OmniTransfer: All-in-one Framework for Spatio-temporal Video Transfer

Pengze Zhang, Yanze Wu, Mengtian Li +8 more

Videos convey richer information than images or text, capturing both spatial and temporal dynamics. However, most existing video customization methods rely on reference images or task-specific temporal priors, failing to fully exploit the rich spatio-temporal information inherent in videos, thereby ...

video customization, spatio-temporal video transfer, multi-view information, temporal cues, temporal alignment +4 more
Jan 20, 2026

M^4olGen: Multi-Agent, Multi-Stage Molecular Generation under Precise Multi-Property Constraints

Yizhan Li, Florence Cloutier, Sifan Wu +5 more

Generating molecules that satisfy precise numeric constraints over multiple physicochemical properties is critical and challenging. Although large language models (LLMs) are expressive, they struggle with precise multi-objective control and numeric reasoning without external structure and feedback. ...

large language models, multi-agent reasoner, fragment-level edits, retrieval-augmented generation, Group Relative Policy Optimization +8 more
Jan 15, 2026

FrankenMotion: Part-level Human Motion Generation and Composition

Chuqiao Li, Xianghui Xie, Yong Cao +2 more

Human motion generation from text prompts has made remarkable progress in recent years. However, existing methods primarily rely on either sequence-level or action-level descriptions due to the absence of fine-grained, part-level motion annotations. This limits their controllability over individual ...

diffusion-based, part-aware motion generation, large language models, temporally-aware part-level text annotations, atomic motion annotations +2 more
Jan 15, 2026

RigMo: Unifying Rig and Motion Learning for Generative Animation

Hao Zhang, Jiahao Luo, Bohui Wan +7 more

Despite significant progress in 4D generation, rig and motion, the core structural and dynamic components of animation, are typically modeled as separate problems. Existing pipelines rely on ground-truth skeletons and skinning weights for motion generation and treat auto-rigging as an independent pro...

generative framework, rig latent, motion latent, SE(3) transformations, auto-rigging +5 more
Jan 10, 2026

VIBE: Visual Instruction Based Editor

Grigorii Alekseenko, Aleksandr Gordeev, Irina Tolstykh +7 more

Instruction-based image editing is among the fastest developing areas in generative AI. Over the past year, the field has reached a new level, with dozens of open-source models released alongside highly capable commercial systems. However, only a limited number of open-source approaches currently ac...

diffusion models, Qwen3-VL, Sana1.5, instruction-based image editing, image generation +5 more
Jan 5, 2026