HyperAlign: Hypernetwork for Efficient Test-Time Alignment of Diffusion Models • Paper • 2601.15968 • Published 7 days ago • 4
AR-Omni: A Unified Autoregressive Model for Any-to-Any Generation • Paper • 2601.17761 • Published 4 days ago • 10
iFSQ: Improving FSQ for Image Generation with 1 Line of Code • Paper • 2601.17124 • Published 6 days ago • 30
Memory-V2V: Augmenting Video-to-Video Diffusion Models with Memory • Paper • 2601.16296 • Published 7 days ago • 25
HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding • Paper • 2601.14724 • Published 8 days ago • 72
OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation • Paper • 2601.15369 • Published 8 days ago • 18
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders • Paper • 2601.16208 • Published 7 days ago • 51
LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR • Paper • 2601.14251 • Published 9 days ago • 23
OmniTransfer: All-in-one Framework for Spatio-temporal Video Transfer • Paper • 2601.14250 • Published 9 days ago • 44
CoDance: An Unbind-Rebind Paradigm for Robust Multi-Subject Animation • Paper • 2601.11096 • Published 13 days ago • 8
Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding • Paper • 2601.10611 • Published 14 days ago • 26
Transition Matching Distillation for Fast Video Generation • Paper • 2601.09881 • Published 15 days ago • 32