Towards Pixel-Level VLM Perception via Simple Points Prediction Paper • 2601.19228 • Published 15 days ago • 17
One-step Latent-free Image Generation with Pixel Mean Flows Paper • 2601.22158 • Published 12 days ago • 17
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders Paper • 2601.16208 • Published 19 days ago • 51
The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models Paper • 2601.15165 • Published 20 days ago • 71
Emu3.5: Native Multimodal Models are World Learners Paper • 2510.26583 • Published Oct 30, 2025 • 110
Diffusion Transformers with Representation Autoencoders Paper • 2510.11690 • Published Oct 13, 2025 • 166
X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again Paper • 2507.22058 • Published Jul 29, 2025 • 40
Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models Paper • 2507.07104 • Published Jul 9, 2025 • 46
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination Paper • 2507.10532 • Published Jul 14, 2025 • 90
Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation Paper • 2507.08441 • Published Jul 11, 2025 • 62
RLPR: Extrapolating RLVR to General Domains without Verifiers Paper • 2506.18254 • Published Jun 23, 2025 • 32
Medical World Model: Generative Simulation of Tumor Evolution for Treatment Planning Paper • 2506.02327 • Published Jun 2, 2025 • 20
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning Paper • 2506.09513 • Published Jun 11, 2025 • 101