IC-Effect: Precise and Efficient Video Effects Editing via In-Context Learning Paper • 2512.15635 • Published 7 days ago • 19
WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling Paper • 2512.14614 • Published 8 days ago • 61
OmniPSD: Layered PSD Generation with Diffusion Transformer Paper • 2512.09247 • Published 15 days ago • 46
Generative Neural Video Compression via Video Diffusion Prior Paper • 2512.05016 • Published 20 days ago • 9
UniVid: Unifying Vision Tasks with Pre-trained Video Generation Models Paper • 2509.21760 • Published Sep 26 • 14
DualVLA: Building a Generalizable Embodied Agent via Partial Decoupling of Reasoning and Action Paper • 2511.22134 • Published 28 days ago • 21
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer Paper • 2511.22699 • Published 27 days ago • 210
UniMIC: Token-Based Multimodal Interactive Coding for Human-AI Collaboration Paper • 2509.22570 • Published Sep 26 • 3
UniVid: Unifying Vision Tasks with Pre-trained Video Generation Models Paper • 2509.21760 • Published Sep 26 • 14
UniVid: Unifying Vision Tasks with Pre-trained Video Generation Models Paper • 2509.21760 • Published Sep 26 • 14 • 2
CRITICTOOL: Evaluating Self-Critique Capabilities of Large Language Models in Tool-Calling Error Scenarios Paper • 2506.13977 • Published Jun 11 • 10
VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning Paper • 2505.22019 • Published May 28 • 11