Seeing More, Saying More: Lightweight Language Experts are Dynamic Video Token Compressors Paper • 2509.00969 • Published Aug 31, 2025 • 2
ActiveVLN: Towards Active Exploration via Multi-Turn RL in Vision-and-Language Navigation Paper • 2509.12618 • Published Sep 16, 2025 • 1
AudioStory: Generating Long-Form Narrative Audio with Large Language Models Paper • 2508.20088 • Published Aug 27, 2025 • 21
ARC-Chapter: Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries Paper • 2511.14349 • Published Nov 18, 2025 • 17
From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model Paper • 2510.19871 • Published Oct 22, 2025 • 29
TIIF-Bench: How Does Your T2I Model Follow Your Instructions? Paper • 2506.02161 • Published Jun 2, 2025 • 13