Video-BrowseComp: Benchmarking Agentic Video Research on Open Web Paper • 2512.23044 • Published 7 days ago • 9
JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation Paper • 2512.22905 • Published 7 days ago • 17
P1: Mastering Physics Olympiads with Reinforcement Learning Paper • 2511.13612 • Published Nov 17, 2025 • 134
UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist Paper • 2511.08521 • Published Nov 11, 2025 • 37
Seeing Clearly, Answering Incorrectly: A Multimodal Robustness Benchmark for Evaluating MLLMs on Leading Questions Paper • 2406.10638 • Published Jun 15, 2024
MomentSeeker: A Comprehensive Benchmark and A Strong Baseline For Moment Retrieval Within Long Videos Paper • 2502.12558 • Published Feb 18, 2025
Any Information Is Just Worth One Single Screenshot: Unifying Search With Visualized Information Retrieval Paper • 2502.11431 • Published Feb 17, 2025
Video-XL-2: Towards Very Long-Video Understanding Through Task-Aware KV Sparsification Paper • 2506.19225 • Published Jun 24, 2025
TimeScope: Towards Task-Oriented Temporal Grounding In Long Videos Paper • 2509.26360 • Published Sep 30, 2025
Dr.V: A Hierarchical Perception-Temporal-Cognition Framework to Diagnose Video Hallucination by Fine-grained Spatial-Temporal Grounding Paper • 2509.11866 • Published Sep 15, 2025 • 1