π_RL: Online RL Fine-tuning for Flow-based Vision-Language-Action Models • Paper • arXiv:2510.25889 • Published Oct 29, 2025
How Far are VLMs from Visual Spatial Intelligence? A Benchmark-Driven Perspective • Paper • arXiv:2509.18905 • Published Sep 23, 2025
Analyzing the Effects of Supervised Fine-Tuning on Model Knowledge from Token and Parameter Levels • Paper • arXiv:2509.16596 • Published Sep 20, 2025
Robix: A Unified Model for Robot Interaction, Reasoning and Planning • Paper • arXiv:2509.01106 • Published Sep 1, 2025
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning • Paper • arXiv:2509.09674 • Published Sep 11, 2025
F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions • Paper • arXiv:2509.06951 • Published Sep 8, 2025
Visual Representation Alignment for Multimodal Large Language Models • Paper • arXiv:2509.07979 • Published Sep 9, 2025
Focusing by Contrastive Attention: Enhancing VLMs' Visual Reasoning • Paper • arXiv:2509.06461 • Published Sep 8, 2025
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling • Paper • arXiv:2509.12201 • Published Sep 15, 2025
PANORAMA: The Rise of Omnidirectional Vision in the Embodied AI Era • Paper • arXiv:2509.12989 • Published Sep 16, 2025
π_0: A Vision-Language-Action Flow Model for General Robot Control • Paper • arXiv:2410.24164 • Published Oct 31, 2024
FAST: Efficient Action Tokenization for Vision-Language-Action Models • Paper • arXiv:2501.09747 • Published Jan 16, 2025
π0 and π0-FAST: Vision-Language-Action Models for General Robot Control • Article • Published Feb 4, 2025