PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning
Submitted by JGC
Apsara Stack MaaS