WebOperator: Action-Aware Tree Search for Autonomous Agents in Web Environment Paper • 2512.12692 • Published 11 days ago • 13
SPARK: Stepwise Process-Aware Rewards for Reference-Free Reinforcement Learning Paper • 2512.03244 • Published 23 days ago • 16
Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team Paper • 2506.14234 • Published Jun 17 • 41
TechniqueRAG: Retrieval Augmented Generation for Adversarial Technique Annotation in Cyber Threat Intelligence Text Paper • 2505.11988 • Published May 17 • 3
X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents Paper • 2504.13203 • Published Apr 15 • 35
ChartQAPro: A More Diverse and Challenging Benchmark for Chart Question Answering Paper • 2504.05506 • Published Apr 7 • 25
MOSAIC: Modeling Social AI for Content Dissemination and Regulation in Multi-Agent Simulations Paper • 2504.07830 • Published Apr 10 • 18
CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging Paper • 2502.05664 • Published Feb 8 • 24
MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models Paper • 2501.00316 • Published Dec 31, 2024 • 23
xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval Paper • 2303.03004 • Published Mar 6, 2023
DelucionQA: Detecting Hallucinations in Domain-specific Question Answering Paper • 2312.05200 • Published Dec 8, 2023 • 2
Evidence to Generate (E2G): A Single-agent Two-step Prompting for Context Grounded and Retrieval Augmented Reasoning Paper • 2401.05787 • Published Jan 11, 2024 • 2
ChartInstruct: Instruction Tuning for Chart Comprehension and Reasoning Paper • 2403.09028 • Published Mar 14, 2024