MEMTRACK: Evaluating Long-Term Memory and State Tracking in Multi-Platform Dynamic Agent Environments Paper • 2510.01353 • Published Oct 1, 2025
TRAIL: Trace Reasoning and Agentic Issue Localization Paper • 2505.08638 • Published May 13, 2025
GLIDER: Grading LLM Interactions and Decisions using Explainable Ranking Paper • 2412.14140 • Published Dec 18, 2024