SWE-MERA: A Dynamic Benchmark for Agenticly Evaluating Large Language Models on Software Engineering Tasks • Paper • 2507.11059 • Published Jul 15, 2025
MERA Code: A Unified Framework for Evaluating Code Generation Across Tasks • Paper • 2507.12284 • Published Jul 16, 2025