MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes Paper • 2510.16380 • Published Oct 18, 2025 • 2
Argunauts Update: Learning Formal Argument Analysis with RLVF and HIRPO Article • Dec 2, 2025 • 1
JudgeBoard: Benchmarking and Enhancing Small Language Models for Reasoning Evaluation Paper • 2511.15958 • Published Nov 20, 2025 • 1