DARE-bench: Evaluating Modeling and Instruction Fidelity of LLMs in Data Science
Paper • 2602.24288 • Published 6 days ago