
duhe

Elynden

kinza99

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted a paper 26 days ago

OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification

Paper • 2512.10756 • Published 27 days ago • 34

upvoted a paper about 1 month ago

ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning

Paper • 2512.05111 • Published Dec 4, 2025 • 47

liked a dataset 2 months ago

Elynden/AgentBench-EvoSyn

Updated Oct 23, 2025 • 29 • 1

upvoted an article 3 months ago

OpenEvolve: An Open Source Implementation of Google DeepMind's AlphaEvolve

Article • May 20, 2025

updated a dataset 3 months ago

Elynden/AgentBench-EvoSyn

Updated Oct 23, 2025 • 29 • 1

authored a paper 3 months ago

EvoSyn: Generalizable Evolutionary Data Synthesis for Verifiable Learning

Paper • 2510.17928 • Published Oct 20, 2025 • 2

commented on a paper 3 months ago

EvoSyn: Generalizable Evolutionary Data Synthesis for Verifiable Learning

Paper • 2510.17928 • Published Oct 20, 2025 • 2

updated a dataset 3 months ago

Elynden/LiveCodeBench-EvoSyn

Updated Oct 22, 2025 • 13

published 2 datasets 3 months ago

Elynden/AgentBench-EvoSyn

Updated Oct 23, 2025 • 29 • 1

Elynden/LiveCodeBench-EvoSyn

Updated Oct 22, 2025 • 13

updated a collection 3 months ago

EvoSyn

Collection • 2 items • Updated Oct 20, 2025

upvoted a paper 3 months ago

Confidence as a Reward: Transforming LLMs into Reward Models

Paper • 2510.13501 • Published Oct 15, 2025 • 1

authored 4 papers 3 months ago

DevBench: A Comprehensive Benchmark for Software Development

Paper • 2403.08604 • Published Mar 13, 2024 • 2

SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution

Paper • 2501.05040 • Published Jan 9, 2025 • 15

Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation

Paper • 2502.06563 • Published Feb 10, 2025

Confidence as a Reward: Transforming LLMs into Reward Models

Paper • 2510.13501 • Published Oct 15, 2025 • 1

liked a dataset 5 months ago

open-r1/ioi

Updated Mar 12, 2025 • 270 • 81 • 10

upvoted a paper 7 months ago

SWE-bench Goes Live!

Paper • 2505.23419 • Published May 29, 2025 • 21

liked a Space 8 months ago

Open LMM Subjective Leaderboard

VLMEvalKit Subjective Benchmark Results
