- FlowRL: Matching Reward Distributions for LLM Reasoning
  Paper • 2509.15207 • Published • 116
- Kwaipilot/KAT-Dev-72B-Exp
  Text Generation • 73B • Updated • 85 • 159
- Agentic Entropy-Balanced Policy Optimization
  Paper • 2510.14545 • Published • 106
- Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO
  Paper • 2511.13288 • Published • 18
Malkesh Dalia
malkesh2911
AI & ML interests
None yet
Recent Activity
updated a collection about 16 hours ago
My AI
upvoted a paper 15 days ago
MAXS: Meta-Adaptive Exploration with LLM Agents
upvoted a paper about 1 month ago
Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning
Organizations
None yet