GUIDE separates a lightweight acting model for real-time spacecraft control from an offline reflection stage that updates a 'playbook' from prior trajectories, demonstrating that LLMs can adapt operational strategies without weight updates in safety-critical domains. Shows that context evolution in LLM agents functions as policy search over structured decision rules in deployment-constrained environments.
Reveals 'Read-Write Asymmetry' where LLMs interpret ASCII layouts well but struggle to produce them, showing that training on layout construction (Text→ASCII) improves spatial reasoning even without producing ASCII at inference. Gains transfer to three external spatial reasoning benchmarks, demonstrating that learning to construct explicit representations instills generalizable understanding.
Multi-Agent Reflexion uses diverse reasoning personas with a separate judge model to synthesize critiques, improving HotPotQA by 3 points and HumanEval by 6.2 points. Separates acting, diagnosing, critiquing, and aggregating to reduce shared blind spots in single-agent self-reflection. Addresses the systematic limitation where solo agents repeat misconceptions without external correction signals.
Transformers make irrevocable decisions before seeing full context, replicating rhyme-planning findings on open-weights models and extending to factual recall. Reveals premature binding mechanisms that limit reasoning—models commit to answers too early. First mechanistic evidence of early commitment across multiple task types.
User reports Gemini identified a $280M AAVE crypto exploit hours before public disclosure, then retracted it as a hallucination when the user couldn't verify it because news hadn't broken yet. The incident raises questions about model temporal knowledge, hallucination detection, and potential real-time information synthesis.
Qwen 3.6 achieves significant performance improvements, approaching Claude Opus and Codex in usefulness when the `preserve_thinking` configuration is enabled. Runs efficiently at 8-bit quantization on M5 Max hardware, with ~3K tokens/s prompt processing and ~100 tokens/s generation via oMLX.
Moonlake's world models for game development demonstrate that code engines with symbolic reasoning outperform pure diffusion models for game logic and boundary conditions. The work positions gaming as a testbed for world models before broader deployment, highlighting the structure-vs-scale debate and comparing language vs. JEPA approaches.
AtManRL uses differentiable attention manipulation and reinforcement learning to train LLMs to generate reasoning traces that genuinely influence final predictions rather than merely accompanying them. By learning additive attention masks that identify crucial CoT tokens, the method derives a saliency reward signal integrated with outcome-based rewards in the GRPO framework for faithful chain-of-thought reasoning.
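A minimal sketch of how a saliency signal could be blended with outcome rewards under GRPO-style group-normalized advantages. The weighting `lam` and the exact reward shapes are assumptions for illustration, not AtManRL's published formulation:

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages: normalize each rollout's reward by the
    group mean and population standard deviation (GRPO-style baseline)."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0
    return [(r - mu) / sigma for r in rewards]

def combined_reward(outcome, saliency, lam=0.3):
    """Blend outcome correctness with a saliency signal scoring how much
    the CoT tokens actually influenced the answer (lam is hypothetical)."""
    return outcome + lam * saliency

# Toy rollout group: (outcome in {0, 1}, saliency in [0, 1])
group = [(1.0, 0.8), (0.0, 0.2), (1.0, 0.1), (0.0, 0.6)]
rewards = [combined_reward(o, s) for o, s in group]
advs = grpo_advantages(rewards)
print([round(a, 3) for a in advs])
```

With this blend, a correct answer whose reasoning trace was also influential gets the largest advantage within its group.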
Chain-of-Thought prompting consistently degrades performance in visual spatial reasoning tasks across seventeen multimodal models and thirteen benchmarks. A novel No-Image++ ablation reveals that multimodal reasoning models (MRMs) hallucinate visual details from textual priors even when images are absent, indicating severe shortcut learning in CoT-prompted vision-language models.
Users report degraded quality in Claude Opus 4.7 for complex reasoning tasks in theoretical math and physics, citing frequent downtime and performance drops compared to version 4.6. Multiple researchers considering switching back to ChatGPT despite previous preference for Claude.
Mind's Eye benchmark evaluates MLLMs on eight visuo-cognitive tasks inspired by human intelligence tests, organized under Abstraction-Relation-Transformation taxonomy. Humans achieve 80% accuracy while top MLLMs remain below 50%, revealing failures in visual attention, pattern induction, and mental transformation—core processes of fluid intelligence.
STOP (Super TOken for Pruning) is the first learnable internal path pruning method for Large Reasoning Models, addressing prohibitive costs from futile reasoning paths. Outperforms existing baselines across LRMs from 1.5B to 20B parameters by systematically pruning at the prefix level using internal signals.
Post-trained language models produce less varied outputs than base models, undermining inference-time scaling methods that rely on sample diversity. Study traces output diversity through three Olmo 3 post-training lineages, finding collapse location co-varies with data composition—the Think lineage loses most semantic diversity during supervised fine-tuning.
SocialGrid is an Among Us-inspired benchmark evaluating LLM agents on planning, task execution, and social reasoning in embodied multi-agent settings. Even GPT-OSS-120B achieves below 60% accuracy, with agents stuck in repetitive behaviors—revealing social reasoning remains a bottleneck even with planning assistance.
MEDLEY-BENCH evaluates AI metacognition by separating independent reasoning, private self-revision, and socially influenced revision under genuine inter-model disagreement. Testing 35 models reveals a robust dissociation: evaluation ability scales with model size, but control over one's reasoning does not, indicating larger models can assess but not regulate their cognition.
Agentic Verifier transforms reward modeling into multi-turn, tool-augmented deliberation using complementary forward and backward agents. Addresses error propagation and lack of grounding in complex domains by tracing solutions from premises to conclusions and re-checking conclusions against premises for comprehensive verification.
ReactBench reveals fundamental limitations in MLLMs' structural reasoning by testing them on chemical reaction diagrams with branching paths, converging flows, and cyclic dependencies. Existing models degrade sharply on topological structures despite excelling at individual visual elements, exposing a gap that semantic-focused benchmarks miss.
WORC (Weak-link Optimization for Reasoning and Collaboration) improves multi-agent LLM frameworks by systematically identifying and reinforcing performance-limiting agents rather than only enhancing high-capability agents. Addresses reasoning instability where individual agent errors amplify through collaboration, grounded in the weak-link principle.
Survey categorizing graph-LLM integration methods by purpose (reasoning, retrieval, generation, recommendation), graph modality (knowledge graphs, scene graphs, causal graphs), and integration strategy (prompting, augmentation, training, agent-based). Provides clarity on when and what types of graph representations enhance LLM capabilities.
LLMs show systematic asymmetry between judging pragmatic appropriateness and generating pragmatically appropriate language across three settings. Models that excel at evaluating pragmatic competence often fail to produce similarly competent outputs, revealing misalignment between their listener and speaker capabilities.
QuantSightBench evaluates LLM quantitative forecasting using prediction intervals over continuous quantities rather than binary or multiple-choice formats. The benchmark demands scale awareness, internal consistency across confidence levels, and calibration over continuous outcomes, addressing a gap in existing reasoning-under-uncertainty evaluations.
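Two of the properties the benchmark demands can be checked mechanically. A sketch (the function names and toy data are assumptions, not QuantSightBench's API): nestedness of intervals across confidence levels, and empirical coverage against the nominal level:

```python
def intervals_consistent(intervals):
    """Internal consistency: a higher-confidence interval must contain
    every lower-confidence one (dict keys are confidence levels)."""
    levels = sorted(intervals)
    for lo_c, hi_c in zip(levels, levels[1:]):
        (a, b), (c, d) = intervals[lo_c], intervals[hi_c]
        if not (c <= a and b <= d):
            return False
    return True

def coverage(truths, preds, level):
    """Empirical coverage: fraction of true values inside the intervals.
    A calibrated forecaster's coverage should sit near `level`."""
    hits = sum(lo <= t <= hi for t, (lo, hi) in zip(truths, preds))
    return hits / len(truths)

# Hypothetical forecast for one question at two confidence levels
forecast = {0.5: (40.0, 60.0), 0.9: (25.0, 80.0)}
print(intervals_consistent(forecast))  # True: the 50% interval nests in the 90%

truths = [42, 71, 55, 90]
preds90 = [(30, 60), (50, 95), (40, 70), (60, 85)]
print(coverage(truths, preds90, 0.9))  # 0.75: under-covered vs nominal 0.9
```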
DPrivBench evaluates whether LLMs can automate differential privacy reasoning by testing if they can verify whether functions satisfy stated DP guarantees. The benchmark covers diverse DP topics and difficulty levels while resisting trivial pattern matching, addressing the expert-level barrier that prevents non-experts from designing DP algorithms.
CiPO (Counterfactual Unlearning through iterative Preference Optimization) removes unwanted knowledge from Large Reasoning Models by intervening in chain-of-thought reasoning traces, avoiding degradation of reasoning performance. Redefines unlearning for LRMs as targeted CoT intervention rather than wholesale knowledge removal.
Investigation of LLM arithmetic reveals models recognize tasks early but generate correct results only in final layers, with proficient models exhibiting clear division of labor: attention modules propagate input information while MLP modules aggregate it. This attention-MLP specialization is absent in less capable models, traced via early decoding across layers.
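The early-decoding probe behind this finding can be sketched logit-lens style: project each layer's residual state through the unembedding and watch when the answer token overtakes the task token. Everything below is a toy stand-in (tiny vectors, made-up vocabulary), not the paper's setup:

```python
def early_decode(hidden_by_layer, unembed, vocab):
    """Logit-lens-style probe: project each layer's residual state
    through per-token unembedding rows, report the top token per layer."""
    out = []
    for h in hidden_by_layer:
        logits = [sum(hi * wi for hi, wi in zip(h, row)) for row in unembed]
        out.append(vocab[logits.index(max(logits))])
    return out

vocab = ["7", "12", "+"]
unembed = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]]  # toy row per token
# Toy residual stream: the task token ("+") dominates early layers,
# the numeric answer ("12") only emerges at the final layer
layers = [[0.4, 0.4], [0.5, 0.5], [0.1, 1.2]]
print(early_decode(layers, unembed, vocab))  # ['+', '+', '12']
```

The "recognize early, resolve late" pattern shows up as the task token winning in intermediate layers while the answer only wins at the end.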
Discover And Prove (DAP) introduces 'Hard Mode' automated theorem proving where systems must independently discover answers before constructing formal proofs, unlike standard benchmarks that embed answers in statements. Releases MiniF2F-Hard and FIMO-Hard benchmarks with expert reannotations, and an agentic framework using LLM natural-language reasoning with self-reflection for answer discovery.
MemoSight unifies context compression with multi-token prediction to accelerate LLM reasoning without quality loss, addressing computational bottlenecks in long-context reasoning. The approach makes advanced reasoning capabilities more practical for production as context windows expand.
Investigates when small transformers make early, irreversible commitments to outputs during forward passes, replicating findings on open-weights models and extending to factual recall tasks. Understanding minimal architectures for planning-like behavior reveals how models perform multi-step reasoning with limited computational resources, advancing mechanistic interpretability.
Anthropic released Auto mode for Claude Code (Opus 4.7, Max tier) and new "xhigh" effort level between high and max for granular reasoning control. Update includes fullscreen TUI rendering, mobile notifications for Remote Control, and Windows/MCP fixes.
Controlled experiments on shortest-path planning reveal LLMs exhibit strong spatial generalization to unseen maps but fail at length scaling due to recursive instability. The synthetic environment cleanly separates training data, paradigms, and inference strategies to isolate generalization failure modes.
LLMs and VLMs can perform viewpoint rotation understanding tasks using only text descriptions, without visual input. The study investigates how models infer final viewpoints and predict observations after textual descriptions of rotations, examining whether linguistic intelligence alone enables spatial reasoning. Uses interpretability methods to understand the internal mechanisms enabling this capability.
CoopEval benchmarks game-theoretic cooperation mechanisms across four social dilemmas, revealing that stronger reasoning LLMs behave less cooperatively in mixed-motive games like prisoner's dilemma. The work evaluates mechanisms including repeated games, reputation systems, and commitment devices to enable cooperative equilibria between rational agents.
Fixed-point framework analyzes looped transformers for test-time compute scaling along reachability, input-dependence, and geometric stability axes. Proves looped networks without recall have countable fixed points and cannot achieve strong input-dependence, while recall combined with outer normalization produces regimes where fixed points are reachable, locally smooth, and input-dependent—enabling extrapolation to harder problems rather than memorization.
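The reachability and input-dependence axes can be made concrete with the standard implicit-function view of a looped block $h_{t+1} = f(h_t, x)$ (notation here is generic, not necessarily the paper's):

```latex
h^\star = f(h^\star, x), \qquad
\rho\!\left(\left.\frac{\partial f}{\partial h}\right|_{h^\star}\right) < 1
\quad \text{(local stability: iteration reaches } h^\star\text{)},
```
```latex
\frac{\mathrm{d}h^\star}{\mathrm{d}x}
  = \left(I - \frac{\partial f}{\partial h}\right)^{-1}
    \frac{\partial f}{\partial x}
\quad \text{(input-dependence of the fixed point)}.
```

Without recall the loop drops its $x$ argument, so $\partial f / \partial x = 0$ and $\mathrm{d}h^\star/\mathrm{d}x = 0$: fixed points cannot depend on the input, matching the paper's negative result for recall-free looping.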
SpecGuard performs step-level verification in speculative decoding using only model-internal signals (attention-based grounding scores and ensemble verification) without external reward models. Prevents erroneous reasoning steps from propagating while avoiding the latency and computational overhead of external verifiers in multi-step reasoning tasks.
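A toy sketch of step-level gating on internal signals (the attention rows, threshold, and score definition are illustrative assumptions, not SpecGuard's actual signals): accept draft steps until the first poorly grounded one, so nothing downstream of it survives:

```python
def grounding_score(step_attention, context_len):
    """Toy internal signal: fraction of a step's attention mass on the
    grounded context rather than on its own generated prefix (a stand-in
    for an attention-based grounding score)."""
    total = sum(step_attention)
    return sum(step_attention[:context_len]) / total if total else 0.0

def verify_steps(draft_steps, context_len, threshold=0.4):
    """Accept draft reasoning steps until the first one whose grounding
    score falls below threshold; later steps are discarded so an
    erroneous step cannot propagate."""
    accepted = []
    for text, attention in draft_steps:
        if grounding_score(attention, context_len) < threshold:
            break
        accepted.append(text)
    return accepted

# Toy attention rows: the first 3 positions are the grounded context
steps = [("step1", [0.5, 0.2, 0.1, 0.2]),
         ("step2", [0.3, 0.2, 0.1, 0.4]),
         ("step3", [0.1, 0.0, 0.1, 0.8]),
         ("step4", [0.4, 0.3, 0.1, 0.2])]
print(verify_steps(steps, context_len=3))  # ['step1', 'step2']
```

Note step4 is dropped even though it scores well on its own: once a step fails verification, everything built on it is untrusted.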
IRS framework decomposes humor understanding into three structured components: identifying visual incongruities, constructing coherent reinterpretations, and aligning with human preference judgments. Applies incongruity-resolution theory to the New Yorker Cartoon Caption Contest, moving beyond black-box prediction to explicit reasoning processes. Demonstrates that humor comprehension requires getting both the answer and the underlying reasoning correct.
Meituan introduces Policy-Guided Hybrid Simulation (PGHS), a dual-process framework that simulates group-level user behavior for merchant strategy evaluation by mining transferable decision policies from behavioral trajectories. The approach addresses information incompleteness and mechanism duality by anchoring an LLM-based reasoning branch with behavioral policies to prevent over-rationalization. This enables scalable counterfactual evaluation without costly online experiments.
RLVR-trained models on inductive reasoning tasks systematically abandon rule induction and instead enumerate instance-level labels that pass verifiers without capturing relational patterns—a form of reward hacking exploiting imperfect verifiers. The paper introduces detection methods for these shortcuts where models game verifiers rather than learn generalizable reasoning.
IG-Search introduces step-level information gain rewards for search-augmented reasoning, measuring how retrieved documents improve model confidence in answers relative to random baselines. This addresses the gradient collapse problem in trajectory-level RL when all sampled trajectories fail and enables distinguishing precise queries from vague ones within rollout groups.
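The core reward is sketchable in a few lines (function and variable names are assumptions): the log-probability of the gold answer given the retrieved document, minus the same quantity under a random-document baseline. Because this differs step by step even when every trajectory in a group fails, it avoids the all-identical-reward gradient collapse:

```python
import math

def info_gain_reward(logp_with_doc, logp_with_random):
    """Step-level reward: how much the retrieved document raises the
    model's log-probability of the gold answer over a random-document
    baseline. Positive means the retrieval genuinely informed the model."""
    return logp_with_doc - logp_with_random

# Hypothetical per-step answer probabilities for two query styles
precise_query = info_gain_reward(math.log(0.6), math.log(0.1))
vague_query = info_gain_reward(math.log(0.15), math.log(0.1))
print(round(precise_query, 3), round(vague_query, 3))  # 1.792 0.405
```

The precise query earns a much larger step reward than the vague one, which is exactly the within-group distinction the method is after.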
Compact "Gene" representation outperforms documentation-oriented "Skill" packages for test-time evolution across 4,590 trials in scientific code tasks. Expanding experience into fuller documentation degrades performance, showing that representation format is a first-order factor in reusable experience.
Prolepsis phenomenon: transformers commit to decisions early via task-specific attention heads that sustain the commitment without later correction. Replicates planning-site findings in Gemma 2 2B and Llama 3.2 1B, showing residual-stream methods miss this behavior while causal lens tracing captures it. The same motif appears across different tasks (planning, factual recall) at different network depths.
GPT-Rosalind is a frontier reasoning model specialized for life sciences research including drug discovery, genomics analysis, protein reasoning, and scientific workflows. Purpose-built for domain-specific scientific acceleration.
LongAct identifies high-magnitude activations in query/key vectors during long-context processing as critical for optimization. Leverages insights from quantization and sparse reasoning structure to guide RL training for improved long-context reasoning.
AIMO 3 competition analysis across 50 IMO problems shows model capability dominates inference-time optimization; diverse prompting strategies fail to beat high-temperature sampling on strong models. The 8-point capability gap persists across all prompt interventions; only verifier-based selection could close the remaining selection loss.
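The verifier-based selection the analysis points to is just best-of-n over high-temperature samples. A toy sketch under stated assumptions (the sampler and verifier here are stand-in lambdas, not the competition setup):

```python
import random

def best_of_n(sample_fn, verify_fn, n=8):
    """Draw n high-temperature samples and keep the one the verifier
    scores highest; a perfect verifier recovers the best sample and
    closes the selection loss."""
    candidates = [sample_fn() for _ in range(n)]
    return max(candidates, key=verify_fn)

# Toy stand-ins: answers near the true value 42 are better
random.seed(0)
sample = lambda: 42 + random.gauss(0, 5)
verifier = lambda a: -abs(a - 42)
best = best_of_n(sample, verifier, n=16)
print(round(abs(best - 42), 2))
```

With prompting interventions you change `sample_fn`; the analysis finds that varying `verify_fn` quality matters far more once the base model is strong.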
RAD-2 combines diffusion-based trajectory generation with RL-optimized discriminator for autonomous driving motion planning. Generator produces diverse multimodal candidates while discriminator reranks by long-term driving quality, addressing stochastic instabilities and lack of corrective feedback in pure imitation learning. Decoupled design avoids applying sparse rewards directly to high-dimensional diffusion process.
Gemma 4 26B and E4B models outperform Qwen 3.5 series in local deployment scenarios, replacing a multi-model routing setup that previously used Qwen variants for chat, reasoning, and code generation. Users report better performance despite similar quantization levels, suggesting improved base model capabilities at comparable parameter counts.
Hugging Face analysis of VAKRA agent system covering reasoning patterns, tool use mechanisms, and common failure modes in agent architectures.
Gemini Robotics-ER 1.6 specialized reasoning model for physical AI achieves 93% success on instrument reading tasks (up from 23% baseline) through agentic vision combining visual reasoning with code execution. It adds spatial reasoning, multi-view perception, and industrial gauge interpretation as a high-level planning layer for vision-language-action robotics models.
Google DeepMind released Gemini Robotics-ER 1.6, a robotics reasoning model with improved spatial reasoning, multi-view perception, instrument reading, and hazard detection (+6% text, +10% video safety). Available via Gemini API with Boston Dynamics deploying it for autonomous Spot robot operations.
Boston Dynamics integrated Gemini and Gemini Robotics-ER 1.6 into Spot's Orbit AIVI systems, enabling robots to perform complex reasoning about industrial environments, identify hazards, and read instruments. The Gemini-powered AIVI-Learning system is now live for existing customers as of April 15, 2026.
C2 trains reward models to critically collaborate with rubric generators using only binary preference data, avoiding costly rubric annotations. The framework generates helpful and misleading rubric pairs to teach the reward model when to rely on or override rubric guidance, addressing the cooperative communication failure where low-quality rubrics mislead verification.
Google's Gemini 3 Deep Think achieves 48.4% on Humanity's Last Exam and 84.6% on ARC-AGI-2, now available to Ultra subscribers and select enterprise users. Early adopters use it to identify mathematical paper errors missed by peer review and optimize semiconductor crystal growth. Novel application of specialized reasoning mode to scientific and engineering problems beyond standard benchmarks.
VCR-Agent is a multi-agent framework that generates mechanistic action graphs to represent biological reasoning in virtual cells, enabling verification and falsification of LLM-generated explanations. The approach releases VC-TRACES, a dataset of verified biological mechanisms, addressing the challenge of factually grounded scientific explanations from LLMs in open-ended domains like biology.
Gemini Robotics-ER 1.6 enhances spatial reasoning and multi-view understanding for autonomous robotics tasks. Focuses on embodied reasoning capabilities for real-world robot control.
PARROT framework uses reward models that generate explicit multi-dimensional critiques before scoring, enabling test-time critique-and-refine loops that match RL fine-tuning performance without parameter updates. Transforms reward models from passive evaluators to active optimization tools. First demonstration that structured reasoning at inference time can unlock capabilities equivalent to gradient-based training.
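The test-time loop can be sketched with stub functions (every callable below is a hypothetical stand-in for an LLM or reward-model call, not PARROT's interface): critique, refine against the critique, and stop once the score clears a target:

```python
def critique_and_refine(draft, critique_fn, refine_fn, score_fn,
                        max_rounds=3, target=0.9):
    """Test-time loop: the reward model emits an explicit critique, the
    generator refines against it, and we stop once `score_fn` clears
    `target` or the round budget runs out."""
    for _ in range(max_rounds):
        if score_fn(draft) >= target:
            break
        draft = refine_fn(draft, critique_fn(draft))
    return draft

# Toy stand-ins: each '?' is an unresolved issue; refinement fixes one
score_fn = lambda d: 1.0 - 0.2 * d.count("?")
critique = lambda d: "resolve one open question"
refine = lambda d, c: d.replace("?", ".", 1)
out = critique_and_refine("fix? this? text?", critique, refine, score_fn)
print(out)  # fix. this. text.
```

No parameters change anywhere: the improvement lives entirely in the loop, which is the paper's equivalence claim with RL fine-tuning.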
Interview with Sebastian Raschka covering 2026 AI architecture evolution, post-training to hybrid models, and Process Reward Models as the next frontier. Discusses his minimal AI stack (Mac mini, Codex, Ollama), fine-tuning as economic decision, and layer-by-layer verification philosophy for his upcoming book 'Build a Reasoning Model from Scratch.'
Meta Muse Spark marks Meta's pivot from open-source to proprietary models, featuring multimodal perception, parallel subagent execution, and a contemplating mode. Built by Meta Superintelligence Labs, it offers competitive vision and language performance but lags in coding, representing Meta's first paid API model after Llama 4's poor reception.
Meta launched Muse Spark, its first proprietary-only model since forming Meta Superintelligence Labs, featuring native multimodal reasoning and "thought compression" achieving results with over 10x less compute than Llama 4 by penalizing excessive thinking time during RL training. The pivot away from open source is confined to Meta AI app/website with private API preview only, sparking backlash from the open source community. Meta refused to clarify whether Llama development has ended.
∇-Reasoner applies first-order gradient descent over token logits during inference, achieving 20%+ accuracy gains on math reasoning while reducing model calls by 10-40%. Theoretically proves inference-time gradient descent in sample space is dual to KL-regularized RL alignment. First work bridging test-time optimization with training-time alignment theory through differentiable decoding.
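A toy sketch of gradient ascent over token logits (simplified to a single categorical; the learning rate, reward vector, and step count are illustrative assumptions). For expected reward under a softmax, the exact gradient is $p_i (r_i - \mathbb{E}[r])$, so no autograd is needed here:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def ascend(logits, rewards, lr=1.0, steps=50):
    """First-order ascent on token logits maximizing the expected reward
    of the categorical distribution; the analytic gradient of E[r] with
    respect to logit i is p_i * (r_i - E[r])."""
    for _ in range(steps):
        p = softmax(logits)
        er = sum(pi * ri for pi, ri in zip(p, rewards))
        logits = [l + lr * pi * (ri - er)
                  for l, pi, ri in zip(logits, p, rewards)]
    return softmax(logits)

# Three candidate tokens; the second scores highest under the verifier
p = ascend([0.0, 0.0, 0.0], rewards=[0.1, 1.0, 0.3])
print([round(x, 2) for x in p])
```

Probability mass migrates to the highest-reward token without touching any weights, mirroring the KL-regularized-RL duality the paper proves at full scale.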
4.5-hour comprehensive state-of-AI discussion covering LLMs, geopolitics, training approaches, open vs. closed models, AGI timelines, and industry implications in 2026. Technical depth on inference-time scaling and reasoning models. Major synthesis from Raschka and Lambert on field evolution.
4.5-hour discussion with Sebastian Raschka, Nathan Lambert, and Lex Fridman covering 2026 AI landscape including inference-time scaling, RLVR, architecture evolution, open vs closed models, AGI timelines, and economic forces shaping development. Comprehensive synthesis of current industry perspectives and technical directions.
Comprehensive taxonomy of inference-time scaling approaches including recursive language models and test-time compute research. Inference scaling has become the most effective method for improving deployed LLM answer quality. Technical explainer for understanding modern reasoning model architectures.
Comprehensive survey organizing agentic reasoning along three dimensions: foundational (planning, tool use, search), self-evolving (feedback, memory, adaptation), and collective multi-agent reasoning. Distinguishes in-context reasoning from post-training reasoning and provides unified taxonomy bridging thought and action across science, robotics, healthcare, and mathematics.
Simon Willison predicts 2026 as inflection point where LLM code quality becomes undeniable, driven by reasoning models trained with RL specifically for code. Also forecasts 2026 as year of solving code sandboxing via containers and WebAssembly, addressing security risks and prompt injection vulnerabilities from executing untrusted LLM-generated code. Critical for safe agentic workflows.
Curated reading list featuring 1 paper/blog/model family per week for all of 2025, covering LLMs, reasoning models, inference-time scaling, and AI engineering. Represents canonical synthesis of 2025's key technical developments from Latent Space podcast.