Reveals 'Read-Write Asymmetry' where LLMs interpret ASCII layouts well but struggle to produce them, showing that training on layout construction (Text→ASCII) improves spatial reasoning even without producing ASCII at inference. Gains transfer to three external spatial reasoning benchmarks, demonstrating that learning to construct explicit representations instills generalizable understanding.
SkillClaw enables LLM agent skills to continuously evolve through collective cross-user interaction experiences via an autonomous 'agentic evolver' that refines and updates skills, achieving +42.1% improvement. Treats agent capabilities as living artifacts that improve through collective use rather than static functions, representing a shift toward learning agent ecosystems.
OptiMer demonstrates that merging distribution vectors during continual pre-training outperforms traditional data mixing when adapting foundation models. The approach enables more efficient domain adaptation without full retraining, challenging conventional strategies for combining diverse data distributions in continual learning.
Zero-shot World Model (ZWM) achieves state-of-the-art performance on visual-cognitive tasks using only the visual experience data of a single child, requiring orders of magnitude less training data than current AI systems. BabyZWM demonstrates zero-shot transfer without task-specific training, offering a blueprint for human-scale data efficiency.
Hugging Face tutorial on building a fast multilingual OCR model using synthetic data generation. Demonstrates techniques for creating training data without manual annotation. Practical guide for scaling OCR across multiple languages efficiently.
RISE (Readout Influence Sketching Estimator) achieves scalable data attribution for LLMs by focusing on influence hotspots at the output layer rather than computing gradients across the entire model. Uses CountSketch projections on dual-channel representation (lexical residual + semantic projected-error) to make gradient-based attribution tractable for large models.
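The CountSketch ingredient can be illustrated with a minimal numpy sketch (dimensions and seed are illustrative; RISE's dual-channel representation and exact projection sizes are not reproduced here): each gradient coordinate hashes to one of k buckets with a random sign, giving a linear projection that preserves inner products in expectation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 4096, 256  # full gradient dimension, sketch dimension (illustrative)

# CountSketch: each coordinate maps to one bucket with a random sign
buckets = rng.integers(0, k, size=d)
signs = rng.choice([-1.0, 1.0], size=d)

def countsketch(v):
    # Linear projection: accumulate signed coordinates into their buckets
    s = np.zeros(k)
    np.add.at(s, buckets, signs * v)
    return s

g1, g2 = rng.standard_normal(d), rng.standard_normal(d)
# Inner products are preserved in expectation: <S g1, S g2> ~ <g1, g2>
approx = countsketch(g1) @ countsketch(g2)
exact = g1 @ g2
```

Because the sketch is linear, per-example gradient sketches can be accumulated and compared cheaply, which is what makes gradient-based attribution tractable at this scale.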
AtManRL uses differentiable attention manipulation and reinforcement learning to train LLMs to generate reasoning traces that genuinely influence final predictions rather than merely accompanying them. By learning additive attention masks that identify crucial CoT tokens, the method derives a saliency reward signal integrated with outcome-based rewards in the GRPO framework for faithful chain-of-thought reasoning.
Mixed precision and floating-point settings cause ~2.4× training time variation in distributed deep learning, but existing predictors ignore precision and incur up to 147.85% MAPE. This work proposes a precision-aware predictor that accounts for mixed precision configurations to accurately forecast distributed training times for resource allocation and scheduling.
Probabilistic Synchronous Parallel (PSP) in federated learning assumes static, independent device behavior, causing unfair synchronization when device availability correlates with data distribution. Proposes robust synchronization methods to handle correlated device failures from mobility, power constraints, and user activity in edge deployments.
Introduces stochastic tokenization (sampling from multiple valid tokenizations rather than using a single canonical one) to improve LLM robustness against adversarial attacks and perturbations. Testing across pre-training, supervised fine-tuning, and in-context learning shows uniformly sampled stochastic tokenizations enhance adversarial robustness, addressing a fundamental brittleness in deterministic tokenization schemes.
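The core idea — sampling uniformly from all valid tokenizations instead of committing to one canonical segmentation — can be sketched with a toy subword vocabulary (hypothetical; real systems use BPE/Unigram vocabularies). A DP counts segmentations of each suffix, and sampling each prefix proportionally to its completion count yields a uniform draw.

```python
import random

# Toy subword vocabulary (hypothetical, for illustration only)
vocab = {"un", "bel", "iev", "believ", "able", "believable",
         "u", "n", "b", "e", "l", "i", "v", "a"}

def count_segmentations(s, memo=None):
    # DP: number of ways to split s into vocabulary pieces
    if memo is None:
        memo = {}
    if s == "":
        return 1
    if s not in memo:
        memo[s] = sum(count_segmentations(s[i:], memo)
                      for i in range(1, len(s) + 1) if s[:i] in vocab)
    return memo[s]

def sample_segmentation(s, rng):
    # Uniform sample over all valid tokenizations, one prefix at a time:
    # choose each prefix with probability proportional to its completions
    if s == "":
        return []
    prefixes, weights = [], []
    for i in range(1, len(s) + 1):
        if s[:i] in vocab:
            prefixes.append(i)
            weights.append(count_segmentations(s[i:]))
    i = rng.choices(prefixes, weights=weights)[0]
    return [s[:i]] + sample_segmentation(s[i:], rng)

toks = sample_segmentation("unbelievable", random.Random(0))
```

Resampling per training example exposes the model to many segmentations of the same surface string, which is the brittleness-reducing mechanism the paper studies.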
STOP (Super TOken for Pruning) is the first learnable internal path pruning method for Large Reasoning Models, addressing prohibitive costs from futile reasoning paths. Outperforms existing baselines across LRMs from 1.5B to 20B parameters by systematically pruning at the prefix level using internal signals.
Post-trained language models produce less varied outputs than base models, undermining inference-time scaling methods that rely on sample diversity. The study traces output diversity through three Olmo 3 post-training lineages, finding that where the collapse occurs co-varies with data composition: the Think lineage loses most of its semantic diversity during supervised fine-tuning.
Agentic Verifier transforms reward modeling into multi-turn, tool-augmented deliberation using complementary forward and backward agents. Addresses error propagation and lack of grounding in complex domains by tracing solutions from premises to conclusions and re-checking conclusions against premises for comprehensive verification.
RAGognizer uses token-level hallucination annotations from real RAG outputs as a direct training signal, integrating a detection head during fine-tuning rather than treating hallucination detection as post-hoc. The approach trains models to recognize when generated content is unsupported by retrieved context, addressing closed-domain hallucinations in retrieval-augmented generation.
CoEvolve is an agent-data mutual evolution framework enabling LLM agents to improve through closed-loop, interaction-driven training. Extracts feedback signals like forgetting and uncertainty to identify failure-prone patterns, then uses LLM-based task synthesis to adapt the training data distribution alongside the agent.
Systematic benchmark of multiple optimizers for MLP training on tabular data finds Muon consistently outperforms the standard AdamW. First comprehensive optimizer comparison for tabular deep learning, challenging the default choice practitioners use.
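Muon's distinguishing step is orthogonalizing each weight-matrix gradient before applying it. A minimal numpy sketch of that step, using the simple cubic Newton-Schulz iteration (Muon itself uses a tuned quintic polynomial plus momentum, omitted here):

```python
import numpy as np

def newton_schulz_orth(G, steps=5):
    # Approximately orthogonalize G by driving its singular values toward 1.
    # Cubic iteration X <- 1.5 X - 0.5 (X X^T) X; Frobenius normalization
    # keeps the singular values inside the basin of convergence.
    X = G / (np.linalg.norm(G) + 1e-7)
    for _ in range(steps):
        X = 1.5 * X - 0.5 * (X @ X.T) @ X
    return X

rng = np.random.default_rng(0)
G = rng.standard_normal((4, 8))  # stand-in for a layer's gradient matrix
O = newton_schulz_orth(G)
```

The iteration acts directly on the singular values (each step maps sigma to 1.5*sigma - 0.5*sigma^3), so the update direction keeps the gradient's singular vectors while equalizing their scales.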
Prism is the first symbolic superoptimizer for tensor programs, using sGraph representation to symbolically encode operator families and execution parameters. Two-level search with symbolic pruning and e-graph verification achieves provably optimal kernels across large search spaces.
Fixed-point framework analyzes looped transformers for test-time compute scaling along reachability, input-dependence, and geometric stability axes. Proves looped networks without recall have countable fixed points and cannot achieve strong input-dependence, while recall combined with outer normalization produces regimes where fixed points are reachable, locally smooth, and input-dependent—enabling extrapolation to harder problems rather than memorization.
AdaSplash-2 accelerates differentiable sparse attention (α-entmax) via histogram-based initialization that reduces normalizer computation to 1-2 iterations. The method stores coarse attention score histograms in on-chip SRAM for accurate initialization, addressing the computational overhead that previously made sparse attention slower than softmax.
Diffusion models trained with denoising score matching often violate the Fokker-Planck equation governing data density evolution. This paper tests whether lightweight regularization penalties can reduce these violations without the computational overhead of direct FP equation enforcement, finding that weaker regularization sometimes yields better sample quality than strict adherence.
Analysis of all 154 Pythia-160m checkpoints reveals INT4 quantization robustness diverges catastrophically (11% to 517% gap) late in training while FP32 perplexity plateaus, contradicting the assumption that converged models are quantization-ready. Divergence begins when FP32 perplexity stagnates, not during learning rate decay, suggesting flat minima in full precision don't guarantee quantization stability.
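A checkpoint-robustness probe of this kind can be sketched with a generic symmetric per-row INT4 fake-quantizer (a common scheme, not necessarily the paper's exact setup): quantize the weights, dequantize, and measure the perturbation that perplexity evaluation would then reflect.

```python
import numpy as np

def fake_quantize_int4(w):
    # Symmetric per-row INT4: 16 levels in [-8, 7], scale from row max |w|
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(w / scale), -8, 7)
    return q * scale  # dequantized weights, ready for evaluation

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 64)).astype(np.float32)  # stand-in weight matrix
w_q = fake_quantize_int4(w)
err = np.abs(w - w_q).max()  # worst-case elementwise quantization error
```

Running such a probe at every checkpoint, rather than only at convergence, is what exposes the late-training divergence the analysis reports.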
DAMP introduces one-shot, closed-form weight surgery for class unlearning that removes forget-specific directions across network depth, avoiding gradient-based optimization. Unlike existing methods that rely on classifier suppression, DAMP demonstrates true representational forgetting by eliminating targeted knowledge from internal representations without retraining.
RLVR-trained models on inductive reasoning tasks systematically abandon rule induction and instead enumerate instance-level labels that pass verifiers without capturing relational patterns—a form of reward hacking exploiting imperfect verifiers. The paper introduces detection methods for these shortcuts where models game verifiers rather than learn generalizable reasoning.
IG-Search introduces step-level information gain rewards for search-augmented reasoning, measuring how retrieved documents improve model confidence in answers relative to random baselines. This addresses the gradient collapse problem in trajectory-level RL when all sampled trajectories fail and enables distinguishing precise queries from vague ones within rollout groups.
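The reward can be written schematically as the log-probability gain of the gold answer when conditioning on the retrieved document versus a random-document baseline (the probabilities below are made up for illustration; the paper's exact estimator is not reproduced):

```python
import math

def info_gain_reward(p_with_doc, p_baseline, eps=1e-9):
    # Step-level reward: log-prob improvement of the gold answer from
    # the retrieved document, relative to a random-document baseline
    return math.log(p_with_doc + eps) - math.log(p_baseline + eps)

# Precise query: retrieval sharply raises answer confidence
r_precise = info_gain_reward(p_with_doc=0.8, p_baseline=0.2)
# Vague query: retrieval barely helps
r_vague = info_gain_reward(p_with_doc=0.22, p_baseline=0.2)
```

Because the reward is nonzero even when every sampled trajectory fails to answer correctly, it supplies gradient signal in exactly the regime where trajectory-level RL collapses.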
FedIDM addresses slow convergence and utility-robustness tradeoffs in Byzantine federated learning by using distribution matching to generate trustworthy condensed data that identifies malicious clients. The method filters abnormal updates through deviation detection and negative contribution rejection, achieving faster and more stable convergence against colluding attackers.
OpenMobile is an open-source framework for synthesizing high-quality mobile agent task instructions and trajectories, achieving nearly 70% success on AndroidWorld. Features scalable task synthesis using global environment memory and policy-switching strategy alternating between learner and expert models during trajectory rollout. Makes training recipes transparent unlike closed leading models.
MMOT introduces an Optimal Transport-based framework for online incremental learning that maintains evolving mixture-model centroids instead of fixed or single adaptive centroids per class. The approach better handles multimodal data streams in continual learning scenarios where distributional shifts are severe and replay buffers have limited utility. The novel contribution is a dynamic centroid-evolution mechanism grounded in OT theory.
LeapAlign enables reward gradient backpropagation to early generation steps in flow matching by compressing trajectories into two consecutive leaps. Solves memory explosion and gradient issues that prevented direct-gradient alignment methods from updating global structure-determining early steps.
LongAct identifies high-magnitude activations in query/key vectors during long-context processing as critical for optimization. Leverages insights from quantization and sparse reasoning structure to guide RL training for improved long-context reasoning.
RAD-2 combines diffusion-based trajectory generation with RL-optimized discriminator for autonomous driving motion planning. Generator produces diverse multimodal candidates while discriminator reranks by long-term driving quality, addressing stochastic instabilities and lack of corrective feedback in pure imitation learning. Decoupled design avoids applying sparse rewards directly to high-dimensional diffusion process.
Switch-KD proposes a visual-switch distillation framework unifying vision-language knowledge transfer by addressing modality-specific supervision inconsistencies in VLM knowledge distillation. Current KD methods supervise modalities separately without explicitly addressing multimodal alignment, leading to inconsistent knowledge transfer. The approach enables efficient VLM deployment in resource-constrained scenarios.
A developer visualized decoder-block activation patterns during LLM training as video, showing how internal representations evolve across training steps. A lossless version and the projection data are released on Hugging Face along with the video-generation source code. Provides interpretability insight into transformer training dynamics.
NVIDIA releases Nemotron models and datasets to support systems R&D and sell GPUs, organizing 500+ people with "invitation, not control" philosophy. One of few economically coherent open model strategies: understand customer needs and drive hardware sales. Explains evolution from Megatron to modern open releases.
Moonlake builds action-conditioned world models for game development, debating abstraction versus bitter lesson and whether code engines beat learned priors. Explores diffusion scaling limits and symbolic versus diffusion boundaries. Represents world model frontier beyond LLMs with implications for spatial audio and multimodal latents.
C2 trains reward models to critically collaborate with rubric generators using only binary preference data, avoiding costly rubric annotations. The framework generates helpful and misleading rubric pairs to teach the reward model when to rely on or override rubric guidance, addressing the cooperative communication failure where low-quality rubrics mislead verification.
Value Gradient Flow (VGF) frames behavior-regularized RL as an optimal transport problem mapping reference distributions to value-optimal policies, offering a scalable alternative to reparameterized policy gradients and rejection sampling. The approach addresses value over-optimization in offline RL and LLM fine-tuning while scaling to large generative models.
Three-Phase Transformer (3PT) partitions hidden states into cyclic channels maintained by phase-respecting operations including per-channel normalization and 2D Givens rotations between attention and FFN layers. Creates a self-stabilizing architecture with a DC subspace for absolute position encoding orthogonal to RoPE, representing a structural prior rather than an added module.
Guide on distilling knowledge from 100B+ parameter models into sub-4B models. Addresses practical methods for compressing frontier model capabilities into efficient local deployments.
MLLMs underutilize visual information during instruction tuning because many tasks can be solved with language priors alone. This method augments visual instruction tuning with self-supervised tasks (rotation prediction, color matching, cross-view correspondence) reformulated as natural language instructions. Improves fine-grained visual reasoning without increasing model size.
Independent researcher trained a 1.088B parameter pure Spiking Neural Network for language modeling from random initialization, achieving 4.4 loss and 93% activation sparsity at 27k steps before running out of compute budget. This challenges conventional wisdom that billion-scale SNNs require ANN-to-SNN conversion due to vanishing gradients, demonstrating direct spike-domain training is viable. Cross-lingual emergence appeared around step 25K despite no explicit multilingual objective.
Hardware build consolidates two RTX 6000 Ada GPUs (48GB GDDR6 each, 96GB total VRAM) into a single Threadripper PRO 7965WX workstation with 256GB DDR5 ECC and dual 1600W Titanium PSUs. Targets local LLM training and inference at scale, with 128 PCIe 5.0 lanes supporting an x16/x16 GPU configuration. Community build documentation for high-end ML workstations.
PARROT framework uses reward models that generate explicit multi-dimensional critiques before scoring, enabling test-time critique-and-refine loops that match RL fine-tuning performance without parameter updates. Transforms reward models from passive evaluators to active optimization tools. First demonstration that structured reasoning at inference time can unlock capabilities equivalent to gradient-based training.
Byte-Level Distillation (BLD) solves cross-tokenizer distillation by converting teacher output distributions to byte-level probabilities and adding a lightweight byte decoder to the student. This simple approach outperforms complex vocabulary alignment heuristics by operating at the common byte interface shared across all tokenizers.
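The conversion step can be illustrated with a simplified sketch: marginalize a teacher's next-token distribution onto the first byte of each token's UTF-8 encoding (BLD handles full byte sequences; the toy distribution below is made up).

```python
from collections import defaultdict

def token_dist_to_first_byte_dist(token_probs):
    # P(byte b) = sum of P(token t) over tokens whose UTF-8 encoding
    # starts with byte b -- the common interface across all tokenizers
    byte_probs = defaultdict(float)
    for tok, p in token_probs.items():
        byte_probs[tok.encode("utf-8")[0]] += p
    return dict(byte_probs)

# Hypothetical teacher distribution over next tokens
teacher = {"the": 0.5, "thus": 0.2, "a": 0.3}
bp = token_dist_to_first_byte_dist(teacher)
# "the" and "thus" share a first byte, so their probabilities merge
```

Since every tokenizer ultimately emits byte sequences, the byte marginal gives teacher and student a shared target even when their vocabularies are disjoint.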
Meta launched Muse Spark, its first proprietary-only model since forming Meta Superintelligence Labs, featuring native multimodal reasoning and "thought compression" that achieves its results with over 10x less compute than Llama 4 by penalizing excessive thinking time during RL training. The model is available only through the Meta AI app/website with a private API preview, and the pivot away from open source has sparked backlash from the open-source community. Meta refused to clarify whether Llama development has ended.
∇-Reasoner applies first-order gradient descent over token logits during inference, achieving 20%+ accuracy gains on math reasoning while reducing model calls by 10-40%. Theoretically proves inference-time gradient descent in sample space is dual to KL-regularized RL alignment. First work bridging test-time optimization with training-time alignment theory through differentiable decoding.
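The core operation — first-order gradient steps on token logits at inference time — can be sketched as gradient ascent on the log-probability of a target token (a toy objective chosen for illustration; the paper's actual objective and its KL-regularized duality are not reproduced here):

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # numerically stable
    e = np.exp(z)
    return e / e.sum()

def ascend_logits(logits, target, lr=0.5, steps=20):
    # Gradient ascent on log p(target):
    # d log p_t / d z = onehot(t) - softmax(z)
    z = logits.copy()
    for _ in range(steps):
        grad = -softmax(z)
        grad[target] += 1.0
        z += lr * grad
    return z

z_init = np.array([1.0, 2.0, 0.5, 0.0])  # toy logits over a 4-token vocab
z_opt = ascend_logits(z_init, target=0)
```

Because the objective is differentiable in the logits, the same machinery extends to richer inference-time objectives without any parameter updates to the model itself.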
4.5-hour state-of-AI discussion with Sebastian Raschka, Nathan Lambert, and Lex Fridman covering the 2026 landscape: inference-time scaling and reasoning models, RLVR, architecture evolution, open vs. closed models, AGI timelines, geopolitics, and the economic forces shaping development. A comprehensive synthesis of current industry perspectives and technical directions from Raschka and Lambert.