MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents (ICLR 2026, MIT)
MEM1 uses end-to-end reinforcement learning (RL) to train agents that, at every turn, fold new observations into a compact internal memory state and drop the earlier context. Context size therefore stays constant no matter how many turns a task takes. Unlike retrieval-augmented generation (RAG) or full-context retention, the memory-management policy is itself learned rather than hand-designed. The approach is demonstrated on multi-turn web and tool-use tasks.
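The sketch below illustrates why context stays bounded. It is a minimal example built on a hypothetical `policy(memory, observation) -> (new_memory, action)` interface. `toy_policy` only truncates to a fixed character budget (`MEMORY_BUDGET`, an assumed value); it stands in for the consolidation MEM1 learns with RL, and the RL training itself is not shown. All names (`toy_policy`, `make_env`, `run_constant_context`, `run_full_context`) are illustrative and do not come from the MEM1 code.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (illustrative only, not from the MEM1 paper or code):
#   policy(memory, observation) -> (new_memory, action)
#   env_step(action) -> (next_observation, done)
Policy = Callable[[str, str], Tuple[str, str]]
EnvStep = Callable[[str], Tuple[str, bool]]

MEMORY_BUDGET = 32  # assumed cap on internal-state length, in characters


def toy_policy(memory: str, observation: str) -> Tuple[str, str]:
    """Stand-in for the learned policy: merge the new observation into memory,
    then keep only the most recent MEMORY_BUDGET characters.
    A trained agent would instead learn what to keep via RL."""
    merged = (memory + " " + observation).strip()
    return merged[-MEMORY_BUDGET:], "continue"


def make_env(num_turns: int) -> Tuple[str, EnvStep]:
    """Toy environment that emits one fixed-width observation per turn."""
    state = {"t": 0}

    def step(action: str) -> Tuple[str, bool]:
        state["t"] += 1
        return f"fact {state['t']:03d}", state["t"] >= num_turns

    return "fact 000", step


def run_constant_context(policy: Policy, num_turns: int) -> List[int]:
    """Each prompt holds only the current memory plus the latest observation;
    earlier turns are dropped, so prompt length stays bounded."""
    obs, step = make_env(num_turns)
    memory, sizes, done = "", [], False
    while not done:
        prompt = memory + "\n" + obs
        sizes.append(len(prompt))
        memory, action = policy(memory, obs)
        obs, done = step(action)
    return sizes


def run_full_context(num_turns: int) -> List[int]:
    """Baseline: append every observation to an ever-growing transcript."""
    obs, step = make_env(num_turns)
    transcript, sizes, done = "", [], False
    while not done:
        transcript += obs + "\n"
        sizes.append(len(transcript))
        obs, done = step("continue")
    return sizes


if __name__ == "__main__":
    bounded = run_constant_context(toy_policy, num_turns=50)
    growing = run_full_context(num_turns=50)
    print(max(bounded), growing[-1])  # prints "41 450"
```

After 50 turns, the memory-based prompt never exceeds 41 characters (a 32-character memory, a newline, and an 8-character observation). The full-context transcript grows linearly to 450 characters. The design point the sketch isolates is that the prompt is rebuilt from the memory state each turn rather than appended to.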