Continual-learning — Topic

🤗 Hugging Face Apr 22

SkillLearnBench: Benchmarking Continual Learning Methods for Agent Skill Generation on Real-World Tasks

SkillLearnBench is the first benchmark for continual skill learning in LLM agents, covering 20 verified tasks across 15 sub-domains with evaluation at three levels: skill quality, execution trajectory, and task outcome. Tested methods include one-shot learning, self/teacher feedback, and skill-creator approaches; all improve over the no-skill baseline, but none achieves consistent gains across domains. The results highlight that automatic skill acquisition for agents remains an unsolved problem despite recent progress.

Agents Benchmarks Evaluation Continual-learning
