🍡 feedmeAI
Fine-tuning 4 items

Everything Fine-tuning

📑 arXiv Apr 22

Supplement Generation Training for Enhancing Agentic Task Performance

Supplement Generation Training (SGT) trains a small LLM to produce task-specific supplemental text that is prepended to the input of a larger, frozen LLM, improving downstream task performance without modifying the large model. This decouples task-specific adaptation from expensive full-model retraining: as base models evolve, only the lightweight supplement generator needs to be updated. The approach is framed as an alternative to repeated post-training of frontier models for agentic tasks.
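The inference-time data flow described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `answer_with_supplement`, `toy_generator`, `toy_frozen_llm`, and the separator are hypothetical names chosen here, and both models are stand-in functions rather than real LLM calls. Training of the generator is not shown.

```python
from typing import Callable, Tuple

# Hypothetical alias: any text-in, text-out model call.
TextModel = Callable[[str], str]


def answer_with_supplement(
    task_input: str,
    supplement_generator: TextModel,  # small, trainable model (assumed interface)
    frozen_llm: TextModel,            # large model, never updated
    separator: str = "\n\n",
) -> Tuple[str, str]:
    """Prepend generated supplemental text to the task input, then query the frozen model."""
    supplement = supplement_generator(task_input)
    prompt = f"{supplement}{separator}{task_input}"
    return prompt, frozen_llm(prompt)


# Stand-in models so the sketch runs without any LLM backend.
def toy_generator(task_input: str) -> str:
    return "Hint: break the task into steps before acting."


def toy_frozen_llm(prompt: str) -> str:
    return f"[frozen model received {len(prompt)} characters]"


prompt, output = answer_with_supplement(
    "Book a flight to Oslo.", toy_generator, toy_frozen_llm
)
print(prompt)
print(output)
```

Because the large model only ever sees a longer prompt, swapping in a newer base model would require retraining only the small generator, which is the decoupling the summary describes.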

📑 arXiv Apr 22

Learning to Evolve: A Self-Improving Framework for Multi-Agent Systems via Textual Parameter Graph Optimization

Textual Parameter Graph Optimization (TPGO) models a multi-agent system (MAS) as a graph of optimizable nodes (agents, tools, workflows) and derives structured natural-language "textual gradients" from execution traces to guide iterative optimization. Critically, the optimizer itself learns from its accumulated optimization history, making the framework self-improving rather than static. This addresses the lack of structural awareness and adaptability in flat prompt-tuning approaches to MAS optimization.
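A rough sketch of what one optimization step over such a graph might look like is given below. All names here (`Node`, `TextualGradient`, `optimization_step`, the toy critic and rewriter) are hypothetical and chosen for illustration; in a real system the critic and rewriter would presumably be LLM calls, and the paper's actual data structures may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """One optimizable unit of the system (field names are hypothetical)."""
    name: str
    kind: str  # "agent", "tool", or "workflow"
    text: str  # the textual parameter being optimized, e.g. a prompt
    downstream: List[str] = field(default_factory=list)  # edges to other nodes


@dataclass
class TextualGradient:
    node: str
    critique: str  # natural-language feedback derived from an execution trace


# (node, trace) -> critique; an empty string means "no change needed".
Critic = Callable[[Node, str], str]
# (node, critique, past gradients) -> revised text.
Rewriter = Callable[[Node, str, List[TextualGradient]], str]


def optimization_step(
    graph: Dict[str, Node],
    traces: Dict[str, str],
    critic: Critic,
    rewriter: Rewriter,
    history: List[TextualGradient],
) -> List[TextualGradient]:
    """Critique each traced node, rewrite its text, and record the gradient."""
    applied = []
    for name, trace in traces.items():
        node = graph[name]
        critique = critic(node, trace)
        if not critique:
            continue
        # The rewriter sees past gradients, so earlier steps can inform this one.
        node.text = rewriter(node, critique, history)
        gradient = TextualGradient(node=name, critique=critique)
        history.append(gradient)
        applied.append(gradient)
    return applied


# Toy stand-ins for what would be LLM calls.
def toy_critic(node: Node, trace: str) -> str:
    if "error" in trace.lower():
        return "State the expected output format explicitly."
    return ""


def toy_rewriter(node: Node, critique: str, history: List[TextualGradient]) -> str:
    past = sum(1 for g in history if g.node == node.name)
    return f"{node.text} [revision {past + 1}] {critique}"


graph = {
    "planner": Node("planner", "agent", "Plan the task.", downstream=["search"]),
    "search": Node("search", "tool", "Search the web for the query."),
}
history: List[TextualGradient] = []
traces = {"planner": "ok", "search": "Error: tool returned unparseable output"}
applied = optimization_step(graph, traces, toy_critic, toy_rewriter, history)
print(graph["search"].text)
```

Passing `history` into the rewriter is the simplest way to model an optimizer that draws on its own past updates; how TPGO actually learns from that history is not specified in the summary.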

🐙 GitHub Apr 22

Learning to Reason Without External Rewards via Reinforcement Learning from Internal Feedback (RLIF)

Intuitor (ICLR 2026) trains LLMs to improve their reasoning using only self-certainty, the model's own confidence in its output, as the reward signal: no labeled data, no external verifier, and no human-crafted reward. The companion code release (the RLIF framework) enables direct reproduction of the result that models can self-improve on reasoning benchmarks from internal feedback alone. This is practically significant because it removes the dependency on curated, verifiable datasets.
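To make the idea of a confidence-based reward concrete, the sketch below computes one common formulation of self-certainty: the average KL divergence from a uniform distribution to the model's next-token distribution, taken over the generated tokens. That this matches the exact definition used in the release is an assumption, as is the group-wise reward normalization shown afterwards; the function names and toy logits are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def self_certainty(token_logits: Sequence[Sequence[float]]) -> float:
    """Mean over tokens of KL(uniform || p), where p is the next-token distribution.

    KL(U || p) = -log(V) - (1/V) * sum_j log p_j, for vocabulary size V.
    It is 0 when p is uniform and grows as p becomes more peaked.
    """
    total = 0.0
    for logits in token_logits:
        logp = log_softmax(logits)
        vocab = len(logits)
        total += -math.log(vocab) - sum(logp) / vocab
    return total / len(token_logits)


def group_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within a group of sampled responses (assumed scheme)."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


# Toy logits for two sampled responses over a 4-token vocabulary.
confident = [[8.0, 0.0, 0.0, 0.0], [0.0, 9.0, 0.0, 0.0]]
uncertain = [[0.1, 0.0, 0.0, 0.0], [0.0, 0.0, 0.2, 0.0]]

rewards = [self_certainty(confident), self_certainty(uncertain)]
advantages = group_advantages(rewards)
print(rewards, advantages)
```

No label or verifier appears anywhere in this computation: the reward depends only on the model's own output distribution, which is what makes the signal "internal".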