OptiMer demonstrates that merging distribution vectors during continual pre-training outperforms traditional data mixing when adapting foundation models. The approach enables more efficient domain adaptation without full retraining, challenging conventional strategies for combining diverse data distributions in continual learning.
JumpLoRA introduces adaptive sparsity in LoRA blocks via JumpReLU gating for continual learning in LLMs, achieving dynamic parameter isolation to prevent task interference. The method is modular, compatible with existing LoRA-based continual learning approaches, and significantly boosts performance over IncLoRA by constraining both magnitude and direction of updates.
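The gating idea can be sketched in a few lines: a JumpReLU (a ReLU with a hard threshold) zeroes sub-threshold activations in the LoRA bottleneck, so only a subset of rank directions contribute per input. This is a minimal illustrative sketch, not JumpLoRA's actual implementation; the function names and the placement of the gate between the down- and up-projections are assumptions.

```python
import numpy as np

def jumprelu(z, theta):
    """JumpReLU: pass z through only where it exceeds threshold theta,
    zeroing sub-threshold activations (inducing sparsity)."""
    return np.where(z > theta, z, 0.0)

def gated_lora_delta(x, A, B, theta):
    """Hypothetical sketch of a JumpReLU-gated LoRA update: the
    low-rank projection A @ x is sparsified before B expands it,
    so only some rank directions contribute for a given input."""
    z = A @ x                      # down-project to rank r
    z_sparse = jumprelu(z, theta)  # adaptive sparsity gate
    return B @ z_sparse            # up-project back to model dim

rng = np.random.default_rng(0)
d, r = 8, 4
A = rng.normal(size=(r, d))
B = rng.normal(size=(d, r))
x = rng.normal(size=d)

delta = gated_lora_delta(x, A, B, theta=1.0)
print(delta.shape)  # (8,)
```

Raising `theta` drives more of the bottleneck to exactly zero, which is the "dynamic parameter isolation" lever: inputs from different tasks can end up using disjoint rank directions.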
AEGIS addresses catastrophic forgetting when fine-tuning vision-language models for robotic control by preventing cross-modal gradient asymmetry—high-magnitude continuous action gradients overwriting the VLM's cross-entropy pre-trained manifold. Uses anchor-enforced gradient isolation to preserve VQA capabilities while injecting flow-matching action supervision, unlike stop-gradient or LoRA approaches.
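One way to picture gradient isolation is as a projection: strip from the high-magnitude action gradient its component along the anchor (VQA) gradient, so action updates cannot directly overwrite the directions the anchor loss protects. This projection is an assumption for illustration; AEGIS's actual anchor-enforced mechanism may differ.

```python
import numpy as np

def isolate_gradient(g_action, g_anchor):
    """Illustrative sketch (not AEGIS itself): remove the component
    of the action-loss gradient that lies along the anchor (VQA)
    gradient, so shared parameters are not pushed directly against
    the directions the anchor loss cares about."""
    denom = np.dot(g_anchor, g_anchor)
    if denom == 0.0:
        return g_action
    proj = np.dot(g_action, g_anchor) / denom * g_anchor
    return g_action - proj

g_act = np.array([3.0, 4.0, 0.0])   # high-magnitude action gradient
g_anc = np.array([1.0, 0.0, 0.0])   # anchor (VQA) gradient direction
g_iso = isolate_gradient(g_act, g_anc)
print(g_iso)  # component along the anchor direction is zeroed
```

Unlike a stop-gradient, the action signal still reaches shared parameters; it is only constrained in direction.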
Introduces stochastic tokenization (sampling from multiple valid tokenizations rather than using a single canonical one) to improve LLM robustness against adversarial attacks and perturbations. Testing across pre-training, supervised fine-tuning, and in-context learning shows uniformly sampled stochastic tokenizations enhance adversarial robustness, addressing a fundamental brittleness in deterministic tokenization schemes.
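The core idea is easy to demonstrate on a toy vocabulary: enumerate every valid segmentation of a string and sample one uniformly instead of always taking the canonical one. The vocabulary and function names below are illustrative, not from the paper.

```python
import random

def all_tokenizations(text, vocab):
    """Enumerate every way to segment `text` into pieces from `vocab`."""
    if not text:
        return [[]]
    results = []
    for i in range(1, len(text) + 1):
        piece = text[:i]
        if piece in vocab:
            for rest in all_tokenizations(text[i:], vocab):
                results.append([piece] + rest)
    return results

def sample_tokenization(text, vocab, rng=random):
    """Uniformly sample one valid tokenization rather than the
    single canonical segmentation."""
    return rng.choice(all_tokenizations(text, vocab))

vocab = {"un", "h", "a", "p", "py", "hap", "happy", "unhappy", "appy"}
segs = all_tokenizations("unhappy", vocab)
print(len(segs))  # → 5 distinct valid segmentations
```

Training on samples from this distribution, rather than one fixed segmentation, is what exposes the model to the perturbed token boundaries that adversarial inputs exploit.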
Post-trained language models produce less varied outputs than base models, undermining inference-time scaling methods that rely on sample diversity. Study traces output diversity through three Olmo 3 post-training lineages, finding that where in the pipeline collapse occurs co-varies with data composition: the Think lineage loses most of its semantic diversity during supervised fine-tuning.
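A common proxy for semantic diversity is the mean pairwise cosine distance between embeddings of sampled outputs; a sketch of that metric is below. This is an illustrative measure only; the study's exact diversity metric may differ.

```python
import numpy as np

def semantic_diversity(embeddings):
    """Mean pairwise cosine distance across sample embeddings,
    a simple proxy for semantic output diversity."""
    E = np.asarray(embeddings, dtype=float)
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    sims = E @ E.T
    n = len(E)
    off_diag = sims[~np.eye(n, dtype=bool)]  # drop self-similarities
    return float(np.mean(1.0 - off_diag))

diverse = np.eye(4)          # orthogonal toy "embeddings": max spread
collapsed = np.ones((4, 3))  # identical outputs: no spread
print(semantic_diversity(diverse), semantic_diversity(collapsed))
```

Tracking this number at each post-training stage (SFT, preference tuning, RL) is what lets one localize where in a lineage the collapse happens.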
RAGognizer uses token-level hallucination annotations from real RAG outputs as a direct training signal, integrating a detection head during fine-tuning rather than treating hallucination detection as post-hoc. The approach trains models to recognize when generated content is unsupported by retrieved context, addressing closed-domain hallucinations in retrieval-augmented generation.
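A token-level detection head of this kind can be sketched as a linear probe over decoder hidden states, trained with per-token binary labels. The class and parameterization below are assumptions for illustration, not RAGognizer's architecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TokenSupportHead:
    """Hypothetical token-level head: scores each generated token's
    probability of being unsupported by the retrieved context."""
    def __init__(self, hidden_dim, rng):
        self.w = rng.normal(scale=0.02, size=hidden_dim)
        self.b = 0.0

    def hallucination_probs(self, hidden_states):
        # hidden_states: (seq_len, hidden_dim) from the generator
        return sigmoid(hidden_states @ self.w + self.b)

    def loss(self, hidden_states, labels):
        # labels: 1 = unsupported (hallucinated) token, 0 = grounded
        p = self.hallucination_probs(hidden_states)
        eps = 1e-9
        return float(-np.mean(labels * np.log(p + eps)
                              + (1 - labels) * np.log(1 - p + eps)))

rng = np.random.default_rng(0)
head = TokenSupportHead(hidden_dim=16, rng=rng)
h = rng.normal(size=(5, 16))          # 5 generated tokens
labels = np.array([0, 0, 1, 0, 1])    # token-level annotations
probs = head.hallucination_probs(h)
print(probs.shape)  # (5,)
```

Because this loss is added during fine-tuning, the generator and detector share representations, which is what distinguishes the approach from post-hoc detection.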
CiPO (Counterfactual Unlearning through iterative Preference Optimization) removes unwanted knowledge from Large Reasoning Models by intervening in chain-of-thought reasoning traces, avoiding degradation of reasoning performance. Redefines unlearning for LRMs as targeted CoT intervention rather than wholesale knowledge removal.
CoEvolve is an agent-data mutual evolution framework enabling LLM agents to improve through closed-loop, interaction-driven training. Extracts feedback signals like forgetting and uncertainty to identify failure-prone patterns, then uses LLM-based task synthesis to adapt the training data distribution alongside the agent.
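The "uncertainty" feedback signal can be illustrated with mean next-token entropy over an agent's logged steps: tasks whose steps show high entropy are flagged as failure-prone candidates for task synthesis. The thresholding scheme and data layout here are assumptions, not CoEvolve's actual pipeline.

```python
import math

def token_entropy(probs):
    """Shannon entropy of a next-token distribution (nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def flag_uncertain_tasks(task_logs, threshold):
    """Illustrative sketch: score each logged task by mean next-token
    entropy and flag high-uncertainty ones as failure-prone."""
    flagged = []
    for task_id, step_dists in task_logs.items():
        mean_h = sum(token_entropy(p) for p in step_dists) / len(step_dists)
        if mean_h > threshold:
            flagged.append(task_id)
    return flagged

logs = {
    "browse":   [[0.97, 0.01, 0.01, 0.01]],  # confident step
    "checkout": [[0.3, 0.3, 0.2, 0.2]],      # uncertain step
}
print(flag_uncertain_tasks(logs, threshold=0.5))  # → ['checkout']
```

Flagged patterns would then seed the LLM-based task synthesizer, closing the loop between agent behavior and training-data distribution.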
K-Token Merging compresses prompts in latent embedding space by merging K-token blocks via a lightweight encoder, then processing with LoRA-adapted LLMs. Operates at the embedding level rather than token space, reducing quadratic attention costs for long contexts.
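The compression step can be sketched as mean-pooling consecutive K-token blocks of embeddings followed by a small learned projection, shrinking the sequence length by a factor of K. Mean pooling and a single linear layer are placeholders for the paper's lightweight encoder.

```python
import numpy as np

def merge_k_token_blocks(token_embs, k, W):
    """Illustrative sketch: group token embeddings into blocks of k,
    mean-pool each block, and apply a small linear encoder W,
    compressing seq_len to ceil(seq_len / k)."""
    n, d = token_embs.shape
    pad = (-n) % k
    if pad:  # zero-pad so the sequence splits evenly into k-blocks
        token_embs = np.vstack([token_embs, np.zeros((pad, d))])
    blocks = token_embs.reshape(-1, k, d).mean(axis=1)  # (ceil(n/k), d)
    return blocks @ W  # lightweight learned projection

rng = np.random.default_rng(0)
embs = rng.normal(size=(10, 8))   # 10 prompt tokens, dim 8
W = rng.normal(size=(8, 8))
compressed = merge_k_token_blocks(embs, k=4, W=W)
print(compressed.shape)  # (3, 8)
```

Since self-attention cost is quadratic in sequence length, a K-fold reduction cuts attention FLOPs by roughly K², which is where the long-context savings come from.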
Tutorial on training and fine-tuning multimodal embedding and reranker models using Sentence Transformers framework. Covers practical implementation for combining text and visual modalities in retrieval tasks.
LeapAlign enables reward gradient backpropagation to early generation steps in flow matching by compressing trajectories into two consecutive leaps. Solves memory explosion and gradient issues that prevented direct-gradient alignment methods from updating global structure-determining early steps.
Community appreciation for local AI deployment emphasizes freedom from censorship and data harvesting, and the ability to fine-tune models for personal use cases with complete privacy. Credits llama.cpp developers and open-weight model contributors for enabling on-device inference. Reflects a growing preference for self-hosted solutions over cloud APIs.
Value Gradient Flow (VGF) frames behavior-regularized RL as an optimal transport problem mapping reference distributions to value-optimal policies, offering a scalable alternative to reparameterized policy gradients and rejection sampling. The approach addresses value over-optimization in offline RL and LLM fine-tuning while scaling to large generative models.
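A toy picture of the transport idea: start from reference-distribution samples and move them along the value gradient toward high-value regions. This Euler-step flow on a quadratic value is purely illustrative; VGF's actual flow, regularization, and parameterization will differ.

```python
import numpy as np

def value_gradient_flow(samples, grad_q, steps, step_size):
    """Toy sketch: transport reference samples toward higher-value
    regions by following the value gradient (illustrative only)."""
    x = np.array(samples, dtype=float)
    for _ in range(steps):
        x = x + step_size * grad_q(x)
    return x

# Quadratic value Q(x) = -(x - 2)^2, so the gradient points toward x = 2
grad_q = lambda x: -2.0 * (x - 2.0)
ref = np.linspace(-1, 1, 5)          # reference-distribution samples
out = value_gradient_flow(ref, grad_q, steps=200, step_size=0.05)
print(out.round(3))  # all samples transported near 2.0
```

In the behavior-regularized setting, a term pulling samples back toward the reference distribution would be added to this drift, which is what counters value over-optimization.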
Community observation that Claude-4.6-Opus fine-tunes of open models consistently underperform base models despite promises of increased reasoning. Testing across multiple models and quantization levels shows decreased intelligence in agent setups. Suggests synthetic data distillation from proprietary models may not reliably transfer capabilities.
OpenAI launched GPT-5.4-Cyber, a fine-tuned version of GPT-5.4 with lowered guardrails for cybersecurity applications, restricted to authorized security researchers and government agencies due to weaponization concerns. Represents OpenAI's response to Anthropic's Claude Mythos Preview in the AI-assisted cybersecurity race.
MLLMs underutilize visual information during instruction tuning because many tasks can be solved with language priors alone. This method augments visual instruction tuning with self-supervised tasks (rotation prediction, color matching, cross-view correspondence) reformulated as natural language instructions. Improves fine-grained visual reasoning without increasing model size.
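The reformulation step is simple to sketch: a pretext task such as rotation prediction becomes an instruction-response pair that language priors alone cannot answer, forcing the model to consult the image. The record schema and wording below are assumptions, not the paper's actual data format.

```python
import random

ROTATIONS = [0, 90, 180, 270]

def make_rotation_instruction(image_id, rng=random):
    """Illustrative sketch: turn a rotation-prediction pretext task
    into a natural-language instruction pair; the answer depends
    entirely on visual evidence, not language priors."""
    angle = rng.choice(ROTATIONS)
    return {
        "image": image_id,
        "rotate_by": angle,  # rotation applied to the image at load time
        "instruction": "By how many degrees has this image been rotated?",
        "response": f"The image has been rotated by {angle} degrees.",
    }

sample = make_rotation_instruction("img_0042", random.Random(7))
print(sample["instruction"])
print(sample["response"])
```

Color matching and cross-view correspondence can be templated the same way, so the self-supervised tasks mix directly into the standard visual instruction-tuning corpus.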
Interview with Sebastian Raschka covering 2026 AI architecture evolution, post-training to hybrid models, and Process Reward Models as the next frontier. Discusses his minimal AI stack (Mac mini, Codex, Ollama), fine-tuning as economic decision, and layer-by-layer verification philosophy for his upcoming book 'Build a Reasoning Model from Scratch.'
Chip Huyen's 'AI Engineering' book became O'Reilly's most-read since launch, covering evaluation, prompt engineering, RAG, fine-tuning, dataset engineering, and production architecture. Emphasizes evaluation as the most critical part of AI engineering and data as the most valuable asset in an era of commoditized models.