Supplement Generation Training (SGT) trains a small LLM to produce task-specific supplemental text prepended to the input of a larger frozen LLM, improving downstream task performance without modifying the large model. This decouples task-specific adaptation from expensive full model retraining, making it practical to update only the lightweight supplement generator as base models evolve. The approach is framed as an alternative to repeated post-training of frontier models for agentic tasks.
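The inference-time data flow described above can be sketched in a few lines. This is a minimal illustration, not the SGT authors' implementation: `TextModel`, `build_prompt`, and `answer_with_supplement` are hypothetical names, both models are represented as plain string-to-string callables, and the blank-line separator between supplement and input is an assumption. How the generator is trained is not covered here.

```python
from typing import Callable

# Hypothetical stand-in: any callable that maps a prompt string to a completion.
TextModel = Callable[[str], str]


def build_prompt(supplement: str, task_input: str) -> str:
    """Prepend the generated supplement to the task input (separator is assumed)."""
    return f"{supplement.strip()}\n\n{task_input}"


def answer_with_supplement(
    task_input: str, generator: TextModel, frozen_llm: TextModel
) -> str:
    """Run the small generator first, then the large model on the augmented prompt.

    Only `generator` would be trained; `frozen_llm` is called as-is.
    """
    supplement = generator(task_input)
    return frozen_llm(build_prompt(supplement, task_input))


# Toy usage with placeholder models instead of real LLM calls.
def toy_generator(task_input: str) -> str:
    return "Hint: answer with a single number."


def toy_frozen_llm(prompt: str) -> str:
    return f"(model saw {len(prompt)} characters)"


print(answer_with_supplement("What is 2 + 2?", toy_generator, toy_frozen_llm))
```

Because the large model appears only as an opaque callable, swapping in a newer base model would leave the generator interface unchanged, which is the decoupling the summary highlights.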
Textual Parameter Graph Optimization (TPGO) models a multi-agent system (MAS) as a graph of optimizable nodes (agents, tools, workflows) and derives structured natural-language "textual gradients" from execution traces to guide iterative optimization. Critically, the optimizer itself learns from accumulated optimization history, making the framework self-improving rather than static. This addresses the lack of structural awareness and adaptability in flat prompt-tuning approaches to MAS optimization.
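One way to picture the graph-plus-history structure is the sketch below. It is an illustration under stated assumptions, not TPGO's actual design: `Node`, `TextualGradient`, `GraphOptimizer`, and the `critic` and `rewriter` callables are all hypothetical, and in a real system the two callables would presumably be LLM calls rather than the toy functions used here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Node:
    name: str
    kind: str  # "agent", "tool", or "workflow"
    text: str  # the optimizable textual parameter, e.g. a prompt or tool description


@dataclass
class TextualGradient:
    node: str      # name of the node the feedback targets
    critique: str  # natural-language suggestion derived from a trace


@dataclass
class GraphOptimizer:
    nodes: Dict[str, Node]
    edges: List[Tuple[str, str]]  # (source, target) pairs between node names
    history: List[TextualGradient] = field(default_factory=list)

    def step(self, trace: str, critic: Callable, rewriter: Callable) -> List[TextualGradient]:
        """One optimization step: trace -> gradients -> rewritten node text."""
        # The critic sees past gradients, so later steps can build on earlier ones.
        gradients = critic(trace, self.nodes, self.history)
        for grad in gradients:
            node = self.nodes[grad.node]
            node.text = rewriter(node, grad, self.history)
            self.history.append(grad)
        return gradients


# Toy critic and rewriter (assumptions standing in for LLM calls).
def toy_critic(trace, nodes, history):
    if "timeout" in trace and "search_tool" in nodes:
        return [TextualGradient("search_tool", "Limit queries to three results.")]
    return []


def toy_rewriter(node, grad, history):
    return f"{node.text} {grad.critique}"


opt = GraphOptimizer(
    nodes={
        "planner": Node("planner", "agent", "Plan the task."),
        "search_tool": Node("search_tool", "tool", "Search the web."),
    },
    edges=[("planner", "search_tool")],
)
opt.step("planner -> search_tool: timeout after 30s", toy_critic, toy_rewriter)
print(opt.nodes["search_tool"].text)
```

Keeping `history` on the optimizer and passing it to both callables is the simplest way to model an optimizer that learns from its own past updates; the paper's actual mechanism may differ.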
Intuitor (ICLR 2026) trains LLMs to improve reasoning using only self-certainty as a reward signal: no labeled data, no external verifier, no human-crafted reward. The companion code release (RLIF framework) enables direct reproduction of the result that models can self-improve on reasoning benchmarks from internal feedback alone. This is practically significant because it removes the dependency on curated verifiable datasets.
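The summary above does not define self-certainty. The sketch below assumes one common formulation: the average, over generated tokens, of the KL divergence from a uniform distribution over the vocabulary to the model's next-token distribution. Treat both that definition and the function names as assumptions rather than the Intuitor code; the snippet only scores a response from its logits and does not implement the RL training loop.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def self_certainty(token_logits: List[List[float]]) -> float:
    """Average KL(U || p) across the generated tokens (assumed definition).

    token_logits[i] holds the logits the model produced when predicting token i.
    Higher values mean more peaked, i.e. more confident, distributions.
    """
    if not token_logits:
        raise ValueError("need logits for at least one token")
    total = 0.0
    for logits in token_logits:
        v = len(logits)
        log_p = log_softmax(logits)
        # KL(U || p) = sum_j (1/V) * (log(1/V) - log p_j) = -log V - mean_j(log p_j)
        total += -math.log(v) - sum(log_p) / v
    return total / len(token_logits)


# Toy comparison over a 4-token vocabulary: a peaked vs. a nearly flat prediction.
confident = self_certainty([[10.0, 0.0, 0.0, 0.0]])
hesitant = self_certainty([[1.0, 0.0, 0.0, 0.0]])
print(confident > hesitant)
```

Under this definition a perfectly uniform prediction scores zero and sharper distributions score higher, so the value can serve as a scalar reward without any labels or external verifier.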
Architecture survey comparing 10 open-weight LLM releases from January–February 2026, with fact sheets and diagrams covering attention design, MoE structure, context length, and post-training approaches. Useful index for base model selection decisions going into Q1 2026 fine-tuning or deployment work.