🍡 feedmeAI

Everything Prompting

📑 arXiv Apr 22

Exploiting LLM-as-a-Judge Disposition on Free Text Legal QA via Prompt Optimization

Investigates how prompt optimization and judge choice interact in LLM-as-a-Judge evaluations for legal QA on the LEXam benchmark, using ProTeGi optimization with Qwen3-32B and DeepSeek-V3 as judges. Lenient judge feedback yields larger and more consistent gains than strict feedback, and prompts optimized with lenient judges transfer better across judge models. Results highlight that judge disposition is a significant, underappreciated variable in automated evaluation pipelines.

📑 arXiv Apr 22

Supplement Generation Training for Enhancing Agentic Task Performance

Supplement Generation Training (SGT) trains a small LLM to produce task-specific supplemental text prepended to the input of a larger frozen LLM, improving downstream task performance without modifying the large model. This decouples task-specific adaptation from expensive full model retraining, making it practical to update only the lightweight supplement generator as base models evolve. The approach is framed as an alternative to repeated post-training of frontier models for agentic tasks.
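The inference-time flow described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `TextModel`, `answer_with_supplement`, and the two toy models are hypothetical names, and the training of the generator itself is not shown.

```python
from typing import Callable

# Hypothetical type: any function mapping a prompt string to a completion string.
TextModel = Callable[[str], str]


def answer_with_supplement(task: str, generator: TextModel, frozen_llm: TextModel) -> str:
    """Ask the small model for a supplement, prepend it, then query the frozen model."""
    supplement = generator(task)          # small, trainable model
    prompt = f"{supplement}\n\n{task}"    # supplement goes in front of the original input
    return frozen_llm(prompt)             # large model is called as-is, never updated


# Stand-in models so the sketch runs without any API (assumptions, not real models).
def toy_generator(task: str) -> str:
    return "Hint: answer in one word."


def toy_frozen_llm(prompt: str) -> str:
    return "ECHO: " + prompt


result = answer_with_supplement("What is the capital of France?", toy_generator, toy_frozen_llm)
print(result)
```

Because the large model is only ever called through a plain function, swapping in a newer base model would leave the generator-side code unchanged, which is the decoupling the summary points to.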

📑 arXiv Apr 22

Learning to Evolve: A Self-Improving Framework for Multi-Agent Systems via Textual Parameter Graph Optimization

Textual Parameter Graph Optimization (TPGO) models a multi-agent system as a graph of optimizable nodes (agents, tools, workflows) and derives structured natural-language "textual gradients" from execution traces to guide iterative optimization. Critically, the optimizer itself learns from accumulated optimization history, making the framework self-improving rather than static. This addresses the lack of structural awareness and adaptability in flat prompt-tuning approaches to MAS optimization.
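One way to picture the graph-of-text-parameters idea is the data structure below. It is a sketch under stated assumptions: the `Node`, `TextGraph`, and `toy_rewrite` names are hypothetical, the `rewrite` callable stands in for whatever LLM call would turn feedback into an edit, and the `history` list only hints at the optimization history the paper's optimizer is said to learn from.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Node:
    name: str
    kind: str   # "agent", "tool", or "workflow"
    text: str   # the optimizable natural-language parameter


@dataclass
class TextGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: List[Tuple[str, str]] = field(default_factory=list)    # (upstream, downstream)
    history: List[Tuple[str, str]] = field(default_factory=list)  # (node name, feedback)

    def add(self, node: Node, upstream: Tuple[str, ...] = ()) -> None:
        """Register a node and connect it to its upstream nodes."""
        self.nodes[node.name] = node
        self.edges.extend((u, node.name) for u in upstream)

    def apply_feedback(self, name: str, feedback: str,
                       rewrite: Callable[[str, str], str]) -> None:
        """Update one node's text from natural-language feedback and log the step."""
        node = self.nodes[name]
        node.text = rewrite(node.text, feedback)
        self.history.append((name, feedback))


# Toy rewrite rule (assumption): append the feedback instead of calling an LLM.
def toy_rewrite(text: str, feedback: str) -> str:
    return f"{text} {feedback}"


graph = TextGraph()
graph.add(Node("planner", "agent", "Break the task into steps."))
graph.add(Node("search", "tool", "Query the web."), upstream=("planner",))
graph.apply_feedback("planner", "Keep plans under five steps.", toy_rewrite)
print(graph.nodes["planner"].text)
```

Keeping the edges explicit is what would let feedback be routed to the node responsible for a failure, rather than to one flat prompt.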

💬 Reddit Apr 20

Spent a weekend actually understanding and building Karpathy's "LLM Wiki" — here's what worked, what didn't

A hands-on build report on Karpathy's "LLM Wiki" concept: pre-processing sources into a structured, interlinked markdown wiki rather than retrieving raw chunks at query time. Answers to synthesis and cross-document reasoning questions improve noticeably over RAG, but the approach struggles with scale, update latency, and conflicting sources. The post is an honest tradeoff analysis rather than a benchmark.
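The pre-processing step can be illustrated with a toy builder that turns sources into markdown pages and links pages that mention each other. This is only a sketch: `build_wiki` and the sample `sources` are hypothetical, and plain substring matching stands in for the LLM-written synthesis a real build would presumably use.

```python
from typing import Dict


def build_wiki(sources: Dict[str, str]) -> Dict[str, str]:
    """Turn {topic: text} into markdown pages with [[links]] to other topics mentioned."""
    pages = {}
    for topic, text in sources.items():
        # Link to every other topic whose name appears in this page's text.
        related = [t for t in sources if t != topic and t.lower() in text.lower()]
        lines = [f"# {topic}", "", text]
        if related:
            lines += ["", "See also: " + ", ".join(f"[[{t}]]" for t in related)]
        pages[topic] = "\n".join(lines)
    return pages


# Made-up sample sources for illustration only.
sources = {
    "Tokenizer": "Splits text into tokens before the transformer sees it.",
    "Transformer": "A sequence model that consumes tokenizer output.",
}
wiki = build_wiki(sources)
print(wiki["Tokenizer"])
```

Since every page is built ahead of time, any change to a source means rebuilding the affected pages and their links, which is one way to read the update-latency cost noted in the report.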

📝 Blog Jan 21

Get Good at Agents

Lambert documents a real multi-agent coding workflow — GPT-5 Pro for planning, Claude Code with Opus 4.5 for implementation, Codex with GPT-5.2 for high-thinking-effort tasks — and argues that directing parallel agents on open-ended tasks is replacing individual grind as the primary work mode. The thesis: scoping and directing agents is the durable skill edge, not raw effort.