🍡 feedmeAI

Everything LLM

📑 arXiv Apr 22

Exploiting LLM-as-a-Judge Disposition on Free Text Legal QA via Prompt Optimization

Investigates how prompt optimization and judge choice interact in LLM-as-a-Judge evaluations for legal QA on the LEXam benchmark, using ProTeGi optimization with Qwen3-32B and DeepSeek-V3 as judges. Lenient judge feedback yields larger and more consistent gains than strict feedback, and prompts optimized with lenient judges transfer better across judge models. Results highlight that judge disposition is a significant, underappreciated variable in automated evaluation pipelines.
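
To make the moving parts concrete, here is a heavily simplified sketch of a judge-scored beam search over prompt rewrites, loosely in the spirit of ProTeGi-style optimization. It is not the paper's implementation: `optimize_prompt`, `expand`, `judge_score`, and the toy stand-ins below are all hypothetical names, and in a real pipeline `expand` would presumably call an LLM to propose rewrites while `judge_score` would average a judge model's ratings over a development set.

```python
from typing import Callable, List


def optimize_prompt(
    seed: str,
    expand: Callable[[str], List[str]],   # hypothetical: proposes rewrites of a prompt
    judge_score: Callable[[str], float],  # hypothetical: judge-based score for a prompt
    beam_width: int = 2,
    rounds: int = 3,
) -> str:
    """Keep the `beam_width` best-scoring prompts each round; return the top one."""
    beam = [seed]
    for _ in range(rounds):
        candidates = set(beam)
        for prompt in beam:
            candidates.update(expand(prompt))
        # Sort by descending score, breaking ties alphabetically for determinism.
        ranked = sorted(candidates, key=lambda p: (-judge_score(p), p))
        beam = ranked[:beam_width]
    return beam[0]


# Toy stand-ins (assumptions for illustration only, no LLM involved).
HINTS = ["Cite the statute.", "State the rule first."]


def toy_expand(prompt: str) -> List[str]:
    # Propose one rewrite per hint that the prompt does not yet contain.
    return [f"{prompt} {hint}" for hint in HINTS if hint not in prompt]


def toy_score(prompt: str) -> float:
    # Stand-in for a judge: fraction of hints present in the prompt.
    return sum(hint in prompt for hint in HINTS) / len(HINTS)


best = optimize_prompt("Answer the legal question.", toy_expand, toy_score)
print(best)
```

The judge enters only through `judge_score`, so swapping a lenient judge for a strict one changes which candidates survive each round without touching the search loop. That is the interaction the summary describes.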