Llm — Topic — feedmeAI

📑 arXiv Apr 22

Exploiting LLM-as-a-Judge Disposition on Free Text Legal QA via Prompt Optimization

Investigates how prompt optimization and judge choice interact in LLM-as-a-Judge evaluation of free-text legal QA on the LEXam benchmark, using ProTeGi prompt optimization with Qwen3-32B and DeepSeek-V3 as judges. Lenient judge feedback yields larger and more consistent gains than strict feedback, and prompts optimized with lenient judges transfer better across judge models. The results highlight judge disposition as a significant, underappreciated variable in automated evaluation pipelines.
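As ProTeGi is generally described, it asks an LLM to critique the current prompt's failures, edits the prompt from that critique, and keeps the best candidates with a beam search. The sketch below shows only where the judge enters that loop. It is not the paper's code: every LLM call is replaced by a toy deterministic stand-in, and `quality`, `make_judge`, the pass thresholds, and the instruction strings are all hypothetical. The judge decides which answers count as failures (which drives the edits) and ranks candidate prompts, so the prompt is optimized against that judge's disposition.

```python
from typing import Callable, Dict, List, Tuple

# Toy model (hypothetical): a prompt is a tuple of instruction strings, and each
# question maps instructions to how much they improve the simulated answer.
Prompt = Tuple[str, ...]
Question = Dict[str, float]
Judge = Callable[[float], int]  # simulated answer quality in [0, 1] -> pass (1) / fail (0)


def quality(prompt: Prompt, q: Question) -> float:
    """Simulated answer quality: sum of helpful instruction weights, capped at 1."""
    return min(1.0, sum(q.get(instr, 0.0) for instr in prompt))


def make_judge(threshold: float) -> Judge:
    """Assumption: disposition modeled as a pass threshold (lower = more lenient)."""
    return lambda qual: int(qual >= threshold)


def score(prompt: Prompt, questions: List[Question], judge: Judge) -> float:
    """Pass rate of the prompt's simulated answers under the given judge."""
    return sum(judge(quality(prompt, q)) for q in questions) / len(questions)


def expand(prompt: Prompt, questions: List[Question], judge: Judge, k: int = 2) -> List[Prompt]:
    """Stand-in for the critique-then-edit step: for up to k questions the judge
    fails, append the most helpful instruction not already in the prompt."""
    failures = [q for q in questions if not judge(quality(prompt, q))]
    children = []
    for q in failures[:k]:
        unused = {i: w for i, w in q.items() if i not in prompt}
        if unused:
            best = max(sorted(unused), key=unused.get)  # deterministic tie-breaking
            children.append(prompt + (best,))
    return children


def optimize(seed: Prompt, questions: List[Question], judge: Judge,
             beam_width: int = 2, steps: int = 3) -> Prompt:
    """Beam search over prompt edits, ranked by the judge's pass rate,
    preferring shorter prompts on ties."""
    beam = [seed]
    for _ in range(steps):
        pool = set(beam)
        for p in beam:
            pool.update(expand(p, questions, judge))
        beam = sorted(pool, key=lambda p: (-score(p, questions, judge), len(p), p))[:beam_width]
    return beam[0]


# Hypothetical toy questions: which instructions help each simulated answer.
QUESTIONS: List[Question] = [
    {"cite the statute": 0.5, "state the holding": 0.5},
    {"state the holding": 0.6, "apply to facts": 0.4},
    {"apply to facts": 0.7, "cite the statute": 0.3},
]

if __name__ == "__main__":
    lenient, strict = make_judge(0.5), make_judge(0.9)
    print(optimize((), QUESTIONS, lenient))  # -> ('state the holding', 'apply to facts')
    print(optimize((), QUESTIONS, strict))   # -> ('cite the statute', 'state the holding', 'apply to facts')
```

With these made-up numbers the two judges steer the search to different prompts, which is the sense in which judge choice becomes part of the optimization objective. The toy says nothing about the paper's measured gains or transfer results.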

Evaluation Prompting Llm Legal-ai
