📑 arXiv 2d ago
CiPO: Counterfactual Unlearning for Large Reasoning Models through Iterative Preference Optimization
CiPO (Counterfactual Unlearning through iterative Preference Optimization) removes unwanted knowledge from Large Reasoning Models (LRMs) by intervening in their chain-of-thought (CoT) reasoning traces, while avoiding degradation of their reasoning performance. It redefines unlearning for LRMs as targeted CoT intervention rather than wholesale knowledge removal.
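The summary does not spell out CiPO's training objective, so the following is only an illustrative sketch under stated assumptions, not the paper's method. It swaps in a DPO-style (Direct Preference Optimization) loss that treats a counterfactual reasoning trace as the preferred response and the original trace exposing the unwanted knowledge as the dispreferred one. To mimic an iterative scheme, it refreshes the reference policy at the start of each round. The "model" is a toy softmax over three fixed candidate traces rather than an LRM. Every name here (`dpo_loss_and_grad`, `iterative_preference_unlearning`, `probs`, `traces`) is hypothetical.

```python
import math
from typing import List, Tuple


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over a list of unnormalized scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def dpo_loss_and_grad(
    logits: List[float],
    ref_logits: List[float],
    chosen: int,
    rejected: int,
    beta: float = 0.1,
) -> Tuple[float, List[float]]:
    """DPO loss for one preference pair, plus its gradient w.r.t. the policy logits.

    ASSUMPTION: a DPO-style objective stands in for CiPO's actual loss.
    chosen   = index of the counterfactual CoT (preferred)
    rejected = index of the original CoT that exposes the forget target (dispreferred)
    """
    lp = log_softmax(logits)
    ref = log_softmax(ref_logits)
    # Implicit reward margin: how much more the policy (vs. the reference) favors chosen over rejected.
    margin = (lp[chosen] - ref[chosen]) - (lp[rejected] - ref[rejected])
    # loss = -log sigmoid(beta * margin) = log(1 + exp(-beta * margin))
    loss = math.log1p(math.exp(-beta * margin))
    # dloss/dmargin = -beta * sigmoid(-beta * margin)
    coeff = -beta / (1.0 + math.exp(beta * margin))
    # dmargin/dlogit_k = 1[k == chosen] - 1[k == rejected]; the softmax normalizers cancel.
    grad = [0.0] * len(logits)
    grad[chosen] += coeff
    grad[rejected] -= coeff
    return loss, grad


def iterative_preference_unlearning(
    logits: List[float],
    chosen: int,
    rejected: int,
    rounds: int = 3,
    steps: int = 50,
    lr: float = 1.0,
    beta: float = 0.1,
) -> List[float]:
    """Hypothetical outer loop: each round freezes the current policy as the new reference."""
    logits = list(logits)
    for _ in range(rounds):
        ref_logits = list(logits)  # refresh the reference policy each round
        for _ in range(steps):
            _, grad = dpo_loss_and_grad(logits, ref_logits, chosen, rejected, beta)
            # Plain gradient descent on the DPO loss.
            logits = [w - lr * g for w, g in zip(logits, grad)]
    return logits


def probs(logits: List[float]) -> List[float]:
    """Convert logits to a probability distribution."""
    return [math.exp(x) for x in log_softmax(logits)]


# Toy "policy": a distribution over three fixed candidate reasoning traces (hypothetical).
traces = [
    "CoT that recalls the forget-set fact",        # index 0: rejected
    "counterfactual CoT that reasons without it",  # index 1: chosen
    "unrelated CoT",                               # index 2: not in the preference pair
]
before = [2.0, 0.0, 0.0]
after = iterative_preference_unlearning(before, chosen=1, rejected=0)
for name, p0, p1 in zip(traces, probs(before), probs(after)):
    print(f"{name:45s} {p0:.3f} -> {p1:.3f}")
```

Running it shows probability mass moving from the forget-revealing trace to the counterfactual one. The unrelated trace's logit never changes, yet its probability still shrinks through normalization; in a real model, keeping unrelated reasoning intact would need safeguards that this toy does not model.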