Unlearning 2 items

📑 arXiv 2d ago

CiPO: Counterfactual Unlearning for Large Reasoning Models through Iterative Preference Optimization

CiPO (Counterfactual Unlearning through iterative Preference Optimization) removes unwanted knowledge from Large Reasoning Models (LRMs) by intervening in their chain-of-thought (CoT) reasoning traces, which avoids degrading reasoning performance. It reframes unlearning for LRMs as targeted CoT intervention rather than wholesale knowledge removal.

Reasoning Unlearning Safety Fine-tuning

📑 arXiv 3d ago

Class Unlearning via Depth-Aware Removal of Forget-Specific Directions

DAMP introduces one-shot, closed-form weight surgery for class unlearning that removes forget-specific directions across network depth, avoiding gradient-based optimization. Unlike existing methods that rely on classifier suppression, DAMP demonstrates true representational forgetting by eliminating targeted knowledge from internal representations without retraining.

Safety Training Unlearning Mechanistic-interpretability
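
To make "removing a direction with closed-form weight surgery" concrete, here is a minimal NumPy sketch of the general idea for a single linear layer. It is not DAMP's actual procedure: the difference-of-means estimate of the forget direction, the helper names `forget_direction` and `project_out`, and the synthetic data are all assumptions for illustration, and the sketch ignores the depth-aware part entirely.

```python
import numpy as np

def forget_direction(forget_acts, retain_acts):
    """Unit vector pointing from the retain mean to the forget mean.

    Assumption: a difference of means is used as the forget-specific
    direction. The summary above does not say how DAMP estimates it.
    """
    d = forget_acts.mean(axis=0) - retain_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def project_out(W, d):
    """Return W (I - d d^T): the layer no longer responds to inputs along d."""
    return W - np.outer(W @ d, d)

# Synthetic activations: the forget class is shifted along the first axis.
rng = np.random.default_rng(0)
retain = rng.normal(size=(200, 8))
forget = rng.normal(size=(200, 8)) + 3.0 * np.eye(8)[0]

W = rng.normal(size=(4, 8))  # hypothetical layer weights, shape (out, in)
d = forget_direction(forget, retain)
W_edited = project_out(W, d)  # one closed-form edit, no gradient steps
```

The edit is a single matrix update, so it needs no optimization loop. Inputs orthogonal to `d` pass through `W_edited` exactly as they did through `W`, while any component along `d` is mapped to zero.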
