🍡 feedmeAI
Compression

📑 arXiv 3d ago

When Flat Minima Fail: Characterizing INT4 Quantization Collapse After FP32 Convergence

Analysis of all 154 Pythia-160M checkpoints reveals that INT4 quantization robustness diverges catastrophically late in training: the FP32-to-INT4 gap widens from 11% to 517% even as FP32 perplexity plateaus, contradicting the assumption that converged models are quantization-ready. The divergence begins when FP32 perplexity stagnates, not during learning-rate decay, suggesting that flat minima in full precision don't guarantee quantization stability.
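The summary doesn't say how the gap is measured; a natural reading is the relative perplexity difference between the FP32 and INT4 versions of each checkpoint. Here is a minimal sketch of that measurement, assuming simple round-to-nearest, per-row symmetric INT4 weight quantization (the paper's actual scheme isn't given here); `int4_rtn` and `int4_ppl_gap` are hypothetical names:

```python
import copy
import torch

def int4_rtn(w: torch.Tensor) -> torch.Tensor:
    """Round-to-nearest, per-row symmetric INT4 weight quantization.
    An assumed baseline scheme, not necessarily the paper's recipe."""
    scale = w.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 7.0
    return (w / scale).round().clamp_(-8, 7) * scale

@torch.no_grad()
def int4_ppl_gap(model, input_ids: torch.Tensor) -> float:
    """Relative perplexity gap (ppl_int4 - ppl_fp32) / ppl_fp32 for one
    checkpoint, i.e. the quantity whose 11% -> 517% growth is described
    above. `model` is a HF causal LM; `input_ids` is a token batch."""
    def ppl(m) -> float:
        return m(input_ids, labels=input_ids).loss.exp().item()

    fp32_ppl = ppl(model)
    qmodel = copy.deepcopy(model)  # keep the FP32 weights intact
    for mod in qmodel.modules():
        if isinstance(mod, torch.nn.Linear):
            mod.weight.copy_(int4_rtn(mod.weight))
    return (ppl(qmodel) - fp32_ppl) / fp32_ppl
```

Pythia's intermediate checkpoints are published as revisions of EleutherAI/pythia-160m (e.g. `AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-160m", revision="step3000")`), so a function like this can be mapped across training steps to trace the divergence curve.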

🤗 Hugging Face 4d ago

Switch-KD: Visual-Switch Knowledge Distillation for Vision-Language Models

Switch-KD proposes a visual-switch distillation framework that unifies vision-language knowledge transfer by tackling the modality-specific supervision inconsistencies of VLM knowledge distillation. Current KD methods supervise each modality separately and never explicitly enforce multimodal alignment, so knowledge transfers inconsistently across modalities. The unified approach targets efficient VLM deployment in resource-constrained scenarios.
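Switch-KD's actual objective isn't given in this summary, but the contrast it draws, per-modality distillation versus jointly aligned supervision, can be sketched. Below, `separate_kd` is the criticized baseline and `aligned_kd` is a generic alignment-aware alternative (hypothetical names and loss terms, not the paper's method):

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, tau: float = 2.0):
    """Standard temperature-scaled KL distillation loss."""
    s = F.log_softmax(student_logits / tau, dim=-1)
    t = F.softmax(teacher_logits / tau, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * tau**2

def separate_kd(s_vis, t_vis, s_txt, t_txt):
    """The 'separate supervision' baseline criticized above: each
    modality is distilled independently, with no cross-modal term."""
    return kd_loss(s_vis, t_vis) + kd_loss(s_txt, t_txt)

def aligned_kd(s_vis_emb, s_txt_emb, t_vis_emb, t_txt_emb,
               s_vis, t_vis, s_txt, t_txt, lam: float = 0.5):
    """Alignment-aware alternative: additionally matches the student's
    image-text similarity matrix to the teacher's, so the modalities
    are supervised jointly. Hypothetical, not Switch-KD's objective."""
    s_sim = F.normalize(s_vis_emb, dim=-1) @ F.normalize(s_txt_emb, dim=-1).T
    t_sim = F.normalize(t_vis_emb, dim=-1) @ F.normalize(t_txt_emb, dim=-1).T
    align = F.mse_loss(s_sim, t_sim)
    return separate_kd(s_vis, t_vis, s_txt, t_txt) + lam * align
```

The extra alignment term ties the two modalities' supervision together through the teacher's image-text similarity structure, one common way to encode the multimodal alignment that the summary says separate supervision misses.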