How to Distill from 100B+ to <4B Models
A guide to distilling knowledge from 100B+ parameter models into sub-4B models, covering practical methods for compressing frontier-model capabilities into efficient local deployments.
An active community discussion (129 posts) on knowledge-distillation techniques for compressing 100B+ parameter models into sub-4B variants suitable for consumer hardware. It marks a shift from passive model consumption toward building custom distilled models optimized for edge devices, phones, and lightweight laptops, preserving large-model capabilities while meeting tight resource constraints.
Byte-Level Distillation (BLD) addresses cross-tokenizer distillation by converting the teacher's output distribution to byte-level probabilities and adding a lightweight byte decoder to the student. This simple approach outperforms complex vocabulary-alignment heuristics because it operates at the byte interface shared by all tokenizers.
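The core idea can be sketched in a few lines: collapse the teacher's next-token distribution onto the 256-way next-byte distribution, then train the student's byte decoder against it with a KL loss. The helpers below (`teacher_byte_distribution`, `byte_kl`) are hypothetical names for illustration, and the marginalization shown (assigning each token's mass to its first byte) is a simplified assumption, not any specific paper's exact algorithm.

```python
import math

def teacher_byte_distribution(token_probs, token_bytes):
    """Collapse a teacher next-token distribution onto the next-byte level.

    Simplifying assumption: each candidate token's probability mass is
    assigned to the first byte of its UTF-8 encoding, so
    P(b) = sum of P(t) over tokens t whose encoding starts with byte b.
    This marginalizes away the teacher's tokenizer entirely.
    """
    byte_probs = [0.0] * 256
    for prob, tok in zip(token_probs, token_bytes):
        if tok:  # skip empty byte strings
            byte_probs[tok[0]] += prob
    return byte_probs

def byte_kl(teacher, student, eps=1e-12):
    """KL(teacher || student) over the 256-way byte distribution;
    the distillation loss the student's byte decoder would be trained on."""
    return sum(p * math.log((p + eps) / (q + eps))
               for p, q in zip(teacher, student) if p > 0)

# Toy example: a teacher "vocabulary" of three tokens.
token_probs = [0.6, 0.3, 0.1]
token_bytes = [b"hello", b"hi", b"world"]  # "hello" and "hi" share first byte b"h"
t_byte = teacher_byte_distribution(token_probs, token_bytes)
# Mass on byte "h" is 0.6 + 0.3 = 0.9; on byte "w" it is 0.1.
```

Because both teacher and student emit distributions over the same 256 byte values, no vocabulary mapping or alignment table is needed; any tokenizer pair meets at this interface.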