📑 arXiv 3d ago
Towards Faster Language Model Inference Using Mixture-of-Experts Flow Matching
MoE-FM uses a mixture of experts to capture complex latent geometries (anisotropy, multimodality) in flow matching for language models. YAN, a non-autoregressive LM built on MoE-FM, matches diffusion quality with faster inference in both Transformer and Mamba architectures.
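The summary doesn't spell out the architecture, but the core idea of gating several expert velocity fields in flow matching can be sketched. Below is a minimal PyTorch sketch assuming a soft-gated mixture over expert velocity heads trained with the standard linear-path conditional flow-matching loss; `MoEFlowMatcher`, its layer sizes, and the routing are illustrative assumptions, not MoE-FM's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFlowMatcher(nn.Module):
    """Illustrative mixture-of-experts velocity field for flow matching.

    Each expert predicts a velocity; a learned gate mixes them, letting the
    model fit anisotropic / multimodal latent geometries that a single
    velocity head would smooth over. This is a sketch, not the paper's
    architecture.
    """

    def __init__(self, dim: int, num_experts: int = 4, hidden: int = 256):
        super().__init__()
        # Each expert maps (x_t, t) -> a velocity in latent space.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(dim + 1, hidden), nn.SiLU(), nn.Linear(hidden, dim)
            )
            for _ in range(num_experts)
        )
        # Gate produces per-expert mixture weights from the same input.
        self.gate = nn.Linear(dim + 1, num_experts)

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        h = torch.cat([x_t, t[:, None]], dim=-1)           # (B, dim+1)
        weights = F.softmax(self.gate(h), dim=-1)          # (B, K)
        vs = torch.stack([e(h) for e in self.experts], 1)  # (B, K, dim)
        return (weights[..., None] * vs).sum(dim=1)        # (B, dim)

def flow_matching_loss(model: nn.Module, x1: torch.Tensor) -> torch.Tensor:
    """Conditional flow-matching objective on a linear path:
    x_t = (1 - t) * x0 + t * x1, with target velocity u = x1 - x0."""
    x0 = torch.randn_like(x1)      # noise endpoint of the path
    t = torch.rand(x1.size(0))     # uniform time samples in [0, 1)
    x_t = (1 - t)[:, None] * x0 + t[:, None] * x1
    return F.mse_loss(model(x_t, t), x1 - x0)

# Usage: one training step on dummy latent vectors.
model = MoEFlowMatcher(dim=64)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = flow_matching_loss(model, torch.randn(32, 64))
loss.backward()
opt.step()
```

At sampling time such a model would be integrated as an ODE from noise to data in a handful of steps, which is where the speedup over many-step diffusion samplers would come from.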