📑 arXiv 3d ago
Towards Faster Language Model Inference Using Mixture-of-Experts Flow Matching
MoE-FM uses a mixture of experts to capture complex latent geometries (anisotropy, multimodality) in flow matching for language models. YAN, a non-autoregressive LM built on MoE-FM, matches diffusion quality with faster inference in both Transformer and Mamba architectures.
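The summary doesn't spell out the architecture, but the core idea of gating several expert velocity fields in flow matching can be sketched. Below is a minimal PyTorch sketch assuming a soft-gated mixture over expert velocity heads trained with the standard linear-path conditional flow-matching loss; `MoEFlowMatcher`, its layer sizes, and the routing are illustrative assumptions, not MoE-FM's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFlowMatcher(nn.Module):
    """Illustrative mixture-of-experts velocity field for flow matching.

    Each expert predicts a velocity; a learned gate mixes them, letting the
    model fit anisotropic / multimodal latent geometries that a single
    velocity head would smooth over. This is a sketch, not the paper's
    architecture.
    """

    def __init__(self, dim: int, num_experts: int = 4, hidden: int = 256):
        super().__init__()
        # Each expert maps (x_t, t) -> a velocity in latent space.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(dim + 1, hidden), nn.SiLU(), nn.Linear(hidden, dim)
            )
            for _ in range(num_experts)
        )
        # Gate produces per-expert mixture weights from the same input.
        self.gate = nn.Linear(dim + 1, num_experts)

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        h = torch.cat([x_t, t[:, None]], dim=-1)           # (B, dim+1)
        weights = F.softmax(self.gate(h), dim=-1)          # (B, K)
        vs = torch.stack([e(h) for e in self.experts], 1)  # (B, K, dim)
        return (weights[..., None] * vs).sum(dim=1)        # (B, dim)

def flow_matching_loss(model: nn.Module, x1: torch.Tensor) -> torch.Tensor:
    """Conditional flow-matching objective on a linear path:
    x_t = (1 - t) * x0 + t * x1, with target velocity u = x1 - x0."""
    x0 = torch.randn_like(x1)      # noise endpoint of the path
    t = torch.rand(x1.size(0))     # uniform time samples in [0, 1)
    x_t = (1 - t)[:, None] * x0 + t[:, None] * x1
    return F.mse_loss(model(x_t, t), x1 - x0)

# Usage: one training step on dummy latent vectors.
model = MoEFlowMatcher(dim=64)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = flow_matching_loss(model, torch.randn(32, 64))
loss.backward()
opt.step()
```

At sampling time such a model would be integrated as an ODE from noise to data in a handful of steps, which is where the speedup over many-step diffusion samplers would come from.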