
📑 arXiv 3d ago

AdaSplash-2: Faster Differentiable Sparse Attention

AdaSplash-2 accelerates differentiable sparse attention (α-entmax) with a histogram-based initialization that cuts the iterative search for the entmax normalization threshold down to 1-2 refinement iterations. The method stores coarse histograms of the attention scores in on-chip SRAM and uses them to seed the threshold search accurately, removing the computational overhead that previously made sparse attention slower than softmax.
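
To make the bracketing idea concrete, here is a minimal NumPy sketch of histogram-seeded threshold finding for α-entmax. This is an illustration of the general technique, not the AdaSplash-2 kernel (which keeps the histogram in SRAM inside a fused attention kernel); the function name `entmax_threshold`, the parameters `n_bins` and `refine_iters`, and the bin-center approximation are all assumptions made for this sketch.

```python
import numpy as np

def entmax_threshold(scores, alpha=1.5, n_bins=64, refine_iters=2):
    """Solve for tau with sum_i max((alpha-1)*z_i - tau, 0)**(1/(alpha-1)) = 1.

    Hypothetical sketch: a coarse histogram of the scores narrows the
    bisection bracket so only a couple of exact iterations remain.
    """
    s = (alpha - 1.0) * np.asarray(scores, dtype=np.float64)
    expo = 1.0 / (alpha - 1.0)

    def mass(tau):
        # Exact total probability mass for a candidate threshold tau.
        return np.sum(np.maximum(s - tau, 0.0) ** expo)

    # tau always lies in [max(s) - 1, max(s)]: the top score alone
    # contributes mass 1 at the left edge and mass 0 at the right edge.
    lo, hi = s.max() - 1.0, s.max()

    # Coarse histogram of the scores that can influence the threshold.
    counts, edges = np.histogram(s, bins=n_bins, range=(lo, hi))
    centers = 0.5 * (edges[:-1] + edges[1:])

    # Approximate the mass at each bin edge from counts alone (treating
    # every score in a bin as sitting at the bin center), then locate the
    # edge where the approximate mass first drops below 1.
    approx = np.array([np.sum(counts * np.maximum(centers - e, 0.0) ** expo)
                       for e in edges])
    j = int(np.nonzero(approx < 1.0)[0][0])

    # Bracket around the crossing bin, widened by one bin per side to
    # absorb the bin-center approximation error.
    b_lo = edges[max(j - 2, 0)]
    b_hi = edges[min(j + 1, n_bins)]
    if mass(b_lo) < 1.0 or mass(b_hi) >= 1.0:
        b_lo, b_hi = lo, hi  # fall back if the histogram estimate missed

    # A couple of exact bisection iterations on the narrowed bracket.
    for _ in range(refine_iters):
        mid = 0.5 * (b_lo + b_hi)
        if mass(mid) >= 1.0:
            b_lo = mid
        else:
            b_hi = mid

    tau = 0.5 * (b_lo + b_hi)
    p = np.maximum(s - tau, 0.0) ** expo
    return tau, p / p.sum()  # renormalize away residual threshold error

tau, p = entmax_threshold(np.random.randn(1024))
```

Without the histogram, bisection would start from the full bracket and need many more exact passes over the scores to converge; seeding from bin counts shrinks the starting interval to a few bins, which is the analogue of the 1-2 iterations reported in the summary.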