
📑 arXiv 3d ago

AdaSplash-2: Faster Differentiable Sparse Attention

AdaSplash-2 accelerates differentiable sparse attention (α-entmax) with a histogram-based initialization that cuts the iterative search for the entmax normalization threshold down to 1-2 refinement iterations. The method stores coarse histograms of the attention scores in on-chip SRAM and uses them to seed the threshold search accurately, removing the computational overhead that previously made sparse attention slower than softmax.
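
To make the bracketing idea concrete, here is a minimal NumPy sketch of histogram-seeded threshold finding for α-entmax. This is an illustration of the general technique, not the AdaSplash-2 kernel (which keeps the histogram in SRAM inside a fused attention kernel); the function name `entmax_threshold`, the parameters `n_bins` and `refine_iters`, and the bin-center approximation are all assumptions made for this sketch.

```python
import numpy as np

def entmax_threshold(scores, alpha=1.5, n_bins=64, refine_iters=2):
    """Solve for tau with sum_i max((alpha-1)*z_i - tau, 0)**(1/(alpha-1)) = 1.

    Hypothetical sketch: a coarse histogram of the scores narrows the
    bisection bracket so only a couple of exact iterations remain.
    """
    s = (alpha - 1.0) * np.asarray(scores, dtype=np.float64)
    expo = 1.0 / (alpha - 1.0)

    def mass(tau):
        # Exact total probability mass for a candidate threshold tau.
        return np.sum(np.maximum(s - tau, 0.0) ** expo)

    # tau always lies in [max(s) - 1, max(s)]: the top score alone
    # contributes mass 1 at the left edge and mass 0 at the right edge.
    lo, hi = s.max() - 1.0, s.max()

    # Coarse histogram of the scores that can influence the threshold.
    counts, edges = np.histogram(s, bins=n_bins, range=(lo, hi))
    centers = 0.5 * (edges[:-1] + edges[1:])

    # Approximate the mass at each bin edge from counts alone (treating
    # every score in a bin as sitting at the bin center), then locate the
    # edge where the approximate mass first drops below 1.
    approx = np.array([np.sum(counts * np.maximum(centers - e, 0.0) ** expo)
                       for e in edges])
    j = int(np.nonzero(approx < 1.0)[0][0])

    # Bracket around the crossing bin, widened by one bin per side to
    # absorb the bin-center approximation error.
    b_lo = edges[max(j - 2, 0)]
    b_hi = edges[min(j + 1, n_bins)]
    if mass(b_lo) < 1.0 or mass(b_hi) >= 1.0:
        b_lo, b_hi = lo, hi  # fall back if the histogram estimate missed

    # A couple of exact bisection iterations on the narrowed bracket.
    for _ in range(refine_iters):
        mid = 0.5 * (b_lo + b_hi)
        if mass(mid) >= 1.0:
            b_lo = mid
        else:
            b_hi = mid

    tau = 0.5 * (b_lo + b_hi)
    p = np.maximum(s - tau, 0.0) ** expo
    return tau, p / p.sum()  # renormalize away residual threshold error

tau, p = entmax_threshold(np.random.randn(1024))
```

Without the histogram, bisection would start from the full bracket and need many more exact passes over the scores to converge; seeding from bin counts shrinks the starting interval to a few bins, which is the analogue of the 1-2 iterations reported in the summary.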