Zero-shot World Model (ZWM) achieves state-of-the-art performance on visual-cognitive tasks trained only on the visual experience of a single child, orders of magnitude less data than current AI systems require. BabyZWM demonstrates zero-shot transfer without task-specific training, offering a blueprint for human-scale data efficiency.
AdaSplash-2 accelerates differentiable sparse attention (α-entmax) with a histogram-based initialization that cuts the search for the normalizing threshold to 1-2 iterations. Coarse histograms of attention scores are kept in on-chip SRAM to initialize the threshold accurately, removing the overhead that previously made sparse attention slower than softmax.
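To make the idea concrete, here is a minimal NumPy sketch of a histogram-initialized threshold search for α-entmax with α = 1.5, in the spirit described above: a coarse score histogram brackets the normalizing threshold τ, so only a couple of bisection refinements are needed. Function and parameter names (`entmax15_hist`, `n_bins`, `refine_iters`) are illustrative assumptions, not the paper's API or kernel.

```python
import numpy as np

def entmax15_hist(z, n_bins=16, refine_iters=2):
    """alpha-entmax, alpha=1.5: p_i = max(z_i/2 - tau, 0)**2 with sum(p) = 1.
    tau provably lies in [max(z/2) - 1, max(z/2)]."""
    s = z / 2.0                          # (alpha - 1) * z for alpha = 1.5
    lo, hi = s.max() - 1.0, s.max()      # exact bracket for tau

    def mass(tau):                       # decreasing in tau; root at mass = 1
        return np.sum(np.maximum(s - tau, 0.0) ** 2)

    # Coarse histogram over the bracket (held in on-chip SRAM in a real
    # kernel); bin centers give a cheap approximation of mass() at each edge.
    edges = np.linspace(lo, hi, n_bins + 1)
    counts, _ = np.histogram(s, bins=edges)
    centers = 0.5 * (edges[:-1] + edges[1:])
    f_approx = np.array(
        [(counts * np.maximum(centers - e, 0.0) ** 2).sum() for e in edges]
    )
    idx = int((f_approx >= 1.0).sum()) - 1   # last edge with approx mass >= 1
    left = edges[max(idx - 1, 0)]            # widen the bracket by one bin on
    right = edges[min(idx + 2, n_bins)]      # each side for approximation error

    for _ in range(refine_iters):            # short exact bisection refinement
        mid = 0.5 * (left + right)
        if mass(mid) >= 1.0:
            left = mid
        else:
            right = mid
    p = np.maximum(s - 0.5 * (left + right), 0.0) ** 2
    return p / p.sum()                       # guard renormalization

rng = np.random.default_rng(0)
z = rng.normal(size=32) * 3.0
p = entmax15_hist(z, refine_iters=2)
```

Unlike softmax, the result is exactly sparse: scores below the threshold receive probability zero, which is what makes the threshold search (and its cheap initialization) the performance-critical step.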
K-Token Merging compresses prompts in latent embedding space: K-token blocks are merged by a lightweight encoder and then processed by a LoRA-adapted LLM. Because it operates at the embedding level rather than in token space, it reduces the quadratic attention cost of long contexts.
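A minimal sketch of the block-merging step, assuming a mean-pool plus linear projection as the "lightweight encoder" (the paper's actual encoder architecture is not specified here, and all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def merge_k_tokens(E, K, W, b):
    """Compress an [n, d] embedding sequence to [ceil(n/K), d] by encoding
    each K-token block. The tail block is padded by repetition if n % K != 0."""
    n, d = E.shape
    pad = (-n) % K
    if pad:
        E = np.concatenate([E, np.repeat(E[-1:], pad, axis=0)], axis=0)
    blocks = E.reshape(-1, K, d)          # [n/K, K, d] token blocks
    pooled = blocks.mean(axis=1)          # cheap per-block summary, [n/K, d]
    return np.tanh(pooled @ W + b)        # lightweight learned projection

d, K, n = 8, 4, 10
E = rng.normal(size=(n, d))               # stand-in for LLM input embeddings
W = rng.normal(size=(d, d)) * 0.1         # encoder weights (trained in practice)
b = np.zeros(d)
Z = merge_k_tokens(E, K, W, b)
print(Z.shape)  # (3, 8): 10 tokens -> ceil(10/4) = 3 merged embeddings
```

Feeding the LLM n/K merged embeddings instead of n token embeddings shrinks the quadratic attention term by roughly a factor of K²; the LoRA adaptation is what lets the frozen LLM consume these compressed inputs.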
LeanKG targets efficient formal theorem proving in Lean, emphasizing token efficiency in mathematical code generation. The title suggests techniques for reducing computational cost while preserving correctness, aimed at formal-verification and proof-assistant workflows.
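Since only the title is available, the paper's actual technique is unknown; as general background on why token efficiency matters in this setting, the same Lean statement can be proved at very different token costs (illustrative Lean 4, no Mathlib; `omega` requires a recent toolchain):

```lean
-- Verbose proof: explicit case analysis, many tokens to generate.
theorem succ_pred (n : Nat) (h : n ≠ 0) : Nat.succ (n - 1) = n := by
  cases n with
  | zero => exact absurd rfl h
  | succ k => rfl

-- Same statement discharged by one decision procedure, far fewer tokens.
theorem succ_pred' (n : Nat) (h : n ≠ 0) : Nat.succ (n - 1) = n := by
  omega
```

Both proofs are checked to the same standard of correctness by the kernel, which is why token-efficiency work can shorten generated proofs without weakening the verification guarantee.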