RISE (Readout Influence Sketching Estimator) achieves scalable data attribution for LLMs by focusing on influence hotspots at the output layer rather than computing gradients across the entire model. It applies CountSketch projections to a dual-channel representation (lexical residual + semantic projected-error) to make gradient-based attribution tractable for large models.
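The CountSketch step can be illustrated with a minimal NumPy sketch (not the paper's implementation; the dual-channel representation is abstracted to a single gradient-like vector):

```python
import numpy as np

def countsketch(x, m, seed=0):
    """Project vector x (dim d) down to dim m via CountSketch:
    each coordinate is hashed to one of m buckets with a random sign."""
    d = x.shape[0]
    rng = np.random.default_rng(seed)
    h = rng.integers(0, m, size=d)       # bucket hash
    s = rng.choice([-1.0, 1.0], size=d)  # sign hash
    y = np.zeros(m)
    np.add.at(y, h, s * x)               # unbuffered scatter-add
    return y

# CountSketch is linear and preserves inner products in expectation,
# so sketched gradients can stand in for full gradients when scoring
# influence, at a fraction of the memory cost.
rng = np.random.default_rng(1)
grad = rng.standard_normal(4096)
sketched = countsketch(grad, 512, seed=2)
```

Because the sketch is linear, per-example sketched gradients can be accumulated and compared directly without ever materializing the full gradient matrix.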
WORC (Weak-link Optimization for Reasoning and Collaboration) improves multi-agent LLM frameworks by systematically identifying and reinforcing performance-limiting agents rather than only enhancing high-capability agents. It addresses the reasoning instability that arises when individual agent errors amplify through collaboration, and is grounded in the weak-link principle.
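A toy illustration of why the weak-link principle matters for sequential agent pipelines (the reliability numbers are made up, and the product model assumes independent per-stage errors, which is not claimed by the paper):

```python
def system_reliability(agent_scores):
    """For agents composed in sequence, an error at any stage can
    derail the final answer, so system reliability is roughly the
    product of per-agent reliabilities (assuming independence)."""
    p = 1.0
    for v in agent_scores.values():
        p *= v
    return p

def weakest_link(agent_scores):
    """The agent that most limits the pipeline."""
    return min(agent_scores, key=agent_scores.get)

# hypothetical probe-set accuracies for three collaborating agents
scores = {"planner": 0.92, "coder": 0.71, "reviewer": 0.88}
# spending a +0.05 improvement on the weakest agent raises system
# reliability more than spending it on the strongest agent
```

Under this model, reinforcing the weakest agent dominates any equal-sized improvement to a stronger one, which is the intuition behind targeting performance-limiting agents.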
MemoSight unifies context compression with multi-token prediction to accelerate LLM reasoning without quality loss, addressing computational bottlenecks in long-context reasoning. The approach makes advanced reasoning capabilities more practical in production as context windows expand.
Prism is the first symbolic superoptimizer for tensor programs, using an sGraph representation to symbolically encode operator families and execution parameters. Its two-level search, combining symbolic pruning with e-graph verification, achieves provably optimal kernels across large search spaces.
VisPCO formulates visual token pruning as a Pareto optimization problem to automatically find optimal computation-performance configurations for vision-language models. It uses continuous relaxation and gradient-based search via an Augmented Lagrangian method to approximate the empirical Pareto frontier across 8 visual benchmarks.
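The non-dominated filtering behind an empirical Pareto frontier can be sketched generically (a plain Pareto filter over hypothetical (compute cost, accuracy) pairs, not VisPCO's relaxation or Augmented Lagrangian search):

```python
def pareto_front(points):
    """Return the non-dominated subset of (cost, score) points:
    a point is dropped if some other point has cost <= its cost AND
    score >= its score, with at least one inequality strict."""
    front = []
    for i, (c, s) in enumerate(points):
        dominated = any(
            (c2 <= c and s2 >= s) and (c2 < c or s2 > s)
            for j, (c2, s2) in enumerate(points) if j != i
        )
        if not dominated:
            front.append((c, s))
    return front

# hypothetical pruning configurations: (relative FLOPs, accuracy)
configs = [(1.0, 0.60), (2.0, 0.80), (3.0, 0.78), (2.5, 0.90)]
# (3.0, 0.78) is dominated: (2.5, 0.90) is cheaper and more accurate
```

Continuous relaxation turns the choice among such discrete configurations into a differentiable search, but the end product is still a frontier of this form.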
COEVO unifies functional correctness and PPA (power, performance, area) optimization for LLM-generated RTL code in a single co-evolutionary loop, replacing sequential pipelines that discard partially correct but architecturally promising candidates. Existing methods decouple correctness from PPA and reduce multi-objective optimization to scalar fitness, obscuring trade-offs. COEVO treats correctness as continuous rather than binary, enabling simultaneous optimization of both objectives.
MMOT introduces an Optimal Transport-based framework for online incremental learning that maintains evolving mixture model centroids instead of fixed or single adaptive centroids per class. The approach better handles multimodal data streams in continual learning scenarios where distributional shifts are severe and replay buffers have limited utility. The novel contribution is the dynamic centroid evolution mechanism grounded in OT theory.
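A much-simplified sketch of dynamic centroid evolution (nearest-centroid assignment with a spawn threshold; the OT coupling of the actual method is not reproduced, and `spawn_dist` is an illustrative hyperparameter):

```python
import numpy as np

class EvolvingCentroids:
    """Per-class mixture of centroids updated online. Each sample
    moves its nearest centroid (running mean); a sample far from all
    centroids spawns a new one, so the mixture can track new modes
    as the stream drifts."""
    def __init__(self, spawn_dist=2.0):
        self.spawn_dist = spawn_dist
        self.centroids = []
        self.counts = []

    def update(self, x):
        x = np.asarray(x, dtype=float)
        if self.centroids:
            dists = [np.linalg.norm(x - c) for c in self.centroids]
            k = int(np.argmin(dists))
            if dists[k] < self.spawn_dist:
                self.counts[k] += 1
                # incremental running-mean update of the centroid
                self.centroids[k] += (x - self.centroids[k]) / self.counts[k]
                return k
        self.centroids.append(x.copy())
        self.counts.append(1)
        return len(self.centroids) - 1
```

A single adaptive centroid per class would average the two modes of a bimodal stream into a point that represents neither; spawning lets the representation stay multimodal.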
Value Gradient Flow (VGF) frames behavior-regularized RL as an optimal transport problem mapping reference distributions to value-optimal policies, offering a scalable alternative to reparameterized policy gradients and rejection sampling. The approach addresses value over-optimization in offline RL and LLM fine-tuning while scaling to large generative models.
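The core idea can be sketched as a particle flow on a toy 1-D problem (an assumed quadratic value and standard-normal reference, not the paper's setup; the diffusion term of a full gradient flow is omitted, making this deterministic):

```python
import numpy as np

def value_gradient_flow(x, grad_V, beta=0.5, lr=0.05, steps=400):
    """Move particles x along the value gradient while a KL-style
    term pulls them toward a standard-normal reference:
        dx = grad V(x) + beta * grad log p_ref(x),
    with grad log p_ref(x) = -x for p_ref = N(0, 1)."""
    for _ in range(steps):
        x = x + lr * (grad_V(x) - beta * x)
    return x

# toy value peaked at 3; regularization pulls back toward 0, so the
# flow settles at grad_V(x) = beta * x, i.e. x = 3 / (1 + beta) = 2
grad_V = lambda x: 3.0 - x            # V(x) = -(x - 3)**2 / 2
particles = value_gradient_flow(np.linspace(-2.0, 2.0, 8), grad_V)
```

The regularizer is what prevents value over-optimization here: without it (`beta = 0`) every particle would collapse onto the value maximizer regardless of the reference distribution.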
KV Packet enables context-independent KV cache reuse without recomputation by wrapping cached documents in trainable soft-token adapters. Unlike CacheBlend or SAM-KV, which still require selective recomputation, KV Packet treats caches as immutable packets and uses self-supervised distillation to bridge context discontinuities with zero FLOPs overhead.
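Stripped of the soft-token adapters and distillation, the reuse idea reduces to concatenating precomputed per-document K/V blocks at attention time; a toy single-query, single-head sketch with random placeholder tensors:

```python
import numpy as np

def attention(q, K, V):
    """Scaled dot-product attention for one query vector."""
    scores = K @ q / np.sqrt(q.shape[0])
    w = np.exp(scores - scores.max())   # numerically stable softmax
    w /= w.sum()
    return w @ V

rng = np.random.default_rng(0)
d = 16
# per-document (K, V) packets, computed once and cached immutably
packet_a = (rng.standard_normal((4, d)), rng.standard_normal((4, d)))
packet_b = (rng.standard_normal((6, d)), rng.standard_normal((6, d)))

# serving a new query: concatenate cached packets, no recomputation
K = np.vstack([packet_a[0], packet_b[0]])
V = np.vstack([packet_a[1], packet_b[1]])
out = attention(rng.standard_normal(d), K, V)
```

The catch this sketch ignores is that caches computed in isolation lack cross-document positional and contextual consistency; the trainable adapters and self-supervised distillation are what bridge those discontinuities.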