RISE (Readout Influence Sketching Estimator) achieves scalable data attribution for LLMs by focusing on influence hotspots at the output layer rather than computing gradients across the entire model. It applies CountSketch projections to a dual-channel representation (lexical residual + semantic projected-error) to make gradient-based attribution tractable for large models.
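The CountSketch step can be illustrated with a minimal NumPy sketch (not the paper's implementation; the dual-channel representation is abstracted to a single gradient-like vector):

```python
import numpy as np

def countsketch(x, m, seed=0):
    """Project vector x (dim d) down to dim m via CountSketch:
    each coordinate is hashed to one of m buckets with a random sign."""
    d = x.shape[0]
    rng = np.random.default_rng(seed)
    h = rng.integers(0, m, size=d)       # bucket hash
    s = rng.choice([-1.0, 1.0], size=d)  # sign hash
    y = np.zeros(m)
    np.add.at(y, h, s * x)               # unbuffered scatter-add
    return y

# CountSketch is linear and preserves inner products in expectation,
# so sketched gradients can stand in for full gradients when scoring
# influence, at a fraction of the memory cost.
rng = np.random.default_rng(1)
grad = rng.standard_normal(4096)
sketched = countsketch(grad, 512, seed=2)
```

Because the sketch is linear, per-example sketched gradients can be accumulated and compared directly without ever materializing the full gradient matrix.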
WORC (Weak-link Optimization for Reasoning and Collaboration) improves multi-agent LLM frameworks by systematically identifying and reinforcing performance-limiting agents rather than only enhancing high-capability agents. It addresses the reasoning instability that arises when individual agent errors amplify through collaboration, and is grounded in the weak-link principle.
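A toy illustration of why the weak-link principle matters for sequential agent pipelines (the reliability numbers are made up, and the product model assumes independent per-stage errors, which is not claimed by the paper):

```python
def system_reliability(agent_scores):
    """For agents composed in sequence, an error at any stage can
    derail the final answer, so system reliability is roughly the
    product of per-agent reliabilities (assuming independence)."""
    p = 1.0
    for v in agent_scores.values():
        p *= v
    return p

def weakest_link(agent_scores):
    """The agent that most limits the pipeline."""
    return min(agent_scores, key=agent_scores.get)

# hypothetical probe-set accuracies for three collaborating agents
scores = {"planner": 0.92, "coder": 0.71, "reviewer": 0.88}
# spending a +0.05 improvement on the weakest agent raises system
# reliability more than spending it on the strongest agent
```

Under this model, reinforcing the weakest agent dominates any equal-sized improvement to a stronger one, which is the intuition behind targeting performance-limiting agents.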
MemoSight unifies context compression with multi-token prediction to accelerate LLM reasoning without quality loss, addressing computational bottlenecks in long-context reasoning. The approach makes advanced reasoning capabilities more practical in production as context windows expand.
Prism is the first symbolic superoptimizer for tensor programs, using an sGraph representation to symbolically encode operator families and execution parameters. Its two-level search, combining symbolic pruning with e-graph verification, achieves provably optimal kernels across large search spaces.
VisPCO formulates visual token pruning as a Pareto optimization problem to automatically find optimal computation-performance configurations for vision-language models. It uses continuous relaxation and gradient-based search via an Augmented Lagrangian method to approximate the empirical Pareto frontier across 8 visual benchmarks.
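The non-dominated filtering behind an empirical Pareto frontier can be sketched generically (a plain Pareto filter over hypothetical (compute cost, accuracy) pairs, not VisPCO's relaxation or Augmented Lagrangian search):

```python
def pareto_front(points):
    """Return the non-dominated subset of (cost, score) points:
    a point is dropped if some other point has cost <= its cost AND
    score >= its score, with at least one inequality strict."""
    front = []
    for i, (c, s) in enumerate(points):
        dominated = any(
            (c2 <= c and s2 >= s) and (c2 < c or s2 > s)
            for j, (c2, s2) in enumerate(points) if j != i
        )
        if not dominated:
            front.append((c, s))
    return front

# hypothetical pruning configurations: (relative FLOPs, accuracy)
configs = [(1.0, 0.60), (2.0, 0.80), (3.0, 0.78), (2.5, 0.90)]
# (3.0, 0.78) is dominated: (2.5, 0.90) is cheaper and more accurate
```

Continuous relaxation turns the choice among such discrete configurations into a differentiable search, but the end product is still a frontier of this form.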
COEVO unifies functional correctness and PPA (power, performance, area) optimization for LLM-generated RTL code in a single co-evolutionary loop, replacing sequential pipelines that discard partially correct but architecturally promising candidates. Existing methods decouple correctness from PPA and reduce multi-objective optimization to scalar fitness, obscuring trade-offs. COEVO treats correctness as continuous rather than binary, enabling simultaneous optimization of both objectives.
MMOT introduces an Optimal Transport-based framework for online incremental learning that maintains evolving mixture model centroids instead of fixed or single adaptive centroids per class. The approach better handles multimodal data streams in continual learning scenarios where distributional shifts are severe and replay buffers have limited utility. The novel contribution is the dynamic centroid evolution mechanism grounded in OT theory.
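A much-simplified sketch of dynamic centroid evolution (nearest-centroid assignment with a spawn threshold; the OT coupling of the actual method is not reproduced, and `spawn_dist` is an illustrative hyperparameter):

```python
import numpy as np

class EvolvingCentroids:
    """Per-class mixture of centroids updated online. Each sample
    moves its nearest centroid (running mean); a sample far from all
    centroids spawns a new one, so the mixture can track new modes
    as the stream drifts."""
    def __init__(self, spawn_dist=2.0):
        self.spawn_dist = spawn_dist
        self.centroids = []
        self.counts = []

    def update(self, x):
        x = np.asarray(x, dtype=float)
        if self.centroids:
            dists = [np.linalg.norm(x - c) for c in self.centroids]
            k = int(np.argmin(dists))
            if dists[k] < self.spawn_dist:
                self.counts[k] += 1
                # incremental running-mean update of the centroid
                self.centroids[k] += (x - self.centroids[k]) / self.counts[k]
                return k
        self.centroids.append(x.copy())
        self.counts.append(1)
        return len(self.centroids) - 1
```

A single adaptive centroid per class would average the two modes of a bimodal stream into a point that represents neither; spawning lets the representation stay multimodal.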
Value Gradient Flow (VGF) frames behavior-regularized RL as an optimal transport problem mapping reference distributions to value-optimal policies, offering a scalable alternative to reparameterized policy gradients and rejection sampling. The approach addresses value over-optimization in offline RL and LLM fine-tuning while scaling to large generative models.
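The core idea can be sketched as a particle flow on a toy 1-D problem (an assumed quadratic value and standard-normal reference, not the paper's setup; the diffusion term of a full gradient flow is omitted, making this deterministic):

```python
import numpy as np

def value_gradient_flow(x, grad_V, beta=0.5, lr=0.05, steps=400):
    """Move particles x along the value gradient while a KL-style
    term pulls them toward a standard-normal reference:
        dx = grad V(x) + beta * grad log p_ref(x),
    with grad log p_ref(x) = -x for p_ref = N(0, 1)."""
    for _ in range(steps):
        x = x + lr * (grad_V(x) - beta * x)
    return x

# toy value peaked at 3; regularization pulls back toward 0, so the
# flow settles at grad_V(x) = beta * x, i.e. x = 3 / (1 + beta) = 2
grad_V = lambda x: 3.0 - x            # V(x) = -(x - 3)**2 / 2
particles = value_gradient_flow(np.linspace(-2.0, 2.0, 8), grad_V)
```

The regularizer is what prevents value over-optimization here: without it (`beta = 0`) every particle would collapse onto the value maximizer regardless of the reference distribution.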
KV Packet enables context-independent KV cache reuse without recomputation by wrapping cached documents in trainable soft-token adapters. Unlike CacheBlend or SAM-KV, which still require selective recomputation, KV Packet treats caches as immutable packets and uses self-supervised distillation to bridge context discontinuities with zero FLOPs overhead.
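Stripped of the soft-token adapters and distillation, the reuse idea reduces to concatenating precomputed per-document K/V blocks at attention time; a toy single-query, single-head sketch with random placeholder tensors:

```python
import numpy as np

def attention(q, K, V):
    """Scaled dot-product attention for one query vector."""
    scores = K @ q / np.sqrt(q.shape[0])
    w = np.exp(scores - scores.max())   # numerically stable softmax
    w /= w.sum()
    return w @ V

rng = np.random.default_rng(0)
d = 16
# per-document (K, V) packets, computed once and cached immutably
packet_a = (rng.standard_normal((4, d)), rng.standard_normal((4, d)))
packet_b = (rng.standard_normal((6, d)), rng.standard_normal((6, d)))

# serving a new query: concatenate cached packets, no recomputation
K = np.vstack([packet_a[0], packet_b[0]])
V = np.vstack([packet_a[1], packet_b[1]])
out = attention(rng.standard_normal(d), K, V)
```

The catch this sketch ignores is that caches computed in isolation lack cross-document positional and contextual consistency; the trainable adapters and self-supervised distillation are what bridge those discontinuities.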