Tooling 34 items

Everything Tooling

💬 Reddit 13h ago

llama.cpp speculative checkpointing was merged

llama.cpp merged speculative checkpointing support achieving 0-50% speedup on coding tasks with optimized parameters, though performance varies by prompt repetition patterns and draft acceptance rates. The feature uses n-gram matching for speculative decoding with configurable draft token ranges.

Inference Tooling Code Gen Performance

🟢 OpenAI 1d ago

OpenAI Codex Major Update

OpenAI Codex expanded beyond coding to include computer use, web workflows, image generation, memory, and automations. The updated developer app adds PR reviews, multi-file/terminal viewing, SSH devbox connections, and in-app browsing, serving 3+ million developers weekly.

Code Gen Agents Multimodal Tooling

🟧 Hacker News 2d ago

Gemma 4 Release Triggers Debate About Tool Calling Implementation Issues

Gemma 4 release exposed systemic reliability issues where local model runners (Ollama, LM Studio) rushed launch-day support with broken tokenizer implementations and failed tool calls. Discussion highlighted trade-offs between inference tools, with performance benchmarks showing Ollama 25% faster than LM Studio on Mac, but recurring pattern of premature releases creating production issues.

Tooling Deployment Tool-calling Infrastructure

📑 arXiv 2d ago

ChemGraph-XANES: An Agentic Framework for XANES Simulation and Analysis

ChemGraph-XANES automates X-ray absorption near-edge structure simulation workflows using a LangGraph/LangChain-based agentic framework that handles natural-language task specification, structure acquisition, FDMNES execution, and provenance-aware data curation. Built on ASE, FDMNES, and Parsl, it addresses workflow complexity constraints that limit computational XANES deployment at scale.

Agents Science Tooling

🟧 Hacker News 2d ago

Claude Design

Anthropic launches Claude Design, a new product offering from the Claude AI family. Details on capabilities and target use cases not provided in source.

Models Tooling

📝 Blog 3d ago

Speculative Decoding Shines for Agentic Use Cases

Speculative decoding uses a smaller draft model to generate candidate tokens that a larger target model validates in a single pass, providing significant speedup for agentic workloads heavy on tool calls and structured outputs without quality loss. Cloudflare reports this is particularly effective for coding agents and API integration tasks where tool calling volume is high.

Agents Inference Tooling

✍️ Simon Willison 3d ago

llm-anthropic 0.25

Release of llm-anthropic 0.25, an update to the Python library for interacting with Anthropic's API. Provides improved tooling for Claude model integration. Incremental improvements to existing developer tooling.

Tooling Api-integration

🐙 GitHub 3d ago

cablate/llm-atomic-wiki: An extension of Karpathy's LLM Wiki pattern: atom layer, topic-branches, two-layer Lint. Distilled from running the pattern end-to-end.

Extension of Karpathy's LLM Wiki pattern adding atomic layer abstraction, topic-branch organization, and two-layer linting for knowledge management workflows. Distills lessons from end-to-end implementation of the documentation pattern. Open-source tooling for LLM-assisted knowledge base maintenance.

Tooling Knowledge-management Documentation

🟧 Hacker News 3d ago

Android CLI: Build Android apps 3x faster using any agent

Command-line tool claims to accelerate Android app development 3x when used with AI coding agents. Streamlines agent-based mobile development workflows.

Agents Code Gen Tooling

🐙 GitHub 3d ago

yzhao062/anywhere-agents: One config to rule all your AI agents: portable (every project, every session), effective (curated writing, routing, skills), and safer (destructive-command guard).

Anywhere-agents is a configuration management tool for AI agents emphasizing portability across projects, curated writing/routing/skills capabilities, and safety via destructive-command guards. Single config approach unifies agent behavior management. Addresses agent configuration consistency and safety concerns.

Agents Tooling Safety

📑 arXiv 3d ago

Agent-Aided Design for Dynamic CAD Models

Agent-Aided Design systems use LLMs in a feedback loop to write CAD code, compile models, visualize results, and iteratively refine designs, but cannot yet generate complex 3D assemblies with moving parts like pistons or scissors. This work identifies the capability gap preventing these training-free agentic systems from impacting industrial manufacturing. Addresses the transition from static CAD objects to dynamic mechanical assemblies.

Agents Code Gen Tooling

📑 arXiv 3d ago

CoGrid & the Multi-User Gymnasium: A Framework for Multi-Agent Experimentation

CoGrid is a multi-agent grid simulation library with NumPy and JAX backends, paired with Multi-User Gymnasium (MUG) that converts simulations into interactive web experiments. The tools lower barriers for researchers studying human-AI interaction by supporting arbitrary numbers of humans and AI agents in both server-authoritative and peer-to-peer modes.

Agents Infrastructure Tooling

🐙 GitHub 3d ago

TheArcForge/UniClaude: Claude Code, natively inside Unity Editor. A dockable chat window with full project awareness, 60+ MCP tools, and zero alt-tabbing.

UniClaude integrates Claude directly into Unity Editor as a dockable window with full project context awareness and 60+ MCP tools. Eliminates context switching during game development by embedding the AI assistant natively in the IDE. Provides workflow-specific tooling for game developers working in Unity.

Tooling Code Gen Agents

🟢 OpenAI 3d ago

Codex for (almost) everything

OpenAI's Codex app for macOS and Windows now includes computer use capabilities, in-app browsing, image generation, memory, and plugins. The update transforms Codex from a code-focused assistant into a multi-capability developer productivity platform.

Code Gen Multimodal Tooling Agents

🐙 GitHub 4d ago

Hugging Face Transformers: Mistral 4 and Multimodal Model Support

Hugging Face transformers adds support for Mistral 4 (119B MoE with 128 experts unifying Instruct, Reasoning, and Devstral), Jina Embeddings v3, and multiple OCR/video models including VidEoMT, UVDoc, and PI0 robotics VLA. Includes quantization, tokenization, and caching speedups with breaking changes.

Models Multimodal Tooling Inference

🤗 Hugging Face 4d ago

RadAgent: A tool-using AI agent for stepwise interpretation of chest computed tomography

RadAgent is a tool-using AI agent for chest CT interpretation that generates reports through a stepwise, interpretable process with fully inspectable traces of intermediate decisions and tool interactions. Improves on CT-Chat VLM baseline across three dimensions while allowing clinicians to examine how findings are derived rather than being passive observers.

Agents Multimodal Medical-imaging Tooling

🐙 GitHub 4d ago

vunone/ennoia: Declarative Document Indexing (DDI) framework for Python. Define schemas, extract structured indices, search smarter.

Ennoia provides declarative document indexing framework for Python allowing schema-driven structured extraction and search. Enables developers to define index schemas and extract queryable structures from documents programmatically.

RAG Tooling Structured-extraction Indexing

🐙 GitHub 4d ago

mikepapadim/london-property-hunt-public: Automated London flat/room hunt powered by Claude Code + Claude in Chrome + Gmail MCP. Scrapes 4 rental platforms on a cron, deduplicates via spreadsheet, prioritises HIGH/MED/LOW, and emails ready-to-send outreach.

Automated London rental property hunting system combining Claude Code, Claude in Chrome, and Gmail MCP. Scrapes four rental platforms on cron, deduplicates via spreadsheet, prioritizes listings as HIGH/MED/LOW, and generates ready-to-send outreach emails. Demonstrates practical agent orchestration for real-world automation tasks.

Agents Automation Tooling

🤗 HF Blog 4d ago

Inside VAKRA: Reasoning, Tool Use, and Failure Modes of Agents

Hugging Face analysis of VAKRA agent system covering reasoning patterns, tool use mechanisms, and common failure modes in agent architectures.

Agents Reasoning Tooling

🐙 GitHub 4d ago

he-yufeng/RepoWiki: Open-source DeepWiki alternative — generate comprehensive wiki documentation for any codebase from terminal or browser

RepoWiki is an open-source alternative to DeepWiki that generates comprehensive wiki documentation for codebases from terminal or browser. The tool automates technical documentation creation for software repositories.

Code Gen Tooling Open Weights

🐙 GitHub 4d ago

guo2001china/35gateway: 35m.ai 旗下源码开放 AI Gateway，文本/图片/视频/音频/音乐一键接入，支持多供应商智能路由与自带 Key 混合使用，不浪费每一份算力。 Source-available AI gateway from 35m.ai for text, image, video, audio, and music. Supports smart multi-provider routing and bring-your-own-key workflows without wasting compute.

Source-available AI gateway from 35m.ai supporting unified access to text, image, video, audio, and music generation APIs with intelligent multi-provider routing and hybrid BYOK (bring-your-own-key) workflows. Optimizes compute utilization across heterogeneous provider backends.

Infrastructure Multimodal Tooling Routing

🔶 Anthropic 5d ago

Anthropic Claude Code Desktop App Redesign

Anthropic redesigned Claude Code desktop app with parallel session management sidebar, integrated terminal, in-app file editor, and Routines—automation running on schedules, API calls, or GitHub events without active sessions. Available for Pro, Max, Team, and Enterprise users on macOS and Windows.

Code Gen Agents Tooling

🟢 OpenAI 5d ago

★ High Signal

OpenAI Agents SDK Evolution with Native Sandbox Execution

OpenAI's Agents SDK update adds native sandbox execution and model-native harness for building production-grade agents with improved safety and execution isolation. Represents a shift from experimental prototypes to production-ready agentic workflows with support for long-running agents working across files and tools.

Agents Safety Tooling

🟢 OpenAI 5d ago

★ High Signal

OpenAI Codex Major Update - Expanded Computer Use

OpenAI Codex expands from coding to full computer use with web workflows, multi-step planning, autonomous actions, and audio-visual processing for 3M+ weekly developers. Now handles PR reviews, multiple file/terminal views, SSH connections, and in-app browsing. Shift from code generation tool to general-purpose computer control agent.

Code Gen Agents Multimodal Tooling

📝 Blog 5d ago

Latent Space: Notion Custom Agents - Building Production AI

Notion rebuilt Custom Agents 4-5 times before production launch due to early failures from lack of tool-calling standards, short context, and unreliable models. "Agent Lab" thesis: time roadmap carefully to avoid swimming upstream against model limitations while building early enough. Practical lessons on when to ship agent features based on foundation model maturity.

Agents Deployment Tooling

📝 Blog 5d ago

Latent Space: Notion's Journey Building Custom AI Agents

Notion rebuilt Custom Agents 4-5 times before production, revealing early agent attempts failed due to lack of tool-calling standards and short context windows. Their 'Agent Lab' thesis focuses on building product systems around frontier capabilities, with coding agents viewed as the kernel of future 'software factories' comprising spec/code/test/review agents.

Agents Code Gen Tooling Deployment

💬 Reddit 5d ago

The LLM tunes its own llama.cpp flags (+54% tok/s on Qwen3.5-27B)

An LLM-based auto-tuning system for llama.cpp that optimizes inference flags by reading --help output and iteratively testing configurations. Achieves 54% speedup on Qwen3.5-27B (40 tok/s vs 26 tok/s) and automatically adapts to new llama.cpp releases by ingesting updated help text.

Inference Tooling Open Weights

🤗 Hugging Face 6d ago

Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems

Analysis of Claude Code's TypeScript source code and comparison with OpenClaw identifies five core human values (decision authority, safety, reliable execution, capability amplification, contextual adaptability) traced through thirteen design principles to implementation choices. The core architecture is a simple while-loop calling the model, running tools, and returning results—demonstrating how design philosophy shapes agentic system architecture.

Agents Code Gen Tooling

🐙 GitHub 6d ago

MaxKmet/idea-validation-agents: AI agents that act as your personal venture analyst - from startup idea brainstorming to full validation and go-to-market strategy. Built for developers who'd rather validate in 10 minutes than regret in six months. Powered by Claude Code, OpenAI Codex, and Cursor.

Open-source AI agent system that automates startup idea validation from brainstorming through go-to-market strategy, powered by Claude, OpenAI, and Cursor. Targets developers seeking rapid validation in 10 minutes instead of months-long manual processes.

Agents Tooling Automation

🐙 GitHub 6d ago

helloianneo/awesome-claude-code-skills: Claude Code 最实用的 Skills / Agents / Plugins 精选合集 | 50+ 精选 | 按场景分类 | 带推荐等级 | 复制即装

Curated collection of 50+ Claude Code skills, agents, and plugins organized by use case with recommendation ratings. Ready-to-use extensions for Claude-based development workflows.

Agents Code Gen Tooling

💬 Reddit 6d ago

OpenClaw has 250K GitHub stars. The only reliable use case I've found is daily news digests.

Analysis of 1000+ OpenClaw deployments reveals minimal legitimate use cases beyond daily news digests, despite 250K GitHub stars and significant engineering investment. Users who spent weeks attempting production deployment found the tool connects to messaging apps and LLMs but lacks practical applications.

Agents Deployment Tooling

💬 Reddit 6d ago

Gemma 4 - lazy model or am I crazy? (bit of a rant)

Gemma 4 26B MoE shows reluctance to use tools or web search, defaulting to internal knowledge and performing minimal searches when explicitly requested. Community feedback on model's agentic capabilities despite strong benchmarks. Highlights gap between stated capabilities and practical tool use.

Agents Tooling Models

📝 Blog 1w ago

KDnuggets: 5 Best Books for Building Agentic AI Systems in 2026

KDnuggets recommends five books for building agentic AI systems, headlined by Chip Huyen's "AI Engineering" for its practical focus on production tradeoffs like latency vs. accuracy and cost vs. capability. The list targets practitioners shipping multi-agent orchestration, tool-calling, and memory management to production in 2026.

Agents Tooling Engineering

✍️ Simon Willison 1w ago

Simon Willison: Exploring the Servo Crate with Claude Code

Simon Willison uses Claude Code to explore Servo v0.1.0 Rust crate, building CLI screenshot tool and investigating WebAssembly compilation autonomously. Demonstrates "agentic engineering" workflow where developer tasks AI with discovering library capabilities and building working tools. Evolution from code completion to exploratory development assistance.

Code Gen Agents Tooling