vlnr is an autonomous security agent for the Python supply chain: it scans packages for vulnerabilities, generates proof-of-concept exploits, and validates them inside isolated Docker containers. Full-loop autonomous exploit generation and validation is the novel aspect.
The Zed editor now supports running multiple AI agents in parallel within the same workspace, allowing concurrent agentic tasks on different parts of a codebase. The feature extends Zed's existing AI coding capabilities to multi-agent workflows. Relevant for teams evaluating editor-native agent orchestration versus external tooling.
Pairing Qwen3.6-35B with the 'little-coder' agent scaffold achieves 78.7% on the Polyglot coding benchmark, landing in the public top 10 and competitive with leading cloud models. The same scaffold previously lifted a 9B Qwen model from 19.11% to 45.56%, suggesting a significant portion of the local-vs-cloud performance gap is attributable to scaffold/harness mismatch rather than model capability alone.
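To put the 9B figures in perspective, the short calculation below works out the absolute and relative lift from the two scores quoted above. The function name `scaffold_lift` is a hypothetical helper introduced only for this illustration; the inputs are the percentages from the item, and nothing else is assumed about the benchmark.

```python
def scaffold_lift(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative multiplier).

    Hypothetical helper for illustration; inputs are benchmark pass rates in percent.
    """
    gain_points = after_pct - before_pct
    multiplier = after_pct / before_pct
    return gain_points, multiplier


# Scores quoted in the item above for the 9B Qwen model.
gain, mult = scaffold_lift(19.11, 45.56)
print(f"gain: {gain:.2f} points, multiplier: {mult:.2f}x")
```

On these figures the scaffold alone adds roughly 26 percentage points, a bit more than doubling the measured pass rate with the model held fixed.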
OpenAI introduces workspace agents in ChatGPT: Codex-powered cloud agents that can automate multi-step workflows across tools on behalf of teams. They run asynchronously in the cloud, scoped to a workspace with access controls. This extends Codex beyond single-shot code generation into persistent, team-level agentic task execution.
Open-source test harness for text-to-CAD generation, providing scaffolding to prompt LLMs and evaluate their CAD model outputs. Targets the emerging niche of AI-driven parametric and 3D design automation.
A Reddit thread observes that the practical capability gap between technical and non-technical AI users has widened sharply: non-technical users largely treat LLMs as search, while technical users leverage agents, computer use, Claude Code, and model selection. The post notes that nearly all recent model improvements are coding-focused, leaving general users with little perceived change. Reflects a real bifurcation in who captures value from frontier model advances.
UniClaude embeds Claude Code directly into the Unity Editor as a dockable chat window, giving it full project awareness and access to 60+ MCP tools without leaving the editor. Targets the context-switching friction that plagues game dev AI workflows. Essentially a Unity-native MCP client wired to Claude.
Raschka breaks down the practical anatomy of a coding agent into three components: tool use (file I/O, shell, search), memory (in-context vs. external), and repository-level context management. Written as a grounding companion to his LLM architecture series, it maps abstract agent design concepts onto how systems like Claude Code and Codex actually operate.
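The three components named above can be sketched as a toy loop. This is a minimal illustration under stated assumptions, not Raschka's code or the way Claude Code or Codex is implemented: `ToyAgent`, `run_scripted`, the in-memory `FILES` dict, and the scripted plan standing in for model decisions are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Stand-in for a repository; a real agent would read from disk (assumption).
FILES = {"README.md": "hello agent", "main.py": "print('hi')"}


def read_file(path: str) -> str:
    """Tool: return file contents, or an error string if the file is missing."""
    return FILES.get(path, "error: no such file")


def search(term: str) -> str:
    """Tool: return comma-separated paths whose text contains the term."""
    hits = sorted(p for p, text in FILES.items() if term in text)
    return ",".join(hits) or "no matches"


@dataclass
class ToyAgent:
    tools: Dict[str, Callable[[str], str]]           # tool use
    memory: List[str] = field(default_factory=list)  # full transcript (memory)
    max_context: int = 4                             # context budget, in entries

    def context(self) -> List[str]:
        """Context management: only the most recent entries are shown to the model."""
        return self.memory[-self.max_context:]

    def step(self, tool_name: str, arg: str) -> str:
        """Dispatch one tool call and record the outcome in memory."""
        if tool_name not in self.tools:
            result = f"error: unknown tool {tool_name!r}"
        else:
            result = self.tools[tool_name](arg)
        self.memory.append(f"{tool_name}({arg}) -> {result}")
        return result


def run_scripted(agent: ToyAgent, plan: List[Tuple[str, str]]) -> List[str]:
    """Run a fixed list of (tool, argument) calls in place of real model output."""
    return [agent.step(name, arg) for name, arg in plan]


agent = ToyAgent(tools={"read_file": read_file, "search": search}, max_context=2)
results = run_scripted(
    agent, [("search", "hi"), ("read_file", "README.md"), ("shell", "ls")]
)
print(results)
print(agent.context())
```

The full transcript is kept in `memory` while `context()` returns only a trimmed window, which mirrors the in-context versus external memory distinction in a very reduced form.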
A 668-point HN post documents that swapping the evaluation harness, without changing any model, improved measured coding performance across 15 LLMs in an afternoon. This directly implicates harness sensitivity as a major confounder in coding benchmark results. High-signal for anyone designing or interpreting code evals.
Lambert documents a real multi-agent coding workflow (GPT-5 Pro for planning, Claude Code with Opus 4.5 for implementation, Codex with GPT-5.2 for high-thinking-effort tasks) and argues that directing parallel agents on open-ended tasks is replacing individual grind as the primary work mode. The thesis: scoping and directing agents, not raw effort, is the durable skill edge.