🍡 feedmeAI
Infrastructure 23 items

🟧 Hacker News 2d ago

Gemma 4 Release Triggers Debate About Tool Calling Implementation Issues

The Gemma 4 release exposed systemic reliability issues: local model runners (Ollama, LM Studio) rushed out launch-day support with broken tokenizer implementations and failing tool calls. The discussion highlighted trade-offs between inference tools, with performance benchmarks showing Ollama 25% faster than LM Studio on Mac, and pointed to a recurring pattern of premature releases causing production issues.

📑 arXiv 3d ago
★ High Signal

Scepsy: Serving Agentic Workflows Using Aggregate LLM Pipelines

Scepsy is a serving system for multi-LLM agentic workflows that schedules arbitrary agent frameworks onto GPU clusters under oversubscription. It exploits the observation that while end-to-end workflow latencies are unpredictable, the relative execution time shares of each LLM remain stable across runs. This lets it serve complex agentic workflows efficiently at a target throughput with low latency.
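
To make the "stable time shares" observation concrete, here is a minimal sketch of how such shares could drive a GPU split. It is not Scepsy's actual scheduler: the function names, the sample traces, and the proportional largest-remainder allocation are all assumptions chosen for illustration.

```python
from statistics import mean

def time_shares(run):
    """Normalize one run's per-LLM execution times into fractions of the total."""
    total = sum(run.values())
    return {llm: t / total for llm, t in run.items()}

def allocate_gpus(runs, num_gpus):
    """Split num_gpus across LLMs in proportion to their mean time share.

    Hypothetical policy: largest-remainder rounding, so the integer
    counts always sum to num_gpus.
    """
    shares = [time_shares(r) for r in runs]
    llms = sorted(shares[0])
    avg = {llm: mean(s[llm] for s in shares) for llm in llms}
    ideal = {llm: avg[llm] * num_gpus for llm in llms}
    alloc = {llm: int(ideal[llm]) for llm in llms}
    leftover = num_gpus - sum(alloc.values())
    # Hand remaining GPUs to the LLMs with the largest fractional parts.
    by_remainder = sorted(llms, key=lambda m: ideal[m] - alloc[m], reverse=True)
    for llm in by_remainder[:leftover]:
        alloc[llm] += 1
    return alloc

# Made-up traces (seconds per LLM): total latency differs 2.5x between
# runs, but each LLM's share of the total is the same.
runs = [
    {"planner": 2.0, "coder": 6.0, "critic": 2.0},
    {"planner": 5.0, "coder": 15.0, "critic": 5.0},
]
allocation = allocate_gpus(runs, num_gpus=10)
print(allocation)
```

The point of the sketch is that the allocation depends only on the shares, so it stays the same even when absolute latencies vary widely between runs.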

📑 arXiv 3d ago

Autogenesis: A Self-Evolving Agent Protocol

Autogenesis Protocol (AGP) standardizes self-evolving agent systems by modeling prompts, agents, tools, environments, and memory as protocol-registered resources with lifecycle management and version tracking. The Resource Substrate Protocol Layer decouples what evolves from how evolution occurs, addressing brittleness in existing protocols like A2A and MCP.

🐙 GitHub 4d ago

guo2001china/35gateway: Source-available AI gateway from 35m.ai for text, image, video, audio, and music. Supports smart multi-provider routing and bring-your-own-key workflows without wasting compute.

Source-available AI gateway from 35m.ai that provides unified access to text, image, video, audio, and music generation APIs, with intelligent multi-provider routing and hybrid BYOK (bring-your-own-key) workflows. It aims to optimize compute utilization across heterogeneous provider backends.

📝 Blog 5d ago

AI Weekly: Agent-to-Agent Protocol Hits 1-Year Anniversary with 150+ Organizations

Google's Agent-to-Agent (A2A) Protocol reached 150+ organizations and production deployments in Azure AI Foundry and Amazon Bedrock AgentCore at its one-year milestone. v1.0 added Signed Agent Cards for cryptographic identity verification between agents. Combined with IBM's merged Agent Communication Protocol and the AP2 commerce extension, it now covers the full lifecycle from tool access to delegation to payments.

🐙 GitHub 5d ago

humanrouter/ddtree-mlx: Tree-based speculative decoding for Apple Silicon (MLX). ~10-15% faster than DFlash on code, ~1.5x over autoregressive. First MLX port with custom Metal kernels for hybrid model support.

ddtree-mlx ports tree-based speculative decoding to Apple Silicon with custom Metal kernels, achieving roughly 10-15% speedup over DFlash on code and about 1.5x over autoregressive inference. It is the first MLX port with hybrid model support.
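
For readers unfamiliar with speculative decoding, the following is a minimal sketch of the basic greedy, single-chain variant. It is not the tree-based algorithm in ddtree-mlx and uses no MLX APIs: the toy `target_next` and `draft_next` functions and all other names are assumptions for illustration. A cheap draft model proposes `k` tokens, the target model checks them, and the longest agreeing prefix is kept along with one token from the target.

```python
def greedy_decode(target_next, prompt, max_new):
    """Plain autoregressive decoding: one target call per new token."""
    tokens = list(prompt)
    for _ in range(max_new):
        tokens.append(target_next(tokens))
    return tokens

def speculative_decode(target_next, draft_next, prompt, max_new, k=4):
    """Greedy speculative decoding over a single draft chain.

    Returns (tokens, rounds). Each round stands in for one batched
    target forward pass in a real system; here the target is simply
    called position by position.
    """
    tokens = list(prompt)
    rounds = 0
    while len(tokens) - len(prompt) < max_new:
        # 1. Draft model proposes k tokens autoregressively.
        draft = []
        for _ in range(k):
            draft.append(draft_next(tokens + draft))
        # 2. Target verifies the proposals in order.
        rounds += 1
        accepted = []
        for tok in draft:
            expected = target_next(tokens + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # target's correction ends the round
                break
        else:
            # Every draft token matched: the target adds one bonus token.
            accepted.append(target_next(tokens + accepted))
        tokens.extend(accepted)
    return tokens[: len(prompt) + max_new], rounds

# Toy models over digit tokens: the target counts upward mod 10;
# the draft agrees except right after a 2, where it guesses 0.
def target_next(seq):
    return (seq[-1] + 1) % 10

def draft_next(seq):
    return 0 if seq[-1] == 2 else (seq[-1] + 1) % 10

baseline = greedy_decode(target_next, [0], max_new=12)
spec_tokens, spec_rounds = speculative_decode(target_next, draft_next, [0], max_new=12)
print(spec_tokens == baseline, spec_rounds)
```

Because every accepted token is checked against the target, the output matches plain greedy decoding; the gain comes from needing fewer target rounds than tokens. Tree-based variants extend this by verifying several candidate continuations per round instead of one chain.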

✍️ Simon Willison Jan 9

Simon Willison: 2026 is Year LLM Code Quality Becomes Impossible to Deny

Simon Willison predicts 2026 as the inflection point where LLM code quality becomes undeniable, driven by reasoning models trained with RL specifically for code. He also forecasts 2026 as the year code sandboxing is solved via containers and WebAssembly, addressing the security risks and prompt injection vulnerabilities that come with executing untrusted LLM-generated code. This is critical for safe agentic workflows.