Qwen3.6-35B-A3B running at 8-bit quantization with 64k context matches Claude quality for code tasks on consumer hardware (M5 Max, 128GB). Handles complex multi-step research tasks with many tool calls and maintains performance on long context coding tasks. Enables fully local development workflows without sending code to external providers.
OptiMer demonstrates that merging distribution vectors during continual pre-training outperforms traditional data mixing when adapting foundation models. The approach enables more efficient domain adaptation without full retraining, challenging conventional strategies for combining diverse data distributions in continual learning.
A locally-running world model trained for iPad interprets arbitrary photos and drawings into controllable driving gameplay. The experimental game demonstrates on-device world model inference for interactive applications, though current output quality remains imperfect.
Reddit post announces upcoming release of Kimi K2.6 model with no additional details provided.
Qwen3.6-35B-A3B successfully solved coding problems that Qwen3.5-27B couldn't handle, reducing technical debt in a complex budgeting app project. Users report improved code quality and architectural decisions on multi-feature applications.
Git repository tracking evolution of Claude system prompts over time. Enables analysis of how Anthropic adjusts model behavior and guardrails through prompt engineering.
Zero-shot World Model (ZWM) achieves state-of-the-art performance on visual-cognitive tasks using only data from a single child's visual experience, requiring orders of magnitude less training data than current AI. BabyZWM demonstrates zero-shot transfer without task-specific training, offering a blueprint for human-scale data efficiency.
Claude Opus 4.7's new tokenizer inflates token counts 35-45% for identical inputs (especially code-heavy prompts), causing silent production cost increases despite unchanged "$5/$25 per million tokens" pricing—a $500/day app became $675/day overnight. The incident sparked migration discussions to self-hosted open models like GLM-5 and Qwen3.5 where infrastructure costs are flat regardless of tokenization.
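The reported cost jump follows directly from the inflation factor; a minimal sketch of the arithmetic (the input/output traffic split is a hypothetical illustration, not from the source):

```python
# Illustrative cost math using the report's $5/$25 per-million-token prices.
# The 60M-in / 8M-out daily traffic mix is an assumed example workload.
def daily_cost(tokens_in, tokens_out, price_in=5.0, price_out=25.0):
    """Daily USD cost given token counts and $/1M-token prices."""
    return tokens_in / 1e6 * price_in + tokens_out / 1e6 * price_out

base = daily_cost(60e6, 8e6)                      # ~$500/day before the change
inflated = daily_cost(60e6 * 1.35, 8e6 * 1.35)    # same text, 35% more tokens
print(round(base), round(inflated))               # 500 675
```

The per-token price never moves; the bill scales linearly with the tokenizer's output, which is why the increase is invisible on the pricing page.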
Unsloth's Qwen3.6-35B-A3B GGUF quantizations achieve the best KLD-to-size ratio on 21 of 22 Pareto-frontier points. Team clarifies that 95% of their frequent re-uploads stem from upstream llama.cpp issues rather than their own errors, citing Gemma 4's four re-uploads as an example.
Analysis of Claude 4.7's tokenizer efficiency and associated API costs.
Qwen3.6 with OpenCode successfully implemented row-level security across a multi-service codebase (Rust, TypeScript, Python), demonstrating practical viability for complex code generation tasks. Users report quality comparable to Claude for certain daily-drive use cases despite remaining bugs.
Anthropic launches Claude Design, a new product offering from the Claude AI family. Details on capabilities and target use cases not provided in source.
Tabular foundation models enable in-context molecular property prediction without task-specific fine-tuning, addressing small dataset challenges in drug discovery and chemical engineering. The approach evaluates frozen molecular embeddings and TFMs across pharmaceutical and engineering benchmarks in low- to medium-data regimes.
Qwen3.6-35B-A3B represents the first local model practitioners find genuinely competitive with proprietary APIs for code generation, producing usable output for UI XML and embedded C++ with minimal post-generation fixes. This marks a capability threshold where local deployment overhead becomes worthwhile compared to previous iterations requiring extensive manual correction.
Users report degraded quality in Claude Opus 4.7 for complex reasoning tasks in theoretical math and physics, citing frequent downtime and performance drops compared to version 4.6. Multiple researchers considering switching back to ChatGPT despite previous preference for Claude.
Qwen3.6-35B model successfully builds a complete tower defense game with autonomous bug detection and fixing using MCP screenshot verification. User reports the model identified rendering issues and wave completion bugs independently during development. Demonstrates strong multimodal code generation capabilities with visual feedback integration.
Comprehensive survey of intrinsic interpretability approaches for LLMs that build transparency directly into architectures rather than relying on post-hoc explanations. Categorizes methods into five design paradigms: functional transparency, concept alignment, representational decomposability, explicit modularization, and latent sparsity induction.
Qwen3.6-35B-UD at 2-bit K_XL quantization achieves 98.3% tool call success rate across 58 calls while processing 2.7M tokens on 16GB VRAM. Successfully converts research papers to web applications using llama.cpp on consumer laptop hardware. Demonstrates extreme quantization can maintain performance on complex multi-step tasks.
Comparative evaluation shows Bonsai-8B at 1.125 bpw (782 MB) underperforms Gemma-4-2B at 4.8 bpw (1104 MB) despite only 29% size reduction, questioning the value proposition of extreme quantization. Ternary 1.58-bit variant performed even worse while being 33% larger than Gemma at 1477 MB. Suggests aggressive sub-2-bit quantization may sacrifice too much capability for modest size gains.
Ternary Bonsai uses 1.58-bit weights {-1, 0, +1} to achieve 9x smaller memory footprint than 16-bit models while outperforming peers in standard benchmarks. Available in 8B, 4B, and 1.7B parameter sizes, it balances extreme compression with improved accuracy over 1-bit predecessors.
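The "1.58-bit" figure is log2(3) ≈ 1.585 bits per ternary weight; a common way to approach that density is base-3 packing of five trits per byte (3^5 = 243 ≤ 256), sketched here as an illustration rather than PrismML's actual on-disk format:

```python
# Illustrative base-3 packing of ternary weights; not Bonsai's real format.
def pack5(trits):
    """Pack five ternary values {-1, 0, +1} into a single byte."""
    assert len(trits) == 5 and all(t in (-1, 0, 1) for t in trits)
    v = 0
    for t in trits:
        v = v * 3 + (t + 1)  # map {-1, 0, 1} -> base-3 digits {0, 1, 2}
    return v                 # always in range 0..242, fits one byte

def unpack5(byte):
    """Invert pack5: recover the five ternary values."""
    out = []
    for _ in range(5):
        out.append(byte % 3 - 1)
        byte //= 3
    return out[::-1]

w = [-1, 0, 1, 1, -1]
assert unpack5(pack5(w)) == w  # lossless round trip at 8/5 = 1.6 bits/weight
```

This is where the "9x smaller than 16-bit" headline comes from: 1.6 packed bits versus 16, before accounting for per-group scales and non-ternary layers.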
Community release of Qwen3.6-35B-A3B Uncensored Aggressive with K_P quantizations, achieving 0/465 refusals with claimed zero capability loss. Based on newer Qwen 3.6 foundation maintaining same MoE architecture as 3.5-35B. Includes Q8_K_P through IQ4 quant formats for local deployment.
Advances sparse autoencoder architectures for mechanistic interpretability by introducing dynamic attention mechanisms. SAEs decompose neural activations into interpretable features, and this work addresses key limitations in existing approaches to improve understanding of model internals for safety and alignment.
Investigates when small transformers make early, irreversible commitments to outputs during forward passes, replicating findings on open-weights models and extending to factual recall tasks. Understanding minimal architectures for planning-like behavior reveals how models perform multi-step reasoning with limited computational resources, advancing mechanistic interpretability.
Anthropic released Auto mode for Claude Code (Opus 4.7, Max tier) and new "xhigh" effort level between high and max for granular reasoning control. Update includes fullscreen TUI rendering, mobile notifications for Remote Control, and Windows/MCP fixes.
Qwen 3.6 35B A3B achieves 187 tokens/sec on RTX 5090 32GB at Q5_K_S quantization with 120K context. Performance benchmark for local inference. Demonstrates practical deployment of mid-size models on consumer hardware.
Qwen 3.6 introduces a preserve_thinking flag that prevents KV cache invalidation by maintaining reasoning context across turns. This improves cache reuse in agent scenarios, reduces token consumption from redundant reasoning, and fixes a template issue that caused cache invalidation in Qwen 3.5.
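A minimal sketch of how such a flag would be passed to an OpenAI-compatible endpoint via `chat_template_kwargs`; only the flag name `preserve_thinking` comes from the release notes, the surrounding request shape is an assumption for illustration:

```python
# Hypothetical request body; flag name from Qwen 3.6 release notes,
# everything else (endpoint shape, model id) is illustrative.
import json

payload = {
    "model": "qwen3.6-35b-a3b",
    "messages": [{"role": "user", "content": "Refactor this function..."}],
    "chat_template_kwargs": {"preserve_thinking": True},
}
# With the flag set, the template keeps earlier reasoning blocks in the
# rendered prompt, so the token prefix is byte-identical across turns and
# the KV cache for those tokens can be reused instead of recomputed.
print(json.dumps(payload, indent=2))
```

The Qwen 3.5 behavior it fixes was the opposite: stripping reasoning blocks rewrote the prefix every turn, invalidating the cache and forcing redundant prefill.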
Anthropic appears to be constructively terminating consumer Claude Max subscriptions through silent service degradation rather than transparent communication, likely pivoting to enterprise-only offerings. The strategy aims to salvage subscription revenue while implementing stricter limits and higher-tier pricing that will drive consumer churn.
MambaSL achieves state-of-the-art time series classification using a single-layer Mamba architecture with TSC-specific modifications. Re-evaluates 20 baselines across all 30 UEA datasets under unified protocol, demonstrating SSMs can excel at time series tasks with minimal architectural complexity.
Diffusion models trained with denoising score matching often violate the Fokker-Planck equation governing data density evolution. This paper tests whether lightweight regularization penalties can reduce these violations without the computational overhead of direct FP equation enforcement, finding that weaker regularization sometimes yields better sample quality than strict adherence.
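For context, the constraint being tested is the standard Fokker-Planck identity; this is the generic form for a forward SDE $\mathrm{d}x = f(x,t)\,\mathrm{d}t + g(t)\,\mathrm{d}w$, not the paper's specific penalty:

```latex
\partial_t p_t(x)
  = -\nabla \cdot \bigl( f(x,t)\, p_t(x) \bigr)
  + \tfrac{1}{2}\, g(t)^2 \, \Delta_x\, p_t(x)
```

A model trained purely by denoising score matching approximates $\nabla_x \log p_t$ at each $t$ independently, so nothing forces consistency across time; the lightweight regularizers penalize the squared residual of this identity rather than enforcing it exactly.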
MinShap modifies Shapley values from cooperative game theory to focus on direct feature effects rather than indirect dependencies, making them suitable for feature selection in non-linear models. The approach adapts attribution methods to the distinct requirements of variable selection with dependent features.
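For reference, the classical Shapley value that MinShap starts from (the source does not detail the exact modification): for player $i$ in a game $v$ over players $N$ with $|N| = n$,

```latex
\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}}
  \frac{|S|!\,\bigl(n - |S| - 1\bigr)!}{n!}
  \Bigl( v\bigl(S \cup \{i\}\bigr) - v(S) \Bigr)
```

With dependent features, the marginal contribution $v(S \cup \{i\}) - v(S)$ mixes direct effects with credit routed through correlated features, which is the term MinShap reworks for selection purposes.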
Prolepsis phenomenon: transformers commit to decisions early via task-specific attention heads that sustain the commitment without later correction. Replicates planning-site findings in Gemma 2 2B and Llama 3.2 1B, showing residual-stream methods miss this behavior while causal lens tracing captures it. The same motif appears across different tasks (planning, factual recall) at different network depths.
MoE-FM uses mixture-of-experts to capture complex latent geometries (anisotropy, multimodality) in flow matching for language models. YAN non-autoregressive LM built on MoE-FM matches diffusion quality with faster inference in both Transformer and Mamba architectures.
Qwen3.6-35B-A3B is a sparse MoE model with 35B total parameters and 3B active, released under Apache 2.0. The model matches agentic coding performance of models 10x its active size and includes multimodal perception with thinking and non-thinking modes.
Alibaba released Qwen3.6-35B-A3B, a new open-weights model in the Qwen family now available on Hugging Face. Limited information provided beyond model availability.
GPT-Rosalind is a frontier reasoning model specialized for life sciences research including drug discovery, genomics analysis, protein reasoning, and scientific workflows. Purpose-built for domain-specific scientific acceleration.
🔶 Anthropic 4d ago
★ High Signal
Claude Opus 4.7 delivers 13% improvement on coding benchmarks with enhanced vision for higher-resolution images and new effort controls/task budgets for autonomous development. Powers upgraded Claude Code review tools for long-running software engineering tasks. Introduces task-level resource management for extended autonomous coding workflows.
Anthropic launched Claude Design, a multimodal collaboration product that generates visual outputs including designs, prototypes, and slides alongside Opus 4.7. Expands Claude beyond text into integrated design workflows, competing with specialized design-focused AI tools. Available through Anthropic Labs for Opus 4.7 users.
🔶 Anthropic 4d ago
★ High Signal
Claude Opus 4.7 achieves 87.6% on SWE-bench Verified (13% improvement) with 2x throughput on agentic tasks while maintaining $5/$25 per million token pricing and full 1M context window. The performance gains make it effectively cheaper per task despite unchanged nominal pricing. Higher-resolution vision capabilities included.
OpenAI's Trusted Access for Cyber program provides security firms GPT-5.4-Cyber access and $10M in API grants. Leading enterprises and security vendors join to strengthen global cyber defense using specialized cybersecurity models.
OpenAI will shut down the Sora app on April 26, 2026, and the API on September 24, marking a rare product retreat as competition from Veo 3.1, Kling 3.0, and open alternatives commoditized video generation faster than expected. The shutdown signals Sora's economics became untenable in an increasingly crowded market.
OpenAI released GPT-Rosalind, its first vertical-specific model optimized for biology and drug discovery, achieving 0.751 on BixBench. Available through trusted access to pharma partners with a free research plugin connecting to 50+ scientific tools, marking a strategic shift toward domain-specialized models.
GitHub Copilot adding Claude Opus 4.7 with stronger multi-step task performance and more reliable agentic execution. Launches with promotional 7.5× premium request multiplier until April 30th, replacing Opus 4.5 and 4.6 for Copilot Pro+ users.
Hugging Face transformers adds support for Mistral 4 (119B MoE with 128 experts unifying Instruct, Reasoning, and Devstral), Jina Embeddings v3, and multiple OCR/video models including VidEoMT, UVDoc, and PI0 robotics VLA. Includes quantization, tokenization, and caching speedups with breaking changes.
Coverage of Gemini 3.1 Flash's text-to-speech capabilities and performance characteristics.
DeepMind's Gemini 3.1 Flash TTS introduces granular audio tags for precise control over expressive speech synthesis. Enables directing AI-generated voice with fine-grained attributes for natural, controllable audio generation.
User reports widespread quality degradation across major models (Claude, Gemini, Grok, z.ai) in mid-April 2026, observing ignored instructions, shallow outputs, and slow responses even when testing locally on H100 with GLM-5. Community discussion suggests potential systematic changes, though reports lack controlled verification. May reflect perception issues, A/B testing, or genuine model updates.
MiniMax clarified M2.7 license to explicitly allow personal use for commercial software development without licensing fees. Users can run models on their own servers for coding, building applications/agents, and sell resulting software commercially.
GPT Image 2 rolled out with near-perfect text rendering in images, solving major AI generation weakness. Shows improved prompt adherence and realistic details. Discovered through anonymous "tape" codenames on Arena AI before official announcement.
Mistral's Voxtral is a 4B-parameter multilingual TTS model supporting 9 languages with emotionally expressive generation, low-latency streaming, and custom voice adaptation. Available via Mistral Studio and API, it targets enterprise voice agent workflows with focus on natural rhythm and cultural authenticity.
Nathan Lambert predicts top closed models show no growing capability margin over open models, but retain robustness advantages for general use. Economic staying power becomes the key competitive dimension, with open models dominating repetitive automation and new funding structures emerging by mid-2026.
HY-World 2.0 generates navigable 3D Gaussian Splatting scenes from text, single images, multi-view images, or videos through a four-stage pipeline including panorama generation, trajectory planning, world expansion, and composition. The framework advances 3D world reconstruction and generation with improved panorama fidelity and 3D scene understanding capabilities.
Three-Phase Transformer (3PT) partitions hidden states into cyclic channels maintained by phase-respecting operations including per-channel normalization and 2D Givens rotations between attention and FFN layers. Creates a self-stabilizing architecture with a DC subspace for absolute position encoding orthogonal to RoPE, representing a structural prior rather than an added module.
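The per-pair rotation can be sketched in isolation; which channels are paired and where the angle comes from are assumptions here, but the key property, that a Givens rotation is orthogonal and therefore norm-preserving, holds regardless and is what makes the architecture self-stabilizing:

```python
# Sketch of a single 2D Givens rotation on one channel pair; the pairing
# scheme and the source of theta are illustrative assumptions, not 3PT's.
import math

def givens_pair(x, y, theta):
    """Rotate the (x, y) channel pair by theta; the pair's norm is preserved."""
    c, s = math.cos(theta), math.sin(theta)
    return c * x - s * y, s * x + c * y

x, y = 3.0, 4.0
rx, ry = givens_pair(x, y, 0.5)
assert abs((rx * rx + ry * ry) - (x * x + y * y)) < 1e-9  # energy-preserving
```

Because every such rotation (and the per-channel normalization) leaves channel-pair energy bounded, activations cannot blow up between attention and FFN layers without any added stabilizing module.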
Qwen 3.6-35B-A3B generated exceptional community engagement (2,154 upvotes), with practitioners reporting significant capability leaps for local deployment and noting that the 'preserve_thinking' flag must be set manually for optimal performance. The mixture-of-experts A3B variant activates only 3B of 35B parameters, enabling consumer hardware deployment with strong tool calling and coding performance.
Community observation that open models fine-tuned on Claude-4.6-Opus outputs consistently underperform their base models despite promises of increased reasoning. Testing across multiple models and quantization levels shows decreased intelligence in agent setups. Suggests synthetic data distillation from proprietary models may not reliably transfer capabilities.
KLD evaluation framework for Qwen3.5-9B GGUF quantizations measures probability distribution drift from BF16 baseline rather than perplexity. Provides data-driven quant selection by measuring faithfulness to original weights independent of dataset artifacts.
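A minimal sketch of the metric, assuming both models dump full next-token distributions under teacher forcing on the same text (the 3-token vocabulary is a toy illustration):

```python
# Per-token KL divergence of a quantized model against its BF16 baseline.
# Toy distributions; a real run averages this over a whole corpus.
import math

def token_kld(p_ref, q_quant, eps=1e-12):
    """KL(p_ref || q_quant) over one next-token distribution."""
    return sum(p * math.log((p + eps) / (q + eps))
               for p, q in zip(p_ref, q_quant))

ref   = [0.70, 0.20, 0.10]   # BF16 probabilities for 3 candidate tokens
quant = [0.65, 0.25, 0.10]   # quantized model, same context
drift = token_kld(ref, quant)
assert drift > 0.0           # any distributional drift gives positive KLD
```

Averaging this over a corpus ranks quants by faithfulness to the original weights; unlike perplexity, it stays meaningful even when the evaluation text is easy or happens to match training data.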
🧠 DeepMind 6d ago
★ High Signal
Google's Gemini 3 Deep Think achieves 48.4% on Humanity's Last Exam and 84.6% on ARC-AGI-2, now available to Ultra subscribers and select enterprise users. Early adopters use it to identify mathematical paper errors missed by peer review and optimize semiconductor crystal growth. Novel application of specialized reasoning mode to scientific and engineering problems beyond standard benchmarks.
OpenAI launched GPT-5.4-Cyber, a fine-tuned version of GPT-5.4 with lowered guardrails for cybersecurity applications, restricted to authorized security researchers and government agencies due to weaponization concerns. Represents OpenAI's response to Anthropic's Claude Mythos Preview in the AI-assisted cybersecurity race.
OpenAI expands Trusted Access for Cyber program by introducing GPT-5.4-Cyber to vetted defenders while strengthening safeguards as AI cybersecurity capabilities advance. The program provides specialized model access for defensive security applications.
r/LocalLLaMA consensus ranks Qwen 3.5 most broadly recommended, Gemma 4 showing strong buzz, GLM-5/4.7 near top of rankings, MiniMax M2.5/M2.7 for agentic workloads, DeepSeek V3.2 in top cluster. Qwen3-Coder-Next dominates for local coding. Community-driven practical guidance on deployed models.
Independent researcher trained a 1.088B-parameter pure Spiking Neural Network for language modeling from random initialization, reaching a loss of 4.4 and 93% activation sparsity at 27k steps before exhausting the compute budget. This challenges conventional wisdom that billion-scale SNNs require ANN-to-SNN conversion due to vanishing gradients, demonstrating that direct spike-domain training is viable. Cross-lingual emergence appeared around step 25K despite no explicit multilingual objective.
Claude responses shortened by 40% and became more restrictive after March 26, with welfare redirects up 275% and a roughly 6x productivity drop (124 words of conversation per output word, versus 21 previously). User measured 722,522 words across 70 conversations, quantifying the same degradation pattern ChatGPT users experienced.
Community megathread discusses recent local LLM releases including Qwen3.5, Gemma4, GLM-5.1 claiming SOTA performance, Minimax-M2.7 as accessible alternative to Claude Sonnet, and PrismML Bonsai 1-bit models. Users share deployment configurations and real-world usage experiences with open-weight models.
MiniMax's Ryan Lee clarifies restrictive license primarily targets API providers who poorly served M2.1/M2.5 models, with potential updates coming for regular users. Addresses community concerns about model licensing and usage terms. Brief update on evolving open-source licensing policies.
Duplicate announcement of imminent Kimi K2.6 model release with no substantive information.
Gemma 4 26B MoE shows reluctance to use tools or web search, defaulting to internal knowledge and performing minimal searches when explicitly requested. Community feedback on model's agentic capabilities despite strong benchmarks. Highlights gap between stated capabilities and practical tool use.
Meta released Muse Spark, scoring #4 worldwide on the Artificial Analysis Intelligence Index, but as a proprietary model available only through Meta AI app and private API—breaking from their open-weights Llama tradition. The shift marks Meta's first frontier-class release without open weights since founding Meta Superintelligence Labs, leaving the future of the Llama family unclear.
📝 Blog 1w ago
★ High Signal
GLM-5.1 achieves 94.6% of Claude Opus 4.6's coding performance at $3/month under MIT license, while Google's Gemma 4 and Qwen 3.5 deliver frontier-competitive performance. This marks the collapse of the performance gap between open and closed-source models, fundamentally shifting AI economics and deployment patterns.
Byte-Level Distillation (BLD) solves cross-tokenizer distillation by converting teacher output distributions to byte-level probabilities and adding a lightweight byte decoder to the student. This simple approach outperforms complex vocabulary alignment heuristics by operating at the common byte interface shared across all tokenizers.
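The core conversion step can be sketched on a toy vocabulary, marginalizing only onto the first byte (real BLD targets full byte sequences via the student's byte decoder):

```python
# Toy sketch of teacher-to-byte conversion with a made-up 3-token vocab;
# illustrative only, not the paper's full procedure.
from collections import defaultdict

teacher = {"cat": 0.6, "car": 0.3, "dog": 0.1}  # teacher's next-token probs

byte_probs = defaultdict(float)
for tok, p in teacher.items():
    byte_probs[tok.encode("utf-8")[0]] += p  # marginalize onto the first byte

assert abs(byte_probs[ord("c")] - 0.9) < 1e-12  # "cat" + "car" share byte 'c'
assert abs(byte_probs[ord("d")] - 0.1) < 1e-12
```

Because every tokenizer ultimately emits bytes, these targets are defined regardless of how teacher and student segment text, which is why the simple byte interface sidesteps vocabulary alignment heuristics entirely.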
Meta Muse Spark marks Meta's pivot from open-source to proprietary models, featuring multimodal perception, parallel subagent execution, and a "contemplating" mode. Built by Meta Superintelligence Labs, it offers competitive vision and language performance but lags in coding, representing Meta's first paid API model after Llama 4's poor reception.
Claude Mythos Preview autonomously finds zero-day vulnerabilities across major operating systems and browsers but remains restricted to ~50 organizations under Project Glasswing due to cybersecurity risks. Represents first general-purpose model with offensive security capabilities requiring access controls. Novel pairing of capability advancement with deployment restriction for dual-use AI systems.
🧠 DeepMind 2w ago
★ High Signal
Gemma 4 family (31B Dense, 26B MoE variants) released under Apache 2.0 with 256K context, native vision/audio, and competitive coding Elo jumping from 110 to 2150, a roughly 20x improvement. The 31B model outperforms models 20x larger while enabling agentic skills on edge devices. First open-weights model family combining multimodal input, extended context, and elite coding performance at edge-deployable scale.
Open models should shift from frontier-chasing to three classes: closed frontier, open frontier, and specialized small models as "distributed intelligence." Advocates cheap, task-specific models that complement closed agents rather than competing at the frontier. Critiques ecosystem obsession with matching GPT-4 scale.
Comprehensive visual reference documenting LLM architectures from GPT-2 through March 2026, including standardized fact sheets, decoder block diagrams, and architectural lineage tracking. Covers recent innovations like DeepSeek V3's MLA and Qwen3.5's Gated DeltaNet hybrid. Available as 182-megapixel poster with source data on GitHub, serving as canonical resource for understanding architectural evolution.
Production testing reveals Gemma 12B and Qwen 3.5 35B return correct answers in unparseable formats despite explicit instructions, emitting Python code blocks or Markdown tables instead of the requested CSV. Format compliance is an independent capability missing from all major benchmarks (SWE-bench, Aider, LiveBench, SEAL), a critical gap for production pipelines where the consumers are parsers, not humans. Smaller models fundamentally lack instruction-following precision for machine-readable output.
4.5-hour discussion with Sebastian Raschka, Nathan Lambert, and Lex Fridman covering 2026 AI landscape including inference-time scaling, RLVR, architecture evolution, open vs closed models, AGI timelines, and economic forces shaping development. Comprehensive synthesis of current industry perspectives and technical directions.
Comprehensive taxonomy of inference-time scaling approaches including recursive language models and test-time compute research. Inference scaling has become most effective method for improving deployed LLM answer quality. Technical explainer for understanding modern reasoning model architectures.