🍡 feedmeAI
Models

💬 Reddit 1d ago

I'm running qwen3.6-35b-a3b with 8 bit quant and 64k context thru OpenCode on my mbp m5 max 128gb and it's as good as claude

Qwen3.6-35B-A3B running at 8-bit quantization with 64k context matches Claude quality for code tasks on consumer hardware (M5 Max, 128GB). Handles complex multi-step research tasks with many tool calls and maintains performance on long context coding tasks. Enables fully local development workflows without sending code to external providers.
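
For readers wanting to reproduce a setup like this, here is a minimal local-inference sketch using llama-cpp-python; the GGUF filename and prompt are assumptions, not the poster's actual OpenCode configuration.

```python
# Minimal sketch, assuming llama-cpp-python and a hypothetical Q8_0 GGUF file.
from llama_cpp import Llama

llm = Llama(
    model_path="qwen3.6-35b-a3b-q8_0.gguf",  # hypothetical filename
    n_ctx=65536,       # the 64k context window from the post
    n_gpu_layers=-1,   # offload all layers to Metal on Apple silicon
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Refactor this function to be iterative."}],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```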

📝 Blog 2d ago

Claude Opus 4.7 tokenizer inflation: 35% cost increase hits API users

Claude Opus 4.7's new tokenizer inflates token counts 35-45% for identical inputs (especially code-heavy prompts), causing silent production cost increases despite unchanged "$5/$25 per million tokens" pricing: one $500/day app became a $675/day app overnight. The incident sparked discussions of migrating to self-hosted open models like GLM-5 and Qwen3.5, where infrastructure costs are flat regardless of tokenization.
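
The arithmetic is simple but easy to miss in billing dashboards; a rough sketch, with hypothetical traffic volumes chosen to match the article's $500/day figure:

```python
# Same per-token price, inflated token counts. Traffic numbers are hypothetical.
PRICE_IN, PRICE_OUT = 5.00, 25.00  # USD per million tokens, per the article

def daily_cost(mtok_in: float, mtok_out: float, inflation: float = 1.0) -> float:
    return (mtok_in * PRICE_IN + mtok_out * PRICE_OUT) * inflation

print(daily_cost(60, 8))        # 500.0 USD/day before the tokenizer change
print(daily_cost(60, 8, 1.35))  # 675.0 USD/day after 35% token inflation
```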

💬 Reddit 2d ago

Qwen3.6 GGUF Benchmarks

Unsloth's Qwen3.6-35B-A3B GGUF quantizations achieve the best KLD-to-size ratio at 21 of 22 points on the Pareto frontier. The team clarifies that 95% of their frequent re-uploads stem from upstream llama.cpp issues rather than their own errors, citing Gemma 4's four re-uploads as an example.
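
For context, KLD here is the KL divergence between the full-precision model's token distribution and the quant's, the metric llama.cpp's perplexity tooling reports; a minimal NumPy sketch of the computation (not Unsloth's pipeline):

```python
import numpy as np

def mean_token_kld(logits_full: np.ndarray, logits_quant: np.ndarray) -> float:
    """Mean KL(P_full || P_quant) over tokens; logits shaped (tokens, vocab)."""
    def log_softmax(x):
        x = x - x.max(axis=-1, keepdims=True)
        return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))
    logp, logq = log_softmax(logits_full), log_softmax(logits_quant)
    return float((np.exp(logp) * (logp - logq)).sum(axis=-1).mean())
```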

💬 Reddit 2d ago

Qwen3.6 is incredible with OpenCode!

Qwen3.6 with OpenCode successfully implemented row-level security across a multi-service codebase (Rust, TypeScript, Python), demonstrating practical viability for complex code generation tasks. Users report quality comparable to Claude for certain daily-drive use cases despite remaining bugs.

🟧 Hacker News 2d ago

Claude Design

Anthropic launches Claude Design, a new product offering in the Claude AI family. Details on capabilities and target use cases were not provided in the source.

💬 Reddit 2d ago

Qwen 3.6 is the first local model that actually feels worth the effort for me

Qwen3.6-35B-A3B is the first local model practitioners find genuinely competitive with proprietary APIs for code generation, producing usable output for UI XML and embedded C++ with minimal post-generation fixes. This marks a capability threshold at which the overhead of local deployment becomes worthwhile; previous iterations required extensive manual correction.

💬 Reddit 2d ago

Qwen3.6. This is it.

Qwen3.6-35B model successfully builds a complete tower defense game with autonomous bug detection and fixing using MCP screenshot verification. User reports the model identified rendering issues and wave completion bugs independently during development. Demonstrates strong multimodal code generation capabilities with visual feedback integration.
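
The screenshot loop depends on exposing screen capture as an MCP tool. A sketch of such a server using the official MCP Python SDK's FastMCP and the mss capture library; the poster's actual server is unknown:

```python
# Hypothetical MCP screenshot server; requires `pip install mcp mss`.
from mcp.server.fastmcp import FastMCP, Image

mcp = FastMCP("screenshot")

@mcp.tool()
def screenshot() -> Image:
    """Capture the primary monitor so the model can inspect its own rendering."""
    import mss
    with mss.mss() as sct:
        shot = sct.grab(sct.monitors[1])
        png = mss.tools.to_png(shot.rgb, shot.size)
    return Image(data=png, format="png")

if __name__ == "__main__":
    mcp.run()
```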

📑 arXiv 2d ago

Towards Intrinsic Interpretability of Large Language Models: A Survey of Design Principles and Architectures

Comprehensive survey of intrinsic interpretability approaches for LLMs that build transparency directly into architectures rather than relying on post-hoc explanations. Categorizes methods into five design paradigms: functional transparency, concept alignment, representational decomposability, explicit modularization, and latent sparsity induction.

💬 Reddit 2d ago

Bonsai models are pure hype: Bonsai-8B is MUCH dumber than Gemma-4-E2B

Comparative evaluation shows Bonsai-8B at 1.125 bpw (782 MB) underperforms Gemma-4-2B at 4.8 bpw (1104 MB) despite only 29% size reduction, questioning the value proposition of extreme quantization. Ternary 1.58-bit variant performed even worse while being 33% larger than Gemma at 1477 MB. Suggests aggressive sub-2-bit quantization may sacrifice too much capability for modest size gains.
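
The size numbers follow directly from bits-per-weight; a back-of-envelope sketch (real GGUF files keep embeddings and some layers at higher precision and add metadata, so reported sizes deviate from this estimate):

```python
def approx_size_mb(n_params_b: float, bpw: float) -> float:
    # parameters * bits-per-weight / 8 bits-per-byte, in megabytes
    return n_params_b * 1e9 * bpw / 8 / 1e6

print(approx_size_mb(2.0, 4.8))  # ~1200 MB, near the reported 1104 MB for Gemma
```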

📑 arXiv 3d ago

What Is the Minimum Architecture for Prolepsis? Early Irrevocable Commitment Across Tasks in Small Transformers

Investigates when small transformers make early, irreversible commitments to outputs during forward passes, replicating findings on open-weights models and extending to factual recall tasks. Understanding minimal architectures for planning-like behavior reveals how models perform multi-step reasoning with limited computational resources, advancing mechanistic interpretability.

💬 Reddit 3d ago

Only LocalLLaMa can save us now.

Anthropic appears to be constructively terminating consumer Claude Max subscriptions through silent service degradation rather than transparent communication, likely pivoting to enterprise-only offerings. The strategy aims to salvage subscription revenue while implementing stricter limits and higher-tier pricing that will drive consumer churn.

📑 arXiv 3d ago

An Analysis of Regularization and Fokker-Planck Residuals in Diffusion Models for Image Generation

Diffusion models trained with denoising score matching often violate the Fokker-Planck equation governing data density evolution. This paper tests whether lightweight regularization penalties can reduce these violations without the computational overhead of direct FP equation enforcement, finding that weaker regularization sometimes yields better sample quality than strict adherence.
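
For reference, the constraint in question: if the forward process is an SDE with drift f and diffusion g, its marginal densities must satisfy the Fokker-Planck equation below; the paper's residual measures how far the learned model is from satisfying it (the exact residual definition may differ in detail).

```latex
% Standard Fokker-Planck equation for the diffusion SDE
% dx_t = f(x_t, t)\,dt + g(t)\,dW_t :
\partial_t p_t(x) \;=\; -\,\nabla \cdot \big( f(x,t)\, p_t(x) \big)
  \;+\; \tfrac{1}{2}\, g(t)^2 \, \Delta_x\, p_t(x)
```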

📑 arXiv 3d ago

What Is the Minimum Architecture for Prolepsis? Early Irrevocable Commitment Across Tasks in Small Transformers

Prolepsis phenomenon: transformers commit to decisions early via task-specific attention heads that sustain the commitment without later correction. Replicates planning-site findings in Gemma 2 2B and Llama 3.2 1B, showing residual-stream methods miss this behavior while causal lens tracing captures it. The same motif appears across different tasks (planning, factual recall) at different network depths.
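
A minimal activation-patching sketch of the kind of head-level causal tracing described, using TransformerLens on GPT-2; the layer/head indices and prompts are hypothetical, not the paper's:

```python
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2")
clean = model.to_tokens("The capital of France is")
corrupt = model.to_tokens("The capital of Spain is")
assert clean.shape == corrupt.shape  # patching needs aligned positions

_, clean_cache = model.run_with_cache(clean)
LAYER, HEAD = 9, 6  # hypothetical "commitment" head

def patch_head(z, hook):
    # z: (batch, pos, head, d_head); splice in the clean run's head output
    z[:, :, HEAD, :] = clean_cache[hook.name][:, :, HEAD, :]
    return z

patched = model.run_with_hooks(
    corrupt, fwd_hooks=[(utils.get_act_name("z", LAYER), patch_head)]
)
paris = model.to_single_token(" Paris")
print(patched[0, -1, paris].item())  # does patching one head restore "Paris"?
```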

💬 Reddit 3d ago

Qwen3.6-35B-A3B released!

Qwen3.6-35B-A3B is a sparse MoE model with 35B total parameters and 3B active, released under Apache 2.0. It matches the agentic coding performance of models 10x its active size and includes multimodal perception with both thinking and non-thinking modes.
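
Why the total/active split matters for consumer hardware, as rough arithmetic (ignoring KV cache and activation memory):

```python
# Back-of-envelope: all 35B weights must be resident in memory, but each
# token only computes through the ~3B active parameters.
total, active = 35e9, 3e9

weights_gb_at_8bit = total * 1 / 1e9   # ~35 GB of weights at 8 bits/weight
flops_per_token = 2 * active           # ~6 GFLOPs, dense-3B-class compute

print(weights_gb_at_8bit, flops_per_token)
```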

🔶 Anthropic 4d ago
★ High Signal

Claude Opus 4.7 - Major Model Release

Claude Opus 4.7 delivers 13% improvement on coding benchmarks with enhanced vision for higher-resolution images and new effort controls/task budgets for autonomous development. Powers upgraded Claude Code review tools for long-running software engineering tasks. Introduces task-level resource management for extended autonomous coding workflows.

📝 Blog 4d ago

OpenAI Sora Shutdown: Video Model to Cease Operations

OpenAI will shut down the Sora app on April 26, 2026, and the API on September 24, 2026, marking a rare product retreat as competition from Veo 3.1, Kling 3.0, and open alternatives commoditized video generation faster than expected. The shutdown signals that Sora's economics became untenable in an increasingly crowded market.

🐙 GitHub 4d ago

GitHub Copilot Adds Claude Opus 4.7

GitHub Copilot is adding Claude Opus 4.7, bringing stronger multi-step task performance and more reliable agentic execution. It launches with a promotional 7.5× premium request multiplier until April 30th, replacing Opus 4.5 and 4.6 for Copilot Pro+ users.

💬 Reddit 4d ago

Major drop in intelligence across most major models.

User reports widespread quality degradation across major models (Claude, Gemini, Grok, z.ai) in mid-April 2026, observing ignored instructions, shallow outputs, and slow responses even when testing locally on H100 with GLM-5. Community discussion suggests potential systematic changes, though reports lack controlled verification. May reflect perception issues, A/B testing, or genuine model updates.

📝 Blog 5d ago

Mistral Voxtral TTS Model

Mistral's Voxtral is a 4B-parameter multilingual TTS model supporting 9 languages with emotionally expressive generation, low-latency streaming, and custom voice adaptation. Available via Mistral Studio and API, it targets enterprise voice agent workflows with focus on natural rhythm and cultural authenticity.

📝 Blog 5d ago

My bets on open models, mid-2026

Nathan Lambert predicts that top closed models will show no growing capability margin over open models but will retain robustness advantages for general use. Economic staying power becomes the key competitive dimension, with open models dominating repetitive automation and new funding structures emerging by mid-2026.

🤗 Hugging Face 5d ago

HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds

HY-World 2.0 generates navigable 3D Gaussian Splatting scenes from text, single images, multi-view images, or videos through a four-stage pipeline including panorama generation, trajectory planning, world expansion, and composition. The framework advances 3D world reconstruction and generation with improved panorama fidelity and 3D scene understanding capabilities.

🤗 Hugging Face 5d ago

Three-Phase Transformer

Three-Phase Transformer (3PT) partitions hidden states into cyclic channels maintained by phase-respecting operations including per-channel normalization and 2D Givens rotations between attention and FFN layers. Creates a self-stabilizing architecture with a DC subspace for absolute position encoding orthogonal to RoPE, representing a structural prior rather than an added module.
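
One way to read the "2D Givens rotations" over channel pairs, as a hedged PyTorch sketch; the 3PT paper's exact parameterization and placement are not specified here:

```python
import torch

def givens_rotate_pairs(x: torch.Tensor, theta: torch.Tensor) -> torch.Tensor:
    """Rotate channels in disjoint planes (2i, 2i+1) by angles theta.

    x: (..., d) with d even; theta: (d // 2,). Norm-preserving by construction,
    which is what makes phase-respecting mixing self-stabilizing.
    """
    a, b = x[..., 0::2], x[..., 1::2]
    cos, sin = torch.cos(theta), torch.sin(theta)
    out = torch.empty_like(x)
    out[..., 0::2] = a * cos - b * sin
    out[..., 1::2] = a * sin + b * cos
    return out

h = torch.randn(4, 16, 256)                  # (batch, seq, d_model)
r = givens_rotate_pairs(h, torch.randn(128))
assert torch.allclose(h.norm(dim=-1), r.norm(dim=-1), atol=1e-5)
```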

💬 Reddit 5d ago

Qwen 3.6-35B-A3B Release Generates Major Community Buzz on r/LocalLLaMA

Qwen 3.6-35B-A3B generated exceptional community engagement (2,154 upvotes), with practitioners reporting significant capability leaps for local deployment, though optimal performance requires manually setting a 'preserve_thinking' flag. The mixture-of-experts A3B variant activates only 3B of its 35B parameters, enabling consumer-hardware deployment with strong tool-calling and coding performance.

🧠 DeepMind 6d ago
★ High Signal

Google Gemini 3 Deep Think - Major Upgrade

Google's Gemini 3 Deep Think achieves 48.4% on Humanity's Last Exam and 84.6% on ARC-AGI-2, now available to Ultra subscribers and select enterprise users. Early adopters use it to identify mathematical paper errors missed by peer review and optimize semiconductor crystal growth. Novel application of specialized reasoning mode to scientific and engineering problems beyond standard benchmarks.

💬 Reddit 6d ago

I scaled a pure Spiking Neural Network (SNN) to 1.088B parameters from scratch. Ran out of budget, but here is what I found [R]

An independent researcher trained a 1.088B-parameter pure Spiking Neural Network for language modeling from random initialization, reaching a loss of 4.4 and 93% activation sparsity at 27k steps before running out of compute budget. This challenges the conventional wisdom that billion-scale SNNs require ANN-to-SNN conversion due to vanishing gradients, demonstrating that direct spike-domain training is viable. Cross-lingual emergence appeared around step 25k despite no explicit multilingual objective.
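
The key trick that makes direct spike-domain training differentiable is the surrogate gradient; a generic sketch of the mechanism (the author's 1.088B architecture is not public here):

```python
import torch

class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike forward, fast-sigmoid surrogate gradient backward."""

    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v > 0).float()

    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        return grad_out / (1.0 + 10.0 * v.abs()) ** 2

def lif_step(v, x, beta=0.9, v_th=1.0):
    # Leaky integrate-and-fire: decay the membrane potential, integrate the
    # input, spike at threshold, then reset by subtraction.
    v = beta * v + x
    s = SurrogateSpike.apply(v - v_th)
    return v - v_th * s, s
```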

💬 Reddit 6d ago

Claude is on the same path as ChatGPT. I measured it.

Claude responses shortened 40% and became more restrictive after March 26, with welfare redirects up 275% and output efficiency down 6x (124 words of conversation per word of output, versus 21 previously). The user measured 722,522 words across 70 conversations, quantifying the same degradation pattern ChatGPT users experienced.

💬 Reddit 6d ago

Best Local LLMs - Apr 2026

Community megathread discusses recent local LLM releases including Qwen3.5, Gemma4, GLM-5.1 claiming SOTA performance, Minimax-M2.7 as accessible alternative to Claude Sonnet, and PrismML Bonsai 1-bit models. Users share deployment configurations and real-world usage experiences with open-weight models.

💬 Reddit 6d ago

Ryan Lee from MiniMax posts article on the license stating it's mostly for API providers that did a poor job serving M2.1/M2.5 and may update the license for regular users!

MiniMax's Ryan Lee clarifies restrictive license primarily targets API providers who poorly served M2.1/M2.5 models, with potential updates coming for regular users. Addresses community concerns about model licensing and usage terms. Brief update on evolving open-source licensing policies.

💬 Reddit 6d ago

Gemma 4 - lazy model or am I crazy? (bit of a rant)

Gemma 4 26B MoE shows reluctance to use tools or web search, defaulting to internal knowledge and performing only minimal searches even when explicitly asked. Community feedback questions the model's agentic capabilities despite strong benchmarks, highlighting the gap between stated capabilities and practical tool use.

📝 Blog 1w ago

Meta's Muse Spark: Breaking with Open Source, Scores #4 Worldwide

Meta released Muse Spark, scoring #4 worldwide on the Artificial Analysis Intelligence Index, but as a proprietary model available only through Meta AI app and private API—breaking from their open-weights Llama tradition. The shift marks Meta's first frontier-class release without open weights since founding Meta Superintelligence Labs, leaving the future of the Llama family unclear.

🔶 Anthropic 1w ago

Claude Mythos Preview - Restricted Cybersecurity Model

Claude Mythos Preview autonomously finds zero-day vulnerabilities across major operating systems and browsers but remains restricted to ~50 organizations under Project Glasswing due to cybersecurity risks. Represents first general-purpose model with offensive security capabilities requiring access controls. Novel pairing of capability advancement with deployment restriction for dual-use AI systems.

🧠 DeepMind 2w ago
★ High Signal

Google Gemma 4 - Open Model Family Release

Gemma 4 family (31B dense and 26B MoE variants) released under Apache 2.0 with 256K context, native vision/audio, and competitive coding Elo jumping from 110 to 2150, a 20x improvement. The 31B model outperforms models 20x larger while enabling agentic skills on edge devices. It is the first open-weights model family to combine multimodal input, extended context, and elite coding performance at edge-deployable scale.

📝 Blog Mar 16

Nathan Lambert: What Comes Next with Open Models

Open models should shift from frontier-chasing toward three classes: closed frontier, open frontier, and specialized small models serving as "distributed intelligence." Lambert advocates cheap, task-specific models that complement closed agents rather than competing at the frontier, and critiques the ecosystem's obsession with matching GPT-4 scale.

📝 Blog Mar 16

Sebastian Raschka: LLM Architecture Gallery (Updated March 2026)

Comprehensive visual reference documenting LLM architectures from GPT-2 through March 2026, including standardized fact sheets, decoder block diagrams, and architectural lineage tracking. Covers recent innovations like DeepSeek V3's MLA and Qwen3.5's Gated DeltaNet hybrid. Available as 182-megapixel poster with source data on GitHub, serving as canonical resource for understanding architectural evolution.

📝 Blog Mar 8

Format Compliance as Separate Capability: Small Models Lack It

Production testing reveals that Gemma 12B and Qwen 3.5 35B return correct answers in unparseable formats despite explicit instructions: Python instead of CSV, Markdown instead of CSV. Format compliance is an independent capability missing from all major benchmarks (SWE-bench, Aider, LiveBench, SEAL), a critical gap for production pipelines where the consumers are parsers, not humans. Smaller models fundamentally lack the instruction-following precision needed for machine-readable output.
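
In pipelines like these, the practical mitigation is a strict validator plus re-prompt rather than downstream repair; a minimal sketch (names hypothetical):

```python
import csv, io

def parse_strict_csv(text: str, n_cols: int) -> list[list[str]]:
    """Accept only bare CSV with a fixed column count; reject fences and prose."""
    if "```" in text:
        raise ValueError("output wrapped in a code fence")
    rows = list(csv.reader(io.StringIO(text.strip())))
    if not rows or any(len(row) != n_cols for row in rows):
        raise ValueError("empty or ragged CSV")
    return rows

# On ValueError, re-prompt the model with the error message instead of
# attempting to salvage free-form output downstream.
```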