🍡 feedmeAI
Open Weights 34 items


📑 arXiv 1h ago

What Is the Minimum Architecture for Prolepsis? Early Irrevocable Commitment Across Tasks in Small Transformers

Transformers make irrevocable decisions before seeing full context; the paper replicates rhyme-planning findings on open-weights models and extends them to factual recall. Reveals premature-binding mechanisms that limit reasoning: models commit to answers too early. First mechanistic evidence of early commitment across multiple task types.

💬 Reddit 1d ago

I'm running qwen3.6-35b-a3b with 8 bit quant and 64k context thru OpenCode on my mbp m5 max 128gb and it's as good as claude

Qwen3.6-35B-A3B running at 8-bit quantization with 64k context matches Claude quality for code tasks on consumer hardware (M5 Max, 128GB). Handles complex multi-step research tasks with many tool calls and maintains performance on long context coding tasks. Enables fully local development workflows without sending code to external providers.
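For readers estimating whether a setup like this fits their own hardware, here is a rough memory sketch in Python; the layer count, KV-head count, and head dimension are assumed placeholders, not the model's published configuration.

```python
# Back-of-envelope memory estimate for a 35B-parameter model at 8-bit
# quantization with a 64k context window. The attention hyperparameters
# below (48 layers, 8 KV heads, head dim 128) are illustrative assumptions.

def weight_gib(total_params_b: float, bits_per_weight: int) -> float:
    """Memory for the quantized weights, in GiB."""
    return total_params_b * 1e9 * bits_per_weight / 8 / 2**30

def kv_cache_gib(ctx: int, n_layers: int, n_kv_heads: int,
                 head_dim: int, bytes_per_elem: int = 2) -> float:
    """KV cache: two tensors (K and V) per layer, fp16 by default."""
    return 2 * ctx * n_layers * n_kv_heads * head_dim * bytes_per_elem / 2**30

weights = weight_gib(35, 8)                                  # ~32.6 GiB
kv = kv_cache_gib(65536, n_layers=48, n_kv_heads=8, head_dim=128)  # 12.0 GiB
print(f"weights ~ {weights:.1f} GiB, KV cache ~ {kv:.1f} GiB")
```

Under these assumptions the whole thing sits well under 50 GiB, which is why a 128 GB machine runs it with room to spare.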

📝 Blog 2d ago

Claude Opus 4.7 tokenizer inflation: 35% cost increase hits API users

Claude Opus 4.7's new tokenizer inflates token counts by 35-45% for identical inputs (especially code-heavy prompts), causing silent production cost increases despite unchanged "$5/$25 per million tokens" pricing: one $500/day app became a $675/day app overnight. The incident sparked discussion of migrating to self-hosted open models like GLM-5 and Qwen3.5, where infrastructure costs are flat regardless of tokenization.
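The cost jump follows directly from token-count inflation at fixed per-token prices; a quick sanity check in Python, with the daily token volume chosen purely for illustration:

```python
# Per-token prices are unchanged, but identical inputs now produce more
# tokens, so spend rises by the inflation factor. The 50M/10M daily token
# split is an assumed example, not a figure from the post.

def daily_cost(input_tokens_m: float, output_tokens_m: float,
               in_price: float = 5.0, out_price: float = 25.0) -> float:
    """Cost in USD at $5/$25 per million input/output tokens."""
    return input_tokens_m * in_price + output_tokens_m * out_price

before = daily_cost(50, 10)                # pre-inflation spend
after = daily_cost(50 * 1.35, 10 * 1.35)   # 35% more tokens for the same inputs
print(round(before, 2), round(after, 2))   # 500.0 675.0
```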

💬 Reddit 2d ago

Qwen 3.6 is the first local model that actually feels worth the effort for me

Qwen3.6-35B-A3B is the first local model practitioners find genuinely competitive with proprietary APIs for code generation, producing usable UI XML and embedded C++ output with minimal post-generation fixes. This marks a capability threshold at which local deployment overhead becomes worthwhile; previous iterations required extensive manual correction.

💬 Reddit 2d ago

Qwen3.6. This is it.

Qwen3.6-35B model successfully builds a complete tower defense game with autonomous bug detection and fixing using MCP screenshot verification. User reports the model identified rendering issues and wave completion bugs independently during development. Demonstrates strong multimodal code generation capabilities with visual feedback integration.

💬 Reddit 3d ago

Qwen3.6-35B-A3B released!

Qwen3.6-35B-A3B is a sparse MoE model with 35B total parameters and 3B active, released under Apache 2.0. The model matches agentic coding performance of models 10x its active size and includes multimodal perception with thinking and non-thinking modes.
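The active/total parameter split comes from top-k expert routing: each token runs only a few experts, so compute scales with active parameters rather than the full 35B. A minimal pure-Python sketch; the expert count, gate scores, and k below are illustrative, not Qwen's published configuration.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route(gate_logits, experts, x, k=2):
    """Run only the k highest-scoring experts and mix their outputs."""
    probs = softmax(gate_logits)
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)   # renormalize over selected experts
    return sum(probs[i] / norm * experts[i](x) for i in top)

calls = []
def make_expert(idx, weight):
    def expert(x):
        calls.append(idx)               # record which experts actually ran
        return weight * x
    return expert

experts = [make_expert(i, w) for i, w in enumerate((0.5, 1.0, 2.0, 4.0))]
y = route([0.1, 2.0, 0.3, 1.5], experts, x=3.0, k=2)
print(sorted(calls))                    # [1, 3]: only 2 of 4 experts ran
```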

✍️ Simon Willison 4d ago

Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7

Qwen3.6-35B-A3B running locally outperformed Claude Opus 4.7 on an SVG pelican generation task, demonstrating the narrowing capability gap between quantized open-weight models and proprietary APIs for specific visual generation benchmarks. The comparison highlights increasing viability of local inference despite not reflecting overall model capability.

💬 Reddit 4d ago

Local AI is the best

Community appreciation for local AI deployment emphasizes freedom from censorship and data harvesting, plus the ability to fine-tune models for personal use cases with complete privacy. Credits llama.cpp developers and open-weight model contributors for enabling on-device inference. Reflects a growing preference for self-hosted solutions over cloud APIs.

📝 Blog 5d ago

My bets on open models, mid-2026

Nathan Lambert predicts that top closed models will show no growing capability margin over open models but will retain robustness advantages for general use. Economic staying power becomes the key competitive dimension, with open models dominating repetitive automation and new funding structures emerging by mid-2026.

💬 Reddit 5d ago

Qwen 3.6-35B-A3B Release Generates Major Community Buzz on r/LocalLLaMA

Qwen 3.6-35B-A3B generated exceptional community engagement (2,154 upvotes), with practitioners reporting a significant capability leap for local deployment, though optimal performance requires manually setting the 'preserve_thinking' flag. The mixture-of-experts A3B variant activates only 3B of its 35B parameters, enabling consumer-hardware deployment with strong tool-calling and coding performance.

💬 Reddit 5d ago

24/7 Headless AI Server on Xiaomi 12 Pro (Snapdragon 8 Gen 1 + Ollama/Gemma4)

Developer converted Xiaomi 12 Pro smartphone into headless 24/7 LLM inference server running Gemma4 via Ollama with LineageOS, custom thermal management, and battery protection scripts. Uses ~9GB RAM for compute after stripping Android UI, with active cooling triggered at 45°C and charging capped at 80% for longevity. Demonstrates edge deployment of open-weights models on consumer mobile hardware.
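The thermal and battery rules described in the post reduce to a simple control policy; a sketch using the post's thresholds, with the actual on-device toggles on LineageOS left abstract:

```python
# Policy from the post: trigger active cooling at 45°C and stop charging
# at 80% to preserve battery health. How cooling/charging are actually
# toggled on the phone is device-specific and not modeled here.

COOL_ON_C = 45.0
CHARGE_CAP_PCT = 80

def policy(temp_c: float, battery_pct: int) -> dict:
    """Decide cooling and charging state from current sensor readings."""
    return {
        "cooling": temp_c >= COOL_ON_C,
        "charging": battery_pct < CHARGE_CAP_PCT,
    }

print(policy(47.2, 63))   # {'cooling': True, 'charging': True}
print(policy(38.0, 81))   # {'cooling': False, 'charging': False}
```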

💬 Reddit 6d ago

How to Distill from 100B+ to <4B Models

Active community discussion (129 posts) on knowledge-distillation techniques for compressing 100B+ parameter models into sub-4B variants suitable for consumer hardware. Represents a shift from passive model consumption to creating custom distilled models optimized for edge devices, phones, and lightweight laptops. Enables preserving large-model capabilities while meeting resource constraints.
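The standard starting point for this kind of compression is temperature-scaled distillation (Hinton et al.), where the small student is trained to match the large teacher's softened output distribution; a minimal sketch of the objective with made-up logits:

```python
import math

# Core distillation objective: KL divergence between temperature-softened
# teacher and student distributions. A real pipeline adds the hard-label
# loss and gradient updates, which are omitted here.

def softmax(logits, temperature=1.0):
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distill_kl(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [4.0, 1.0, 0.2]
good_student = [3.8, 1.1, 0.3]     # nearly matches the teacher
bad_student = [0.2, 1.0, 4.0]      # disagrees with the teacher
print(distill_kl(teacher, good_student) < distill_kl(teacher, bad_student))  # True
```

Minimizing this term pushes the student toward the teacher's full output distribution rather than just its argmax labels, which is what lets small models inherit behavior from much larger ones.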

💬 Reddit 6d ago

Best Local LLMs - Apr 2026

Community megathread discusses recent local LLM releases including Qwen3.5, Gemma4, GLM-5.1 claiming SOTA performance, Minimax-M2.7 as accessible alternative to Claude Sonnet, and PrismML Bonsai 1-bit models. Users share deployment configurations and real-world usage experiences with open-weight models.

💬 Reddit 6d ago

Ryan Lee from MiniMax posts article on the license stating it's mostly for API providers that did a poor job serving M2.1/M2.5 and may update the license for regular users!

MiniMax's Ryan Lee clarifies restrictive license primarily targets API providers who poorly served M2.1/M2.5 models, with potential updates coming for regular users. Addresses community concerns about model licensing and usage terms. Brief update on evolving open-source licensing policies.

🧠 DeepMind 2w ago
★ High Signal

Google Gemma 4 - Open Model Family Release

Gemma 4 family (31B dense, 26B MoE variants) released under Apache 2.0 with 256K context, native vision/audio, and a competitive coding Elo that jumped from 110 to 2150, roughly a 20x improvement. The 31B model outperforms models 20x larger while enabling agentic skills on edge devices. First open-weights model family combining multimodal input, extended context, and elite coding performance at edge-deployable scale.

📝 Blog Mar 17

Interconnects: The Anthropic vs. DOW Conflict and Impact on Open Models

Interview examining Anthropic's DOW supply chain risk designation and its implications for open models, including funding challenges, widening frontier gaps, and sovereign AI demand. Explores tension between open models as protection against government seizure versus tools governments can use without oversight. Discusses Qwen controversy and nationalization risk under "not your weights, not your mind" framework.

📝 Blog Mar 16

Nathan Lambert: What Comes Next with Open Models

Argues the model landscape is splitting into three classes: closed frontier, open frontier, and specialized small models acting as "distributed intelligence." Open models should shift from frontier-chasing toward cheap, task-specific models that complement closed agents rather than competing at the frontier. Critiques the ecosystem's obsession with matching GPT-4 scale.