🍡 feedmeAI

Everything Open Weights

💬 Reddit Apr 22

Qwen3 TTS is seriously underrated - I got it running locally in real-time and it's one of the most expressive open TTS models I've tried

The author reports running Qwen3 TTS locally in real time with notably expressive output, integrated into the open-source Persona Engine project (an ASR→LLM→TTS pipeline with a lip-synced avatar). The author positions it as a meaningful step up from prior local TTS options like Sesame for latency-sensitive, fully offline deployments.

💬 Reddit Apr 22

Qwen3.6-35B becomes competitive with cloud models when paired with the right agent

Pairing Qwen3.6-35B with the 'little-coder' agent scaffold achieves 78.7% on the Polyglot coding benchmark, landing in the public top 10 and competitive with leading cloud models. The same scaffold previously lifted a 9B Qwen model from 19.11% to 45.56%, suggesting a significant portion of the local-vs-cloud performance gap is attributable to scaffold/harness mismatch rather than model capability alone.
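To put the 9B result in perspective, the reported scores can be expressed as an absolute gain and a relative multiplier. This is a small arithmetic sketch using only the two percentages quoted above; the function name `scaffold_gain` is made up for illustration.

```python
def scaffold_gain(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative multiplier)."""
    return after_pct - before_pct, after_pct / before_pct


# Scores quoted in the summary for the 9B Qwen model, without and with the scaffold.
points, ratio = scaffold_gain(19.11, 45.56)
print(f"+{points:.2f} points, {ratio:.2f}x")
```

The same model more than doubles its score, which is the basis for attributing part of the gap to the harness rather than the weights.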

🟢 OpenAI Apr 22

Introducing OpenAI Privacy Filter

OpenAI releases an open-weight PII detection and redaction model called Privacy Filter, claiming state-of-the-art accuracy on identifying personally identifiable information in text. Open weights make it deployable on-prem or in air-gapped environments where sending data to an API is not viable. Directly relevant for enterprise pipelines that need PII scrubbing before feeding data to LLMs.
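The scrub-before-LLM pattern mentioned above might look roughly like the sketch below. It does not use Privacy Filter itself, whose interface is not described here: a simple email regex stands in for the detector, and `regex_email_detector`, `redact`, and the `Span` type are hypothetical names chosen for illustration. A real deployment would presumably swap the detector for a call into the locally hosted model.

```python
import re
from typing import Callable, List, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, label)

# Stand-in detector: a deliberately simple email pattern, NOT the Privacy Filter model.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def regex_email_detector(text: str) -> List[Span]:
    """Return spans for anything that looks like an email address."""
    return [(m.start(), m.end(), "EMAIL") for m in EMAIL_RE.finditer(text)]


def redact(text: str, detect: Callable[[str], List[Span]]) -> str:
    """Replace each detected span with a bracketed label before the text leaves the host."""
    out = text
    # Work from the end of the string so earlier offsets stay valid.
    for start, end, label in sorted(detect(text), reverse=True):
        out = out[:start] + f"[{label}]" + out[end:]
    return out


clean = redact("Contact jane.doe@example.com for access.", regex_email_detector)
print(clean)
```

Keeping the detector behind a plain callable means the redaction step stays the same whether detection is done by a regex or by an on-prem model.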

📝 Blog Apr 18
⭐ Editor's Pick

My Workflow for Understanding LLM Architectures

Raschka documents a three-step process for reverse-engineering open-weight model architectures: start with the technical report, cross-reference the HuggingFace config, then validate against the transformers reference implementation. The core argument is that working code is a more reliable source of truth than under-specified papers. Practical guidance for engineers who want to understand architectural nuances firsthand.
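The second step, cross-referencing the config, can be sketched as follows. The key names follow the convention common to Llama-style `config.json` files on HuggingFace, but other architectures may name things differently, and the values below are invented for illustration rather than taken from any real model. `summarize_config` is a hypothetical helper, not part of any library.

```python
import json

# Hypothetical config.json contents; values are illustrative only.
CONFIG_JSON = """
{
  "hidden_size": 2048,
  "num_hidden_layers": 24,
  "num_attention_heads": 16,
  "num_key_value_heads": 4,
  "intermediate_size": 8192,
  "vocab_size": 32000
}
"""


def summarize_config(cfg: dict) -> dict:
    """Derive a few architectural facts that papers often leave implicit."""
    heads = cfg["num_attention_heads"]
    # When the key is absent, assume one KV head per query head (plain multi-head attention).
    kv_heads = cfg.get("num_key_value_heads", heads)
    # Some configs set head_dim explicitly; otherwise fall back to hidden_size / heads.
    head_dim = cfg.get("head_dim", cfg["hidden_size"] // heads)
    return {
        "layers": cfg["num_hidden_layers"],
        "head_dim": head_dim,
        "uses_gqa": kv_heads < heads,
        "queries_per_kv_head": heads // kv_heads,
    }


summary = summarize_config(json.loads(CONFIG_JSON))
print(summary)
```

Facts derived this way can then be checked against the reference implementation, which is the third step in the workflow.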

📝 Blog Apr 17

Practitioner post: Qwen3.6-35B-A3B MoE outperforms Claude Opus 4.7 locally on MacBook Pro at 20.9 GB quantized

Alibaba's Qwen3.6-35B-A3B MoE (35B total, 3B active parameters) reportedly matches or beats Claude Opus 4.7 on local tasks while fitting in 20.9 GB of RAM when quantized on a MacBook Pro. If the benchmark methodology holds, this is a notable MoE-for-edge result: frontier-tier quality within consumer-RAM constraints. This is a practitioner claim; independent verification of the benchmark methodology is still needed.
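As a rough sanity check on the footprint, the quoted size implies an average number of bits stored per parameter. The sketch below assumes that the 20.9 GB figure covers weights only (no KV cache or runtime overhead) and that 1 GB means 10^9 bytes; neither assumption is confirmed by the post, and `bits_per_weight` is a name made up for this example.

```python
def bits_per_weight(size_gb: float, total_params_billion: float) -> float:
    """Average bits stored per parameter, assuming 1 GB = 10**9 bytes."""
    return size_gb * 1e9 * 8 / (total_params_billion * 1e9)


# 20.9 GB and 35B total parameters are the figures quoted in the summary.
bpw = bits_per_weight(20.9, 35)
print(f"{bpw:.2f} bits per weight")
```

Under those assumptions the result lands a little under 5 bits per weight. Note that the total parameter count, not the 3B active count, determines the memory footprint.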

💬 Reddit Apr 16

Qwen3.6-35B-A3B released!

Qwen3.6-35B-A3B is a sparse MoE model with 35B total and only 3B active parameters, released under Apache 2.0. The release claims agentic coding performance on par with models 10× its active size, with both multimodal thinking and non-thinking modes. Because only 3B parameters are active per token, per-token compute is low, which makes inference practical on constrained hardware, though all 35B parameters still have to be stored.

📝 Blog Apr 14

r/LocalLLaMA April 2026 community consensus: Qwen 3.5 most recommended family; Qwen3-Coder-Next sweeps local coding

April 2026 r/LocalLLaMA community consensus (143+ posts) names Qwen 3.5 as the most broadly recommended local model family, with Qwen3-Coder-Next as the near-unanimous pick for coding. MiniMax M2.5/M2.7 surface as the go-to for agentic/tool-heavy workloads; Gemma 4 gains traction for general local use; GLM-5/4.7 enters the best-overall conversation.

📝 Blog Mar 16

What Comes Next with Open Models

Lambert argues the open-closed performance gap will widen in 2026 because closed models are accumulating advantages on long-horizon, domain-specific tasks with non-public training data. Proposes a three-class taxonomy: true closed frontier, open frontier, and small specialized open models. Predicts the highest-impact open models will be narrow, fast, cheap sub-agents used as tools inside closed-model pipelines.