💬 Reddit 1d ago
RTX 5070 Ti + 9800X3D running Qwen3.6-35B-A3B at 79 t/s with 128K context; the --n-cpu-moe flag is the most important part.
Qwen3.6-35B-A3B reaches 79 t/s with 128K context on an RTX 5070 Ti + 9800X3D by using --n-cpu-moe instead of --cpu-moe, a 54% speedup. The post demonstrates an effective MoE offloading strategy for 16GB consumer GPUs paired with large-cache CPUs.
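
To make the difference between the two flags concrete, here is a minimal sketch that builds the two command lines, assuming the post used llama.cpp's `llama-server`. In llama.cpp, `--cpu-moe` keeps all MoE expert weights on the CPU, while `--n-cpu-moe N` keeps only the first N layers' experts there and leaves the rest on the GPU. The model filename, the value `N=20`, and the helper name `build_llama_server_cmd` are hypothetical; the summary does not give the poster's actual settings.

```python
import shlex


def build_llama_server_cmd(model_path, ctx_size=131072, n_cpu_moe=None):
    """Build a llama-server argument list (illustrative helper, not from the post).

    n_cpu_moe=None -> --cpu-moe      (all MoE expert weights stay on the CPU)
    n_cpu_moe=N    -> --n-cpu-moe N  (only the first N layers' experts stay on the CPU)
    """
    cmd = [
        "llama-server",
        "-m", model_path,        # path to the GGUF model file
        "-c", str(ctx_size),     # context size; 131072 tokens = 128K
        "-ngl", "99",            # offload all layers to the GPU by default
    ]
    if n_cpu_moe is None:
        cmd.append("--cpu-moe")
    else:
        cmd += ["--n-cpu-moe", str(n_cpu_moe)]
    return cmd


# Hypothetical filename and layer count: tune N to whatever fits in 16GB of VRAM.
print(shlex.join(build_llama_server_cmd("qwen3.6-35b-a3b.gguf", n_cpu_moe=20)))
print(shlex.join(build_llama_server_cmd("qwen3.6-35b-a3b.gguf")))
```

In practice N would be lowered step by step until VRAM is nearly full, since every layer whose experts move onto the GPU reduces the work left on the CPU.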