Hardware-optimization — Topic

💬 Reddit 1d ago

RTX 5070 Ti + 9800X3D running Qwen3.6-35B-A3B at 79 t/s with 128K context, the --n-cpu-moe flag is the most important part.

Qwen3.6-35B-A3B reaches 79 t/s with 128K context on an RTX 5070 Ti + 9800X3D by using --n-cpu-moe instead of --cpu-moe, a 54% speedup over the --cpu-moe run. The post demonstrates an effective MoE offloading strategy for 16GB consumer GPUs paired with large-cache CPUs.
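The summary does not name the inference runtime or give the exact command. The two flags match llama.cpp's llama-server options, where --cpu-moe keeps all MoE expert weights on the CPU and --n-cpu-moe N keeps only the experts of the first N layers there, so the rest can sit in VRAM. As a rough sketch under that assumption, the difference between the two launches could look like this. The model filename, the layer count of 20, and the helper name build_llama_server_args are hypothetical and not taken from the post.

```python
import shlex


def build_llama_server_args(model_path, ctx_size, n_cpu_moe=None):
    """Build an argv list for llama-server (assumed runtime, not confirmed by the post).

    n_cpu_moe=None -> use --cpu-moe (all expert weights stay on the CPU).
    n_cpu_moe=N    -> use --n-cpu-moe N (only the first N layers' experts stay on the CPU).
    """
    args = [
        "llama-server",
        "-m", model_path,       # path to the GGUF model file
        "-c", str(ctx_size),    # context size in tokens
        "-ngl", "99",           # offload all non-expert layers to the GPU
    ]
    if n_cpu_moe is None:
        args.append("--cpu-moe")
    else:
        if n_cpu_moe < 0:
            raise ValueError("n_cpu_moe must be non-negative")
        args += ["--n-cpu-moe", str(n_cpu_moe)]
    return args


if __name__ == "__main__":
    model = "Qwen3.6-35B-A3B.gguf"  # hypothetical filename
    ctx = 131072                    # 128K context, assuming 128 * 1024 tokens
    # Baseline: every expert tensor on the CPU.
    print(shlex.join(build_llama_server_args(model, ctx)))
    # Partial offload: N = 20 is a placeholder; the post's actual value is not given here.
    print(shlex.join(build_llama_server_args(model, ctx, n_cpu_moe=20)))
```

In practice N would be tuned downward until the 16GB card is nearly full, since every layer whose experts move to VRAM avoids a CPU round trip for that layer.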

Inference Deployment Hardware-optimization
