The Qwen folks have released a set of updated models with some very interesting and, I think, genuinely useful characteristics.
"We are open-weighting two MoE models: Qwen3-235B-A22B, a large model with 235 billion total parameters and 22 billion activated parameters, and Qwen3-30B-A3B, a smaller MoE model with 30 billion total parameters and 3 billion activated parameters. Additionally, six dense models are also open-weighted, including Qwen3-32B, Qwen3-14B, Qwen3-8B, Qwen3-4B, Qwen3-1.7B, and Qwen3-0.6B, under Apache 2.0 license.
| Models | Layers | Heads (Q / KV) | Tie Embedding | Context Length |
|---|---|---|---|---|
| Qwen3-0.6B | 28 | 16 / 8 | Yes | 32K |
| Qwen3-1.7B | 28 | 16 / 8 | Yes | 32K |
| Qwen3-4B | 36 | 32 / 8 | Yes | 32K |
| Qwen3-8B | 36 | 32 / 8 | No | 128K |
| Qwen3-14B | 40 | 40 / 8 | No | 128K |
| Qwen3-32B | 64 | 64 / 8 | No | 128K |
| Models | Layers | Heads (Q / KV) | # Experts (Total / Activated) | Context Length |
|---|---|---|---|---|
| Qwen3-30B-A3B | 48 | 32 / 4 | 128 / 8 | 128K |
| Qwen3-235B-A22B | 94 | 64 / 4 | 128 / 8 | 128K |
"
These models take advantage of several recent developments, including a switchable "thinking" mode, so you can control how much reasoning the model does before it answers.
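Here's a minimal sketch of toggling that behaviour, assuming the Hugging Face transformers chat template for the released Qwen/Qwen3-0.6B checkpoint and its documented enable_thinking flag; check the model card for the exact switches your runtime exposes.

```python
# Sketch: turning Qwen3's thinking mode on and off via the chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-0.6B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

messages = [{"role": "user", "content": "What is 17 * 24?"}]

# Thinking on: the model emits a <think>...</think> reasoning block before its answer.
prompt_thinking = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)

# Thinking off: skip the reasoning trace for faster, cheaper replies.
prompt_fast = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)

inputs = tokenizer(prompt_thinking, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```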
Each of the models is roughly neck and neck with the state of the art at its size, but two in particular caught my eye.
Qwen3-0.6B runs acceptably on Raspberry Pi-sized hardware and supports tool use. I'm adding it to Alfie to see if it can be useful.
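A rough sketch of what that tool use looks like, again assuming the transformers chat-template tool-calling support; the get_temperature function and its schema are made up for illustration, not anything Alfie actually exposes.

```python
# Sketch: rendering a tool schema into a Qwen3-0.6B prompt.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")

def get_temperature(location: str) -> float:
    """
    Get the current temperature for a location.

    Args:
        location: City name, e.g. "Cambridge"
    """
    return 12.0  # stub -- a real tool would query a sensor or API

messages = [{"role": "user", "content": "How warm is it in Cambridge right now?"}]

# The chat template serialises the tool schema into the prompt; the model then
# replies with a structured tool call that your code executes and feeds back.
prompt = tokenizer.apply_chat_template(
    messages,
    tools=[get_temperature],
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)
```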
Qwen3-30B-A3B is an MoE (Mixture of Experts) model with 30B total parameters, of which roughly 3B are active for any given token. It's very fast, and the 4-bit quantized version fits on RTX 3090/4090/5090-class GPUs.
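One way to get it into that VRAM budget is on-the-fly NF4 quantization with bitsandbytes, sketched below; a GGUF build under llama.cpp would work just as well, and actual memory use depends on context length and KV cache.

```python
# Sketch: loading Qwen3-30B-A3B in 4-bit so the weights (~15-17 GB) fit in
# the 24-32 GB of VRAM on an RTX 3090/4090/5090.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B")
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-30B-A3B",
    quantization_config=quant,
    device_map="auto",
)

inputs = tokenizer("Briefly explain mixture-of-experts routing.", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```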
Alan