How well does an RTX 4090 run Qwen3.6-27B? (caveman version)

43 words/sec. 0.29 s wait. 20 of 20 coding tests right. 19.5 GB used out of 24.

The card runs this AI very well. And it holds a 128k window with 4 chats at the same time.

Setup

  • AI: Qwen3.6-27B, Unsloth UD-Q4_K_XL (~17 GB on disk).
  • Card: RTX 4090, 24 GB.
  • Software: llama.cpp / llama-server. Free, open.
  • Flags: --flash-attn on, q8_0 cache, reasoning on.
  • Test: 20 HumanEval coding problems.
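
Want to poke the setup yourself? llama-server speaks the OpenAI chat API. Small sketch below, assuming the server from the Configs section is up on its default port 8080; the prompt is a stand-in, not a real HumanEval item.

  # One coding request against llama-server (default port 8080).
  # "model" can be any name here; the server has one model loaded.
  import json
  import urllib.request

  payload = {
      "model": "qwen",
      "messages": [{"role": "user",
                    "content": "Write a Python function that reverses a string."}],
      "max_tokens": 512,
  }
  req = urllib.request.Request(
      "http://localhost:8080/v1/chat/completions",
      data=json.dumps(payload).encode(),
      headers={"Content-Type": "application/json"},
  )
  with urllib.request.urlopen(req) as resp:
      print(json.load(resp)["choices"][0]["message"]["content"])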

The Hybrid Free-Lunch Rule

Only 16 of 64 layers carry a KV cache. So 8× context (16k → 128k) costs only +4 GB and zero speed loss: 43 words/sec either way.
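
Quick math check below. The 16-of-64 number comes from this post. The KV width (8 heads × 128 dims) and the q8_0 size are guesses for illustration, not published specs.

  # Rough KV-cache math behind the free-lunch rule.
  # From the post: 16 of 64 layers carry KV.
  # Guessed for illustration: 8 KV heads x 128 head_dim, q8_0 cache.
  KV_LAYERS = 16
  KV_DIM = 8 * 128          # assumed kv_heads * head_dim
  BYTES_PER_ELEM = 8.5 / 8  # q8_0 is ~8.5 bits per element

  def kv_gb(ctx: int) -> float:
      # K and V each hold KV_DIM values per token per KV layer.
      return KV_LAYERS * 2 * ctx * KV_DIM * BYTES_PER_ELEM / 1e9

  print(f"{kv_gb(16_384):.2f} GB at 16k")    # ~0.57 GB
  print(f"{kv_gb(131_072):.2f} GB at 128k")  # ~4.56 GB -> +4 GB for 8x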

Basic numbers

What               Result
Speed              43 words/sec
Wait               0.29 s
Correct (of 20)    20
Memory             19.5 GB / 24

Full matrix — 15 configs, all ran

Aggregate words/sec:

ctx ↓ \ chats →     1     2     4
16k                43    72   122
32k                43    74    97
64k                43    73   100
96k                43    74   122
128k               43    75    97

Memory (GB out of 24):

ctx ↓ \ chats →     1     2     4
16k              18.9  19.1  19.4
64k              20.6  20.7  21.0
96k              21.6  21.8  22.1
128k             22.7  22.9  23.2

Quality stays 90-100% across all 15 configs. Peak: 96k × 4 (or 16k × 4) → 122 words/sec.
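
Want to watch the memory column yourself while the server runs? One stdlib call to nvidia-smi does it. A sketch; assumes nvidia-smi is on PATH.

  # Print VRAM use while the server runs (compare to the table above).
  import subprocess
  print(subprocess.run(
      ["nvidia-smi", "--query-gpu=memory.used,memory.total",
       "--format=csv,noheader"],
      capture_output=True, text=True).stdout)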

Configs

# default
llama-server -m Qwen3.6-27B-UD-Q4_K_XL.gguf --flash-attn on \
  --cache-type-k q8_0 --cache-type-v q8_0 \
  --ctx-size 32768 --parallel 1

# big context
llama-server -m Qwen3.6-27B-UD-Q4_K_XL.gguf --flash-attn on \
  --cache-type-k q8_0 --cache-type-v q8_0 \
  --ctx-size 131072 --parallel 1

# peak throughput
llama-server -m Qwen3.6-27B-UD-Q4_K_XL.gguf --flash-attn on \
  --cache-type-k q8_0 --cache-type-v q8_0 \
  --ctx-size 98304 --parallel 4
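
After launch, confirm the speed (step 3 below) without a stopwatch. The native /completion endpoint reports timings in its reply. A sketch; field names are from current llama.cpp and may shift between versions.

  # Speed check: one request, read tokens/sec from the reply.
  import json
  import urllib.request

  payload = {"prompt": "Write hello world in C.", "n_predict": 256}
  req = urllib.request.Request(
      "http://localhost:8080/completion",
      data=json.dumps(payload).encode(),
      headers={"Content-Type": "application/json"},
  )
  with urllib.request.urlopen(req) as resp:
      t = json.load(resp)["timings"]
  print(f"{t['predicted_per_second']:.1f} tokens/sec")  # expect ~43 solo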

Action steps

  1. Pull Qwen3.6-27B-UD-Q4_K_XL.gguf + mmproj-F16.gguf from Unsloth.
  2. Pin to 4090: CUDA_VISIBLE_DEVICES=<idx>.
  3. Start with 32k × 1. Confirm 43 words/sec.
  4. Need long context? 128k × 1. Same speed.
  5. Need throughput? 96k × 4. Get 122 words/sec. Check it with the sketch below.
  6. Don’t go past 4 chats. Card hits its math ceiling.
  7. Always --flash-attn on and q8_0 cache. Reasoning on for code.
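
The throughput claim in step 5, checked the blunt way: fire 4 chats at once, divide tokens by wall time. Same assumed endpoint and timing fields as the sketch above.

  # Aggregate throughput: 4 chats at once against --parallel 4.
  import json
  import time
  import urllib.request
  from concurrent.futures import ThreadPoolExecutor

  def one_chat(i: int) -> int:
      payload = {"prompt": f"Task {i}: write a sorting function.",
                 "n_predict": 256}
      req = urllib.request.Request(
          "http://localhost:8080/completion",
          data=json.dumps(payload).encode(),
          headers={"Content-Type": "application/json"},
      )
      with urllib.request.urlopen(req) as resp:
          return json.load(resp)["timings"]["predicted_n"]  # tokens generated

  start = time.time()
  with ThreadPoolExecutor(max_workers=4) as pool:
      total = sum(pool.map(one_chat, range(4)))
  print(f"{total / (time.time() - start):.0f} tokens/sec aggregate")  # ~122 at 96k x 4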

Recap

  • 43 words/sec solo. 122 peak at 96k × 4.
  • 8× context = +4 GB and zero speed loss.
  • Perfect score (20/20) on default config.
  • 128k × 4 fits at 23.2 GB.
  • The card holds the whole map.