yeyucca posted in "Nemotron 3 Super 120B's KV cache uses 3x less VRAM than Qwen 3.5 122B's"

Nemotron-3-Super (1M tokens): the KV cache needs only about 7.63 GiB of VRAM at 16-bit.
Qwen 3.5 122B (1M tokens): the KV cache needs about 22.89 GiB of VRAM at 16-bit (3x Nemotron's).

That's a lot friendlier for real productivity work. Alibaba and the other Chinese labs, hurry up and catch up~
1M tokens → 7.63 GiB BF16 / 3.81 GiB FP8 
262k tokens → 2.00 GiB BF16 / 1.00 GiB FP8 
We can almost forget there is a KV cache. 
For comparison, Qwen3.5-122B-A10B has 12 full-attention layers, head-dim 256, and works out to 24,576 bytes/token in B...
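
For anyone who wants to sanity-check the numbers quoted above, here is a rough sketch of the arithmetic. It rests on a few assumptions that are not stated in the post: "1M" is taken as 1,000,000 tokens and "262k" as 262,144; Nemotron's per-token figure (8,192 bytes in BF16) is back-derived from the 2.00 GiB at 262k number; and `kv_heads=2` for Qwen is inferred only because it makes the standard formula land on the quoted 24,576 bytes/token. The function names are made up for illustration.

```python
GIB = 1024 ** 3


def kv_bytes_per_token(attn_layers, kv_heads, head_dim, bytes_per_elem):
    """KV cache bytes per token: one K and one V vector per KV head,
    for every full-attention layer."""
    return 2 * attn_layers * kv_heads * head_dim * bytes_per_elem


def kv_cache_gib(bytes_per_token, tokens):
    """Total KV cache size in GiB for a given context length."""
    return bytes_per_token * tokens / GIB


# Qwen3.5-122B-A10B: 12 full-attention layers and head-dim 256 come from
# the post. kv_heads=2 is an ASSUMPTION, chosen to match 24,576 bytes/token.
qwen_bpt = kv_bytes_per_token(attn_layers=12, kv_heads=2, head_dim=256,
                              bytes_per_elem=2)

# Nemotron-3-Super: ASSUMPTION, back-derived from "262k tokens -> 2.00 GiB BF16".
nemotron_bpt = 2 * GIB // 262_144

for name, bpt in [("Nemotron-3-Super", nemotron_bpt), ("Qwen3.5-122B-A10B", qwen_bpt)]:
    bf16 = kv_cache_gib(bpt, 1_000_000)
    fp8 = kv_cache_gib(bpt // 2, 1_000_000)  # FP8 halves the bytes per element
    print(f"{name}: {bpt} bytes/token, 1M tokens = {bf16:.2f} GiB BF16 / {fp8:.2f} GiB FP8")
```

Under those assumptions the ratio between the two per-token figures is exactly 3, which is where the "3x" in the thread title comes from.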