ZackWill 在谷歌开源Diffusion Gemma，可在h100上跑出1000tps 中发帖[1000035600.jpg] Blazing fast inference: By shifting the decode bottleneck from memory-bandwidth to compute, DiffusionGemma generates up to 4x faster token output on dedicated GPUs. (1000+ tokens per second on a single NVIDIA H100, 700+ tokens per second on NVIDIA GeForce RTX 5090). #一些补充 Diffusion是一种不同于Transformer的模型架构，常用于图片生成领域中

ZackWill 在谷歌开源Diffusion Gemma，可在h100上跑出1000tps 中发帖

[1000035600.jpg] 
Blazing fast inference: By shifting the decode bottleneck from memory-bandwidth to compute, DiffusionGemma generates up to 4x faster token output on dedicated GPUs. (1000+ tokens per second on a single NVIDIA H100, 700+ tokens per second on NVIDIA GeForce RTX 5090). 
#一些补充 
Diffusion是一种不同于Transformer的模型架构，常用于图片生成领域中。