ZackWill posted: Google open-sources Diffusion Gemma, reaching 1000 tps on an H100

Blazing fast inference: By shifting the decode bottleneck from memory bandwidth to compute, DiffusionGemma delivers up to 4x faster token output on dedicated GPUs (1000+ tokens per second on a single NVIDIA H100, 700+ tokens per second on an NVIDIA GeForce RTX 5090).
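Why would parallel generation help? A rough roofline estimate shows the idea. At batch size 1, an autoregressive model must stream all of its weights from memory to produce each single token, so decode speed is capped by memory bandwidth. A diffusion model that refines a whole block of tokens per forward pass reuses each weight read across many tokens. The sketch below is only an illustration under stated assumptions: the model size (4B parameters), block length (32) and number of denoising steps (8) are hypothetical and are not DiffusionGemma's published configuration. The GPU figures are roughly H100 SXM spec-sheet values (about 3.35 TB/s and about 989 TFLOPS dense BF16). KV-cache traffic, attention cost and kernel overheads are ignored.

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    mem_bw_bytes_per_s: float  # assumed memory bandwidth
    peak_flops: float          # assumed dense throughput at the chosen precision


def tokens_per_second(n_params: float, bytes_per_param: float, gpu: Gpu,
                      tokens_per_step: int, steps_per_block: int) -> float:
    """Roofline upper bound on decode speed at batch size 1.

    Each forward pass streams all weights once and costs about
    2 * n_params FLOPs per token it processes. A block of `tokens_per_step`
    tokens is finished after `steps_per_block` passes.
    """
    t_memory = n_params * bytes_per_param / gpu.mem_bw_bytes_per_s
    t_compute = 2 * n_params * tokens_per_step / gpu.peak_flops
    t_step = max(t_memory, t_compute)  # the slower resource sets the pace
    return tokens_per_step / (steps_per_block * t_step)


def compute_bound_crossover(bytes_per_param: float, gpu: Gpu) -> float:
    """Tokens per forward pass at which compute time equals weight-load time."""
    return bytes_per_param * gpu.peak_flops / (2 * gpu.mem_bw_bytes_per_s)


H100_ASSUMED = Gpu(mem_bw_bytes_per_s=3.35e12, peak_flops=989e12)
N_PARAMS = 4e9  # hypothetical model size
BF16_BYTES = 2

# Autoregressive: one token per pass, one pass per token.
ar = tokens_per_second(N_PARAMS, BF16_BYTES, H100_ASSUMED,
                       tokens_per_step=1, steps_per_block=1)
# Hypothetical block diffusion: 32 tokens refined together over 8 passes.
diff = tokens_per_second(N_PARAMS, BF16_BYTES, H100_ASSUMED,
                         tokens_per_step=32, steps_per_block=8)

print(f"autoregressive: {ar:.0f} tok/s, block diffusion: {diff:.0f} tok/s")
print(f"compute-bound above ~{compute_bound_crossover(BF16_BYTES, H100_ASSUMED):.0f} tokens per pass")
```

With these assumed numbers both cases are still memory-bound, because a pass needs roughly 295 tokens before compute time overtakes the weight-load time. The diffusion speedup comes entirely from amortizing each weight read over 32 tokens in 8 passes. As blocks grow, the bound moves toward compute, which matches the "memory bandwidth to compute" framing in the announcement. Real measured throughput will be lower than either bound.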
#Some additional notes
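Diffusion is a generation method that differs from autoregressive (token-by-token) decoding. It is not an alternative to the Transformer architecture as such, since diffusion models often use a Transformer as their denoising network. It is most commonly used for image generation.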
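The difference from token-by-token decoding can be sketched for text with masked (absorbing-state) discrete diffusion, one common family of text diffusion models. Generation starts from a fully masked sequence. Each pass predicts every masked position in parallel, commits the most confident predictions, and leaves the rest masked for later passes. This is a toy under explicit assumptions: `toy_denoiser` is a hypothetical random stand-in for a trained network, and the "commit the top-confidence positions" schedule is one common choice. Neither is DiffusionGemma's documented method.

```python
import random

MASK = None  # marks a position that has not been generated yet


def toy_denoiser(seq, vocab, rng):
    """Hypothetical stand-in for a trained denoising network.

    Returns {position: (token, confidence)} for every masked position,
    all produced in one parallel pass. A real model would condition on the
    tokens committed so far; this toy ignores them.
    """
    return {i: (rng.choice(vocab), rng.random())
            for i, tok in enumerate(seq) if tok is MASK}


def masked_diffusion_decode(length, steps, vocab, seed=0):
    rng = random.Random(seed)
    seq = [MASK] * length
    per_step = -(-length // steps)  # ceil(length / steps) positions committed per pass
    passes = 0
    for _ in range(steps):
        proposals = toy_denoiser(seq, vocab, rng)
        if not proposals:  # every position is already filled
            break
        passes += 1
        # Commit only the most confident proposals; the rest stay masked
        # and are predicted again in the next pass.
        ranked = sorted(proposals.items(), key=lambda kv: kv[1][1], reverse=True)
        for pos, (tok, _conf) in ranked[:per_step]:
            seq[pos] = tok
    return seq, passes


tokens, n_passes = masked_diffusion_decode(length=16, steps=4,
                                           vocab=["the", "cat", "sat", "down"])
print(n_passes, tokens)
```

Here 16 tokens are produced in 4 network passes instead of 16. That reduction in passes per token is what lets each weight read from memory serve several tokens.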