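最上川 (@artorius) posted in "Tencent releases its first diffusion large language model, WeDLM-8B"

On math reasoning tasks, it runs 3–6× faster than a vLLM-optimized Qwen3-8B.
On most benchmarks, it outperforms the original Qwen3-8B-Instruct.
It natively supports KV Cache (compatible with FlashAttention, PagedAttention, and CUDA Graphs).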