Zjuwyz (@doomooo) posted in "Ran the LiveBench results for DeepSeek-V3-0324"

I waited a day and nobody had run it, so I went ahead and ran it myself.
[image]




| Model | Organization | Global Average | Reasoning Average | Coding Average | Mathematics Average | Data Analysis Average | Language Average | IF Average |
|---|---|---|---|---|---|---|---|---|
| claude-3-7-sonnet-thinking | Anthropic | 76.10 | 87.83 | 74.54 | 79.00 | 74.05 | 59.93 | 81.25 |
| o3-mini-2025-01-31-high | OpenAI | 75.88 | 89.58 | 82.74 | 77.29 | 70.64 | 50.68 | 84.36 |
| o1-2024-12-17-high | OpenAI | 75.67 | 91.58 | 69.69 | 80.32 | 65.47 | 65.39 | 81.55 |


qw...