Zjuwyz (@doomooo) posted in "Ran LiveBench on DeepSeek-V3-0324: the results"
I waited a day and nobody had run it, so I went ahead and did it myself.
[image]
| Model | Organization | Global Average | Reasoning Average | Coding Average | Mathematics Average | Data Analysis Average | Language Average | IF Average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| claude-3-7-sonnet-thinking | Anthropic | 76.10 | 87.83 | 74.54 | 79.00 | 74.05 | 59.93 | 81.25 |
| o3-mini-2025-01-31-high | OpenAI | 75.88 | 89.58 | 82.74 | 77.29 | 70.64 | 50.68 | 84.36 |
| o1-2024-12-17-high | OpenAI | 75.67 | 91.58 | 69.69 | 80.32 | 65.47 | 65.39 | 81.55 |
qw...