小诗音 (@mingliao) posted in "LiveBench: how good is DeepSeek-V3?"
Today I saw everyone on the forum saying that V3 is out, so I ran a round of LiveBench locally.
[image]
The results are as follows:
All Groups
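| model | average | reasoning | coding | math | data_analysis | language | if (instruction following) | company |
|---|---|---|---|---|---|---|---|---|
| o1-2024-12-17-high | 75.67 | 91.58 | 69.69 | 80.32 | 65.47 | 65.39 | 81.55 | OpenAI |
| o1-preview-2024-09-12 | 65.79 | 67.42 | 50.85 | 65.49 | 67.69 | 68.72 | 74.60 | OpenAI |
| gemini-exp-1206 | 64.09 | 57.00 | 63.41 | 72.36 | 63.16 | 51.29 | 77.34 | |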
gemini-2.0-flash-thinking-exp-1219
61.83
64.58
53.1...