小诗音 (@mingliao) posted in "LiveBench: how good is DeepSeek-V3?"

Today everyone on the forum was saying V3 is out, so I ran a round of LiveBench locally.
Results:
All Groups

| Model | Average | Reasoning | Coding | Math | Data analysis | Language | Instruction following (IF) | Company |
|---|---|---|---|---|---|---|---|---|
| o1-2024-12-17-high | 75.67 | 91.58 | 69.69 | 80.32 | 65.47 | 65.39 | 81.55 | OpenAI |
| o1-preview-2024-09-12 | 65.79 | 67.42 | 50.85 | 65.49 | 67.69 | 68.72 | 74.60 | OpenAI |
| gemini-exp-1206 | 64.09 | 57.00 | 63.41 | 72.36 | 63.16 | 51.29 | 77.34 | Google |
| gemini-2.0-flash-thinking-exp-1219 | 61.83 | 64.58 | 53.1...