小诗音 (@mingliao) posted in "LiveBench: how good is DeepSeek-V3?"

Today I saw everyone on the forum saying that V3 is out, so I ran a round of LiveBench locally.
[image]
The results are as follows:
All Groups

| model | average | reasoning | coding | math | data_analysis | language | if | company |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| o1-2024-12-17-high | 75.67 | 91.58 | 69.69 | 80.32 | 65.47 | 65.39 | 81.55 | OpenAI |
| o1-preview-2024-09-12 | 65.79 | 67.42 | 50.85 | 65.49 | 67.69 | 68.72 | 74.60 | OpenAI |
| gemini-exp-1206 | 64.09 | 57.00 | 63.41 | 72.36 | 63.16 | 51.29 | 77.34 | Google |
| gemini-2.0-flash-thinking-exp-1219 | 61.83 | 64.58 | 53.1...