作业君 (@homeworkkun) posted in "This Qwen3 really is a pure benchmark-gaming machine"
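It's been a while since I last ran LiveBench for fun. Today I noticed it can now run the 241125 question set, so I gave it a try, using Qwen3-8B served by SiliconFlow (硅基流动).
Scoring failed for coding, and part of math failed to score as well. As for the resulting scores...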
| Model | Global Average | Reasoning Average | Coding Average | Mathematics Average | Data Analysis Average | Language Average | Instruction Following Average |
|---|---|---|---|---|---|---|---|
| Gemini 2.0 Flash | 61.47 | 55.25 | 53.92 | 65.62 | 67.55 | 40.69 | 85.79 |
| Hunyuan Turbos | 60.65 | 53.33 | 46.56 | 61.98 | 75.49 | 50.38 | 76.13 |
| DeepSeek V3 | 60.45 | 56.75 | 61.77 | 60... | | | |