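小诗音 (@mingliao) posted in "Running LiveBench to see how good the yi-lightning model really is?"
I suspect many of you stopped trusting lmsys's subjective rankings a long time ago. I ran LiveBench, and here is how yi-lightning performs: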
| Model | Global Average | Reasoning Average | Coding Average | Mathematics Average | Data Analysis Average | Language Average | IF Average |
|---|---|---|---|---|---|---|---|
| o1-preview-2024-09-12 | 66.02 | 68.00 | 50.85 | 62.92 | 63.97 | 72.66 | 77.72 |
| claude-3-5-sonnet-20240620 | 59.80 | 58.67 | 60.85 | 53.32 | 56.74 | 56.94 | 72.30 |
| o1-mini-2024-09-12 | 59.09 | 77.33 | 48.05 | 59.22 | 54.07 | 45.72 | 70.17 |
gpt-4o-2024-0...