Shyliulili posted in "livebench+aider combined leaderboard (updated to gpt4.1)"

Based on this post. Models that are not listed on the Aider leaderboard were removed, and models such as grok and gpt4.1 were added.
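In the complete rows of the table, Global Average matches the unweighted mean of the six score columns (Reasoning, Aider, Mathematics, Data Analysis, Language, IF), rounded to two decimals. This is an inference from the numbers, not something the post states. A minimal Python sketch that reproduces the check, using only values copied from the table (the names `ROWS` and `global_average` are illustrative):

```python
# Category scores copied from the complete rows of the table.
# Column order: Reasoning, Aider Correct, Mathematics, Data Analysis, Language, IF
ROWS = {
    "Gemini 2.5 Pro Experimental": [87.53, 72.9, 89.16, 79.89, 69.31, 80.59],
    "Claude 3.7 Sonnet Thinking": [76.17, 64.9, 79, 74.05, 68.27, 81.25],
    "o1 High": [77.47, 61.7, 79.28, 65.47, 72.15, 81.55],
}


def global_average(scores):
    """Unweighted mean of the category scores, rounded to 2 decimals (assumed formula)."""
    return round(sum(scores) / len(scores), 2)


for model, scores in ROWS.items():
    # Prints 79.90, 73.94 and 72.94, matching the Global Average column.
    print(f"{model}: {global_average(scores):.2f}")
```

Because every column gets the same weight, the Aider score counts exactly as much as any single LiveBench category. A weighted mean would need explicit weights that the post does not give.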




| Model | Global Average | Reasoning Average | Aider Correct (%) | Mathematics Average | Data Analysis Average | Language Average | IF (Instruction Following) Average |
|---|---|---|---|---|---|---|---|
| Gemini 2.5 Pro Experimental | 79.90 | 87.53 | 72.9 | 89.16 | 79.89 | 69.31 | 80.59 |
| Claude 3.7 Sonnet Thinking | 73.94 | 76.17 | 64.9 | 79 | 74.05 | 68.27 | 81.25 |
| o1 High | 72.94 | 77.47 | 61.7 | 79.28 | 65.47 | 72.15 | 81.55 |
| o3 Mini High | 70.53 | 74.36 | 60.4 | 76.55 | 7... |