@fengchris posted in "Microsoft's reasoning model gets another upgrade: Phi-4-reasoning-plus"
Some benchmark scores:
| Model | AIME 24 | AIME 25 | OmniMath | GPQA-D | LiveCodeBench (8/1/24–2/1/25) |
|---|---|---|---|---|---|
| Phi-4-reasoning | 75.3 | 62.9 | 76.6 | 65.8 | 53.8 |
| Phi-4-reasoning-plus | 81.3 | 78.0 | 81.9 | 68.9 | 53.1 |
| OpenThinker2-32B | 58.0 | 58.0 | — | 64.1 | — |
| QwQ 32B | 79.5 | 65.8 | — | 59.5 | 63.4 |
| EXAONE-Deep-32B | 72.1 | 65.8 | — | 66.1 | 59.5 |
| DeepSeek-R1-Distill-70B | 69.3 | 51.5 | 63.4 | 66.2 | 57.5 |
| DeepSeek-R1 | 78.7 | 70.4 | 85.0 | 73.0 | 62.8 |
| o1-mini | 63.6 | 54.8 | — | 60.0 | 53.8 |
...