
@fengchris posted in "Microsoft's Reasoning Model Upgraded Again: Phi-4-reasoning-plus"

Some benchmark numbers:





| Model | AIME 24 | AIME 25 | OmniMath | GPQA-D | LiveCodeBench (8/1/24–2/1/25) |
|---|---|---|---|---|---|
| Phi-4-reasoning | 75.3 | 62.9 | 76.6 | 65.8 | 53.8 |
| Phi-4-reasoning-plus | 81.3 | 78.0 | 81.9 | 68.9 | 53.1 |
| OpenThinker2-32B | 58.0 | 58.0 | — | 64.1 | — |
| QwQ 32B | 79.5 | 65.8 | — | 59.5 | 63.4 |
| EXAONE-Deep-32B | 72.1 | 65.8 | — | 66.1 | 59.5 |
| DeepSeek-R1-Distill-70B | 69.3 | 51.5 | 63.4 | 66.2 | 57.5 |
| DeepSeek-R1 | 78.7 | 70.4 | 85.0 | 73.0 | 62.8 |
| o1-mini | 63.6 | 54.8 | — | 60.0 | 53.8 |
...