Covers how to run the QwQ-32B model effectively, including recommended settings, bug fixes, and a usage walkthrough.
Tutorial: How to Run QwQ-32B effectively
Official Recommended Settings
According to Qwen, these are the recommended settings for inference:

Temperature of 0.6
Top_K of 40 (or 20 to 40)
Min_P of 0.02 (optional, but works well, llama.cpp default is 0.1)
Top_P of 0.95
Repetition Penalty of 1.0 (1.0 means disabled in llama.cpp and transformers)
Chat template:...
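
If it helps to keep these values in one place, here is a minimal Python sketch that stores the sampling settings listed above and turns them into a llama.cpp `llama-cli` argument list. The dictionary name, the helper `llama_cli_args`, and the model filename are hypothetical and only for illustration. The flags (`--temp`, `--top-k`, `--min-p`, `--top-p`, `--repeat-penalty`) are standard llama.cpp sampling options. The chat template is not handled here.

```python
import shlex

# Sampling settings from the list above (hypothetical container name).
QWQ_SAMPLING = {
    "temperature": 0.6,
    "top_k": 40,            # anywhere from 20 to 40 is suggested
    "min_p": 0.02,          # optional
    "top_p": 0.95,
    "repeat_penalty": 1.0,  # 1.0 means disabled
}


def llama_cli_args(model_path, settings=QWQ_SAMPLING):
    """Build a llama-cli argument list from the sampling settings."""
    return [
        "llama-cli",
        "--model", model_path,
        "--temp", str(settings["temperature"]),
        "--top-k", str(settings["top_k"]),
        "--min-p", str(settings["min_p"]),
        "--top-p", str(settings["top_p"]),
        "--repeat-penalty", str(settings["repeat_penalty"]),
    ]


if __name__ == "__main__":
    # The model filename is a placeholder; point it at your own GGUF file.
    print(shlex.join(llama_cli_args("QwQ-32B-Q4_K_M.gguf")))
```

Running the script prints a command line you can paste into a shell, or you can pass the list straight to `subprocess.run` once the model path is real.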