
@A.Ziegler posted in "Sharing a small discovery from using Gemini's multimodal capabilities"

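When I used Gemini 3 to detect objects and return their bounding-box coordinates, the coordinates came back somewhat offset. When I asked it to draw the boxes directly, however, they were usually very precise. After checking the developer documentation, I found that the image coordinates Gemini returns may be normalized, so you either need to convert them yourself or state explicitly in the prompt that you want coordinates in the original image's pixel space.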

From Gemini 2.0 onwards, models are further trained to detect objects in an image and get their bounding box coordinates. The coordinates, relative to image dimensions, scale to [0, 1000]. You need to descale these coordinates based on your original image size. 
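A minimal Python sketch of the manual conversion, assuming each box comes back as a list in `[ymin, xmin, ymax, xmax]` order on the 0 to 1000 scale. That order matches the examples in Gemini's documentation, but verify it against your own responses. The function name `descale_box` and the sample box are hypothetical.

```python
from typing import Sequence, Tuple


def descale_box(box_2d: Sequence[float], width: int, height: int) -> Tuple[int, int, int, int]:
    """Map a box normalized to [0, 1000] back to pixel coordinates.

    Assumes the input order is [ymin, xmin, ymax, xmax] (an assumption, see above).
    Returns (x0, y0, x1, y1), the corner order most drawing libraries expect.
    """
    ymin, xmin, ymax, xmax = box_2d
    # Multiply before dividing so exact integer inputs stay exact.
    x0 = round(xmin * width / 1000)
    y0 = round(ymin * height / 1000)
    x1 = round(xmax * width / 1000)
    y1 = round(ymax * height / 1000)
    return x0, y0, x1, y1


if __name__ == "__main__":
    # Hypothetical model output for a 1920x1080 image.
    print(descale_box([250, 100, 750, 500], width=1920, height=1080))  # (192, 270, 960, 810)
```

Note that x is scaled by the image width and y by the height. Using a single dimension for both is a common source of exactly the kind of offset described above on non-square images.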

Reference: ...