@A.Ziegler posted in "Sharing a small discovery about using Gemini multimodal"

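When I use Gemini 3 to detect objects and return bounding-box coordinates, the coordinates it gives are somewhat offset, yet when I ask it to draw the boxes directly, they are usually very accurate. After checking the developer documentation, I found that the image coordinates Gemini returns may be normalized rather than pixel coordinates, so you need to convert them yourself, or stress in the prompt that you want coordinates in the original image.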

From Gemini 2.0 onwards, models are further trained to detect objects in an image and get their bounding box coordinates. The coordinates, relative to image dimensions, scale to [0, 1000]. You need to descale these coordinates based on your original image size.
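For anyone who wants to do the conversion by hand, here is a minimal sketch of the descaling step. The helper name `descale_box` is hypothetical, and the `[ymin, xmin, ymax, xmax]` ordering is an assumption on my part; check the ordering in your own model output before relying on it. Only the 0 to 1000 scale comes from the quoted documentation.

```python
def descale_box(box, image_width, image_height, scale=1000):
    """Map a box given on a 0..scale grid to pixel coordinates.

    Assumes `box` is ordered [ymin, xmin, ymax, xmax] (an assumption,
    not confirmed by the quote above). Returns (left, top, right, bottom).
    """
    ymin, xmin, ymax, xmax = box
    # x values scale with the image width, y values with the image height.
    left = round(xmin * image_width / scale)
    top = round(ymin * image_height / scale)
    right = round(xmax * image_width / scale)
    bottom = round(ymax * image_height / scale)
    return left, top, right, bottom


# Example: a box returned for a 1920x1080 image
print(descale_box([250, 100, 750, 500], 1920, 1080))  # (192, 270, 960, 810)
```

If the boxes still look shifted after this, the image may have been resized or padded before being sent, in which case the width and height passed in should be those of the image the model actually saw.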

Reference:...