Stevessr posted in "microsoft/OmniParser v2 is here"
OmniParser is a comprehensive method for parsing user interface screenshots into structured and easy-to-understand elements, which significantly enhances the ability of GPT-4V to generate actions that can be accurately grounded in the corresponding regions of the interface.
OmniParser V2 takes this capability to the next level. Compared with its predecessor...