Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis
manipulation-capabilities, open-source, layout-understanding, compositional-generalization, software-commonsense, computer-use-agent-development, agentic-capabilities, natural-language-instructions, graphical-user-interface, benchmark, grounding-tasks, multi-scale-models, computer-use-grounding-dataset, multi-perspective-decoupling
The University of Hong Kong • Salesforce AI Research
Graphical user interface (GUI) grounding, the ability to map natural language instructions to specific actions on graphical user interfaces, remains a critical bottleneck in computer use agent development. Current benchmarks oversimplify grounding tasks as short referring expressions, failing to...