The agent can seamlessly use a browser to perform tasks such as booking tickets, ordering food, reserving tables, and more. The system operates as a separate site on a ChatGPT subdomain, featuring a regular chat interface alongside a browser window. This browser is streamed simultaneously to both the user and the AI agent. Users can take control at any time, and for sensitive actions like payments, manual user intervention is mandatory.
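A minimal sketch of what that hand-off pattern could look like, purely as an illustration: the action categories, function names, and callbacks below are assumptions, not Operator's actual implementation.

```python
# Illustration only: routine browser actions run automatically, while anything
# in an assumed "sensitive" category is paused and handed to the user, who
# finishes it in the streamed browser view before the agent resumes.
from typing import Callable

SENSITIVE_KINDS = {"payment", "login", "captcha"}  # assumed categories


def execute_step(kind: str,
                 run_action: Callable[[], None],
                 wait_for_user: Callable[[str], None]) -> None:
    """Run one agent step, handing control to the user for sensitive actions."""
    if kind in SENSITIVE_KINDS:
        # Mandatory manual intervention: block until the user confirms
        # they have completed the step themselves.
        wait_for_user(f"Please finish the '{kind}' step manually, then confirm.")
    else:
        run_action()
```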
This reminds me of the startup Mighty, which initially developed a cloud-based browser but pivoted to image generation a couple of years ago (now known as Playground). Mighty went through Y Combinator back when Sam Altman was chair of its board of directors, so it's possible OpenAI acquired some of their intellectual property.
The system is powered by CUA (Computer-Using Agent), a new model fine-tuned from GPT-4o that combines reasoning with image understanding. It surpasses Sonnet 3.6 (2024-10-22) on computer-use benchmarks. However, OpenAI avoids direct comparisons with Google's equivalent model, likely because the performance gap there is smaller. Notably, OpenAI's presentations are increasingly reminiscent of Apple's style: in their tables the previous model is referred to simply as "Previous SOTA", with its name (Sonnet 3.6) relegated to the footnotes.
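Conceptually, such a model runs a perceive-reason-act loop over raw screenshots. Here is a hedged sketch of that loop using Playwright for the browser side; the action schema and the `propose_next_action` model call are placeholders I made up, since the real CUA API is not public at the time of writing.

```python
# Hypothetical CUA-style loop: the model sees a screenshot, proposes one GUI
# action (click, type, or stop), the harness executes it, and the cycle repeats.
from dataclasses import dataclass
from playwright.sync_api import sync_playwright


@dataclass
class Action:
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


def propose_next_action(screenshot: bytes, goal: str) -> Action:
    """Placeholder for the vision-language model call (not a real endpoint)."""
    raise NotImplementedError


def run_agent(goal: str, start_url: str, max_steps: int = 20) -> None:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        page.goto(start_url)
        for _ in range(max_steps):
            shot = page.screenshot()                  # perceive: current pixels
            action = propose_next_action(shot, goal)  # reason: model picks an action
            if action.kind == "done":
                break
            if action.kind == "click":
                page.mouse.click(action.x, action.y)  # act: low-level GUI event
            elif action.kind == "type":
                page.keyboard.type(action.text)
        browser.close()
```

The key design point is that the agent operates on pixels and generic mouse/keyboard events rather than site-specific APIs, which is what makes it applicable to arbitrary websites.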
While Anthropic and Google demonstrated similar capabilities months earlier, OpenAI was the first to launch a consumer-facing product, highlighting their differing priorities. Operator is already rolling out to Pro users (by the way, did you know the Pro subscription is running at a loss?). Access through the Plus plan and API is expected within a few weeks.
For now, you can access Operator at operator.chatgpt.com (available to Pro users in the US).