The Business & Technology Network
Helping Business Interpret and Use Technology

UFO2 turns your desktop into an agent playground

Tags: microsoft new
DATE POSTED: April 22, 2025

What if automating a desktop wasn’t about scripting click patterns, but about giving your operating system an intelligent team of agents? That’s the core idea behind UFO2, Microsoft’s newest open-source system that pushes beyond current Computer-Using Agents (CUAs) and reinvents automation as a first-class OS abstraction. It turns your desktop into an intelligent control panel where language-driven tasks are executed natively, reliably, and with minimal disruption to your workflow.

Traditional desktop automation tools such as robotic process automation (RPA) systems have long struggled with robustness: a minor change in a UI can break an entire script. CUAs tried to address this with large language models and screenshot analysis, but they remained limited by shallow system integration and clunky user experiences. UFO2 flips this model by building from the OS upward. It introduces a multi-agent architecture in which a central HostAgent coordinates specialized AppAgents for different applications. Each agent speaks the native language of its app through APIs and UI metadata, not just pixels.
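The HostAgent/AppAgent split can be pictured as a simple dispatcher. The Python sketch below is a minimal illustration under stated assumptions, not UFO2's code. The `HostAgent` and `AppAgent` classes, the `register` and `run` methods, and the notion of a plan as a list of `(app_name, subtask)` pairs are all hypothetical, and each AppAgent's real API and UI-metadata logic is reduced to a plain function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class AppAgent:
    """Hypothetical AppAgent: an application name plus a handler for subtasks."""
    app_name: str
    handle: Callable[[str], str]


@dataclass
class HostAgent:
    """Hypothetical HostAgent that routes each subtask to the matching AppAgent."""
    agents: Dict[str, AppAgent] = field(default_factory=dict)

    def register(self, agent: AppAgent) -> None:
        self.agents[agent.app_name] = agent

    def run(self, plan: List[Tuple[str, str]]) -> List[str]:
        # The plan is a list of (app_name, subtask) pairs. In UFO2 an LLM
        # would produce it; in this sketch it is passed in directly.
        results = []
        for app_name, subtask in plan:
            agent = self.agents.get(app_name)
            if agent is None:
                results.append(f"skipped: no AppAgent for {app_name}")
                continue
            results.append(agent.handle(subtask))
        return results


host = HostAgent()
host.register(AppAgent("Excel", lambda task: f"Excel did: {task}"))
host.register(AppAgent("Word", lambda task: f"Word did: {task}"))
print(host.run([("Excel", "export sheet as CSV"), ("Outlook", "send mail")]))
```

Keeping routing in one place lets each AppAgent specialize in a single application. In this sketch, a request for an app with no registered agent shows up as a skipped step rather than a crash.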

A comparison of (a) existing CUAs and (b) desktop AgentOS UFO2 (Image)

One of UFO2’s key technical innovations is its hybrid action model. Instead of only clicking buttons like a human, each AppAgent can call an application’s real APIs when they are available. Tasks like exporting a spreadsheet or formatting text shrink from a multi-step sequence of GUI interactions to a single, atomic function call. The system also plans speculatively: one LLM call plans several steps, and each step is validated live against Windows UI data before it executes. This speculative multi-action execution substantially cuts latency by reducing the number of LLM calls, while the per-step validation guards correctness.
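Here is a minimal Python sketch of both ideas in this paragraph. It assumes a toy model of the desktop: `Action`, `HybridExecutor`, and its methods are hypothetical names, `apis` stands in for whatever native calls an application exposes, and `live_controls` stands in for a real Windows UI Automation query. The executor prefers an API call when one exists and otherwise falls back to a GUI step. It runs a speculatively planned batch only while each step's target control is still present.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Action:
    name: str                     # hypothetical action name, e.g. "export_csv"
    target: Optional[str] = None  # UI control a GUI action needs; None for pure API calls


class HybridExecutor:
    """Hypothetical executor illustrating hybrid actions and speculative batches."""

    def __init__(self, apis: Dict[str, Callable[[], str]],
                 live_controls: Callable[[], Set[str]]):
        self.apis = apis                    # native API shortcuts, keyed by action name
        self.live_controls = live_controls  # stand-in for a live UI Automation query

    def execute_one(self, action: Action) -> str:
        # Hybrid action: prefer a native API call, fall back to a GUI interaction.
        api = self.apis.get(action.name)
        if api is not None:
            return api()
        return f"gui:{action.name}@{action.target}"

    def execute_speculative(self, planned: List[Action]) -> List[str]:
        # `planned` is a batch of steps proposed by a single LLM call.
        # Each step is re-checked against the current UI before it runs;
        # the batch stops at the first step whose target control is missing,
        # leaving the remaining steps to be re-planned.
        done = []
        for action in planned:
            if action.target is not None and action.target not in self.live_controls():
                break
            done.append(self.execute_one(action))
        return done


controls = {"File menu", "Bold button"}
executor = HybridExecutor({"export_csv": lambda: "api:export_csv"}, lambda: controls)
plan = [Action("export_csv"), Action("click", "Bold button"),
        Action("click", "Save dialog"), Action("click", "File menu")]
print(executor.execute_speculative(plan))
```

In this sketch, the batch stops at the first failed check instead of skipping that step. That keeps later steps from running against a UI the plan did not anticipate, even though "File menu" itself is still present.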

Isolation without interruption

CUAs typically hijack your desktop, locking the mouse and keyboard while they execute. UFO2’s Picture-in-Picture (PiP) mode solves this with a virtual desktop window that runs automation tasks in parallel. The agent works inside this isolated session while you continue working in the main one. It is designed to be seamless and secure, relying on native Windows RDP loopback to keep the two sessions separate.

An overview of the architecture of UFO2 (Image)

UFO2 integrates help documentation and execution logs into a retrieval-augmented memory that enriches its prompts with procedural knowledge. Over time, this yields a self-improving agent that gets better at new tasks without retraining. Each AppAgent draws on documentation, patch notes, and prior runs to make better-informed decisions. The result is an automation system with memory, rather than one that generates each response from scratch.
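Retrieval-augmented prompting can be sketched as ranking stored snippets against the task and pasting the best matches into the prompt. This Python example uses simple word-overlap scoring as an assumed stand-in for whatever retriever UFO2 actually uses. `retrieve`, `build_prompt`, and the `(source, text)` memory format are hypothetical. Appending each run's log to the memory is what lets later prompts benefit from earlier runs.

```python
import re
from typing import List, Set, Tuple

# Hypothetical memory format: (source, text) pairs from docs, patch notes, run logs.
Memory = List[Tuple[str, str]]


def tokenize(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, memory: Memory, k: int = 2) -> Memory:
    """Return up to k entries ranked by word overlap with the query (assumed scoring)."""
    q = tokenize(query)
    scored = [(len(q & tokenize(text)), source, text) for source, text in memory]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)  # stable: ties keep memory order
    return [(source, text) for _, source, text in scored[:k]]


def build_prompt(task: str, memory: Memory) -> str:
    lines = [f"Task: {task}", "Relevant knowledge:"]
    lines += [f"- [{source}] {text}" for source, text in retrieve(task, memory)]
    return "\n".join(lines)


memory: Memory = [
    ("help doc", "To export a sheet as CSV use File > Save As and pick CSV"),
    ("patch notes", "Version 2.1 moves the Design tab on the ribbon"),
]
# After a run finishes, its log is appended, so later prompts can draw on it.
memory.append(("execution log", "Export CSV succeeded via the Save As dialog"))
print(build_prompt("Export the sales sheet as CSV", memory))
```

For this task, the help-doc entry and the appended execution log both share four words with the query and are selected. The unrelated patch note is left out.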

In head-to-head benchmarks against OpenAI’s Operator and other leading CUAs, UFO2 consistently comes out ahead. On the OSWorld-W benchmark, UFO2 reaches a 32.7% success rate using the o1 model, more than double Operator’s 14.3%. Its speculative planning reduces action steps by up to 50%, and hybrid control detection (combining Windows UI Automation (UIA) APIs and vision parsing) recovers over 25% of previously failed interactions. Simply put, UFO2’s advantage comes from better system design, not just a smarter model.
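The article does not spell out how UIA results and vision parsing are combined. One plausible approach is sketched here as an assumption rather than UFO2's documented method. It keeps every control reported by UI Automation and adds a vision-detected control only when its bounding box does not overlap an already-known control by more than an intersection-over-union (IoU) threshold. The 0.5 threshold and the function names are illustrative.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in screen pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def merge_controls(uia: List[Box], vision: List[Box], threshold: float = 0.5) -> List[Box]:
    """Hypothetical fusion: keep all UIA controls, add vision detections only if new."""
    merged = list(uia)
    for box in vision:
        if all(iou(box, known) < threshold for known in merged):
            merged.append(box)
    return merged


uia_boxes = [(0, 0, 100, 30), (0, 40, 100, 70)]
vision_boxes = [(2, 1, 101, 31), (200, 200, 260, 230)]  # first one duplicates a UIA control
print(merge_controls(uia_boxes, vision_boxes))
```

In the example, the first vision box overlaps the first UIA control (IoU about 0.91) and is dropped. The second box, which UIA missed, is added, giving three controls in total.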

Everything is an agent now

Extensibility is baked in. UFO2 allows third-party tools, including other CUAs like Operator, to be wrapped as AppAgents. This means you can integrate specialized copilots or proprietary automation backends into the UFO2 ecosystem without retraining or rewriting code. It also supports a client-server architecture for enterprise deployment, keeping orchestration centralized and user devices light.
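Wrapping an outside tool as an AppAgent is essentially the adapter pattern. In this Python sketch, `AppAgentLike`, `ExternalCUA`, and `WrappedAppAgent` are hypothetical names, and none of them come from UFO2's codebase. `ExternalCUA` stands in for a third-party agent such as Operator that has its own interface. The adapter translates that interface into the single `handle` method the host is assumed to expect.

```python
from typing import Dict, Protocol


class AppAgentLike(Protocol):
    """Hypothetical interface a HostAgent expects from any AppAgent."""
    name: str

    def handle(self, subtask: str) -> str: ...


class ExternalCUA:
    """Stand-in for a third-party agent with its own, different interface."""

    def perform(self, instruction: str) -> dict:
        return {"status": "ok", "summary": f"completed '{instruction}'"}


class WrappedAppAgent:
    """Adapter: exposes an external agent through the AppAgent-style interface."""

    def __init__(self, name: str, backend: ExternalCUA):
        self.name = name
        self.backend = backend

    def handle(self, subtask: str) -> str:
        result = self.backend.perform(subtask)
        if result["status"] != "ok":
            return f"failed: {subtask}"
        return result["summary"]


registry: Dict[str, AppAgentLike] = {}
browser_agent = WrappedAppAgent("Browser", ExternalCUA())
registry[browser_agent.name] = browser_agent
print(registry["Browser"].handle("book a meeting room"))
```

Because the host depends only on `handle` in this sketch, swapping in a different backend means writing another small adapter rather than changing the orchestration code.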

The paper outlines future goals, including cross-platform compatibility with macOS and Linux via analogous accessibility APIs, faster response via smaller LLMs, and improved reasoning from dedicated GUI-interaction datasets. But even in its current state, UFO2 represents a new baseline for desktop automation. It is open-source, already outperforming commercial systems, and brings a new level of modularity, reliability, and intelligence to human-computer interaction.

For anyone building the next generation of intelligent agents—or just tired of brittle scripts—UFO2 is available on GitHub along with its documentation.
