Computer Use with Claude
Learn how to integrate Claude’s Computer Use capabilities into your AI SDK applications, enabling AI to interact with computer interfaces.
What is Computer Use?
Computer Use enables AI models to interact with computers like humans do:
- Moving the cursor
- Clicking buttons
- Typing text
- Taking screenshots
- Reading screen content
Prerequisites
- Node.js 18+
- Anthropic API key or Vercel AI Gateway access
- A controlled environment for execution (VM or container recommended)
- Understanding of AI safety considerations
Safety First
Computer Use is a beta feature with important safety considerations:
- Use a dedicated virtual machine or container
- Limit access to sensitive data
- Implement human oversight for critical actions
- Restrict internet access to allowlisted domains
- Start with low-risk tasks
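One way to enforce the allowlist item above is to check every URL before the agent is allowed to open it. The following is a minimal sketch, not part of the AI SDK: the ALLOWED_DOMAINS list and the isUrlAllowed helper are hypothetical names introduced here for illustration, and a production setup would more likely enforce this at the network level of the VM or container.

```typescript
// Hypothetical allowlist (assumption): replace with the domains your task needs.
const ALLOWED_DOMAINS: readonly string[] = ["example.com", "docs.example.org"];

// Returns true only for http(s) URLs whose host is an allowlisted domain
// or a subdomain of one. Input that cannot be parsed as a URL is rejected.
function isUrlAllowed(
  rawUrl: string,
  allowed: readonly string[] = ALLOWED_DOMAINS,
): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false;
  }
  if (url.protocol !== "https:" && url.protocol !== "http:") {
    return false;
  }
  const host = url.hostname.toLowerCase();
  // The leading dot prevents "evil-example.com" from matching "example.com".
  return allowed.some(
    (domain) => host === domain || host.endsWith("." + domain),
  );
}
```

Calling isUrlAllowed before any navigation step gives a single place to log and refuse requests that fall outside the allowlist.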
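Installation
Install the AI SDK package (ai) and the Anthropic provider package (@ai-sdk/anthropic) with your package manager.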
How It Works
- Provide tools: Define Computer Use tools (computer, bash, text editor)
- Model selects tool: Claude determines which tool to use
- Execute action: Your code runs the tool (screenshot, click, etc.)
- Return results: Results are sent back to Claude
- Iterate: Claude continues until the task is complete
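The five steps above can be sketched as a plain loop. This is an illustration of the control flow only, under stated assumptions: it does not call the AI SDK or the Anthropic API, the model is replaced by a scripted stub, and every name (ToolName, ModelTurn, runAgentLoop, scriptedModel, stubTools) is hypothetical.

```typescript
type ToolName = "computer" | "bash" | "textEditor";

interface ToolResult {
  tool: ToolName;
  output: string;
}

// A model turn is either a request to run a tool or a final answer.
type ModelTurn =
  | { kind: "toolCall"; tool: ToolName; input: string }
  | { kind: "final"; text: string };

type Model = (history: readonly ToolResult[]) => Promise<ModelTurn>;
type ToolExecutor = (input: string) => Promise<string>;

interface LoopResult {
  text: string;
  steps: number;
  history: ToolResult[];
}

async function runAgentLoop(
  model: Model,
  tools: Record<ToolName, ToolExecutor>, // step 1: provide tools
  maxSteps: number,
): Promise<LoopResult> {
  const history: ToolResult[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const turn = await model(history); // step 2: model selects a tool
    if (turn.kind === "final") {
      return { text: turn.text, steps: step, history };
    }
    const output = await tools[turn.tool](turn.input); // step 3: execute
    history.push({ tool: turn.tool, output }); // step 4: return results
  } // step 5: iterate until done or the step limit is hit
  return { text: "Stopped: step limit reached", steps: maxSteps, history };
}

// Scripted stand-in for the model: screenshot, then a shell command, then done.
const scriptedModel: Model = async (history) => {
  if (history.length === 0) {
    return { kind: "toolCall", tool: "computer", input: "screenshot" };
  }
  if (history.length === 1) {
    return { kind: "toolCall", tool: "bash", input: "ls" };
  }
  return { kind: "final", text: `Done after ${history.length} tool results` };
};

// Stub executors; real ones would drive the VM or container.
const stubTools: Record<ToolName, ToolExecutor> = {
  computer: async (input) => `computer ran: ${input}`,
  bash: async (input) => `bash ran: ${input}`,
  textEditor: async (input) => `editor ran: ${input}`,
};
```

The maxSteps cap matters because a loop with no limit would keep running for as long as the model keeps requesting tools.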
Available Tools
Computer Tool
Enables mouse and keyboard control.
Bash Tool
Executes shell commands.
Text Editor Tool
Handles file operations.
Basic Example
One-Shot Generation
Streaming Generation
Multi-Step (Agentic) Usage
Enable autonomous multi-step execution with the stopWhen parameter. With a stop condition in place, Claude can chain actions across steps, for example:
- Take a screenshot to see the screen
- Move the cursor to the search bar
- Click the search bar
- Type the search query
- Press Enter
- Wait for results to load
- Take a final screenshot
- Respond with findings
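The first seven steps in the list above map onto individual computer actions; the last one is the model's text reply. The sketch below replays that sequence against an in-memory stand-in for a desktop so the effect of each action is visible. It is an assumption-laden illustration: the action names are modelled loosely on the computer tool but may not match its actual schema, FakeDesktop and replay are hypothetical, and the maxSteps argument only imitates the role a stop condition plays.

```typescript
// Illustrative action shapes (assumption, not the tool's real schema).
type ComputerAction =
  | { type: "screenshot" }
  | { type: "mouse_move"; x: number; y: number }
  | { type: "left_click" }
  | { type: "type"; text: string }
  | { type: "key"; key: string }
  | { type: "wait"; seconds: number };

// In-memory stand-in for a desktop showing a single search bar.
class FakeDesktop {
  cursor = { x: 0, y: 0 };
  focused = false;
  typed = "";
  submitted = false;

  // Applies one action and returns a short description of what happened.
  apply(action: ComputerAction): string {
    switch (action.type) {
      case "screenshot":
        return `screenshot: typed="${this.typed}" submitted=${this.submitted}`;
      case "mouse_move":
        this.cursor = { x: action.x, y: action.y };
        return `cursor at (${action.x}, ${action.y})`;
      case "left_click":
        this.focused = true;
        return "clicked: search bar focused";
      case "type":
        if (!this.focused) return "type ignored: nothing focused";
        this.typed += action.text;
        return `typed "${action.text}"`;
      case "key":
        if (action.key === "Enter" && this.focused) this.submitted = true;
        return `pressed ${action.key}`;
      case "wait":
        return `waited ${action.seconds}s`;
    }
  }
}

// The search sequence from the list, minus the final text response.
const searchSteps: ComputerAction[] = [
  { type: "screenshot" },
  { type: "mouse_move", x: 400, y: 60 },
  { type: "left_click" },
  { type: "type", text: "weather today" },
  { type: "key", key: "Enter" },
  { type: "wait", seconds: 2 },
  { type: "screenshot" },
];

// Replays at most maxSteps actions and collects their descriptions.
function replay(
  desktop: FakeDesktop,
  steps: readonly ComputerAction[],
  maxSteps: number,
): string[] {
  return steps.slice(0, maxSteps).map((step) => desktop.apply(step));
}
```

With a limit of three steps the replay stops after the click, before any text is typed, which is why the step limit has to be at least as large as the longest sequence the task needs.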
Combining Multiple Tools
Use all three tools together for complex workflows.
Implementation Example
Here’s a more complete implementation:
Best Practices
Clear Instructions
Verify Actions with Screenshots
Use Keyboard Shortcuts
Provide Context
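The four practices named above (clear instructions, screenshot verification, keyboard shortcuts, context) can be folded into a single task prompt. The helper below is only a sketch of one way to do that: the TaskPrompt shape, the buildTaskPrompt function, and the exact wording of the instruction lines are assumptions made for illustration, not text from the AI SDK or Anthropic documentation.

```typescript
// Hypothetical shape for a task description.
interface TaskPrompt {
  context: string; // what the model should know about the environment
  goal: string; // the outcome to reach
  steps: string[]; // explicit, ordered instructions
}

// Builds a prompt that states context first, then numbered steps,
// then standing instructions about verification and shortcuts.
function buildTaskPrompt(task: TaskPrompt): string {
  const numbered = task.steps.map((step, i) => `${i + 1}. ${step}`);
  return [
    `Context: ${task.context}`,
    `Goal: ${task.goal}`,
    "Steps:",
    ...numbered,
    "After each step, take a screenshot and confirm the action had the expected effect before continuing.",
    "Prefer keyboard shortcuts over mouse navigation when a shortcut exists.",
  ].join("\n");
}

const examplePrompt = buildTaskPrompt({
  context: "A browser window is already open on a 1280x800 display.",
  goal: "Find the current version number on the project's releases page.",
  steps: [
    "Focus the address bar",
    "Open the releases page",
    "Read the topmost version number",
  ],
});
```

Keeping the standing instructions in one helper makes it easier to apply them consistently across tasks.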
Next.js Integration
For a complete Next.js example:
Security Checklist
- Running in isolated container/VM
- No access to production databases
- No access to credentials or secrets
- Internet access restricted to allowlist
- Human approval for critical actions
- Logging all computer actions
- Rate limiting enabled
- Automatic timeout after inactivity
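Several checklist items (human approval, action logging, rate limiting, inactivity timeout) can live in one wrapper around whatever function actually performs an action. The class below is a minimal sketch under stated assumptions: GuardedExecutor, GuardOptions, and the outcome labels are hypothetical names, approval is modelled as a synchronous callback, and the clock is injected so the behavior can be tested without waiting.

```typescript
type Outcome = "executed" | "denied" | "rate_limited" | "timed_out";

interface AuditEntry {
  at: number;
  action: string;
  outcome: Outcome;
}

interface GuardOptions {
  maxActionsPerMinute: number;
  idleTimeoutMs: number;
  isCritical: (action: string) => boolean; // which actions need approval
  approve: (action: string) => boolean; // the human decision
  now: () => number; // injected clock, e.g. Date.now
}

class GuardedExecutor {
  readonly audit: AuditEntry[] = []; // log of every attempted action
  private recent: number[] = []; // timestamps of executed actions
  private lastActivity: number;
  private readonly run: (action: string) => string;
  private readonly opts: GuardOptions;

  constructor(run: (action: string) => string, opts: GuardOptions) {
    this.run = run;
    this.opts = opts;
    this.lastActivity = opts.now();
  }

  // Returns the action's result, or null if a guard blocked it.
  execute(action: string): string | null {
    const at = this.opts.now();
    // Inactivity timeout: once exceeded, the session stays closed.
    if (at - this.lastActivity > this.opts.idleTimeoutMs) {
      return this.reject(at, action, "timed_out");
    }
    // Rate limit over a sliding one-minute window.
    this.recent = this.recent.filter((t) => at - t < 60000);
    if (this.recent.length >= this.opts.maxActionsPerMinute) {
      return this.reject(at, action, "rate_limited");
    }
    // Human approval for critical actions.
    if (this.opts.isCritical(action) && !this.opts.approve(action)) {
      return this.reject(at, action, "denied");
    }
    this.recent.push(at);
    this.lastActivity = at;
    this.audit.push({ at, action, outcome: "executed" });
    return this.run(action);
  }

  private reject(at: number, action: string, outcome: Outcome): null {
    this.audit.push({ at, action, outcome });
    return null;
  }
}
```

Blocked attempts are logged alongside executed ones, so the audit trail shows what the agent tried to do as well as what it did.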
Limitations
Be aware of current limitations:
- May struggle with complex UI interactions
- Can be slow for multi-step tasks
- Screenshot quality affects performance
- Some actions may fail and require retry logic
- Best for structured, repeatable tasks
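For the retry point in the list above, a bounded retry with exponential backoff is one common approach. The helper below is a generic sketch, not an AI SDK function: withRetry, its parameters, and the 500 ms base delay are assumptions, and the sleep function is passed in so callers decide how waiting is implemented.

```typescript
// Retries an async action up to maxAttempts times, waiting
// baseDelayMs, 2x, 4x, ... between attempts. Rethrows the last error
// if every attempt fails.
async function withRetry<T>(
  action: () => Promise<T>,
  maxAttempts: number,
  sleep: (ms: number) => Promise<void>,
  baseDelayMs = 500,
): Promise<T> {
  if (maxAttempts < 1) {
    throw new RangeError("maxAttempts must be at least 1");
  }
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await action();
    } catch (error) {
      lastError = error;
      // No delay after the final attempt; the error is rethrown below.
      if (attempt < maxAttempts) {
        await sleep(baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }
  throw lastError;
}
```

In real use, sleep could be a timer-based promise, and the action could be a click or screenshot call that sometimes fails while the screen is still loading. Retrying is only safe for actions that can be repeated without side effects, so a form submission usually needs a screenshot check instead of a blind retry.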
Example Use Cases
- Browser Automation: Navigate websites and extract information
- Testing: Automated UI testing of web applications
- Data Entry: Fill forms based on structured data
- Documentation: Generate screenshots for guides
- Monitoring: Check application states periodically
Next Steps
- Review Anthropic’s reference implementation
- Check out the AI SDK Computer Use Template
- Read about multi-modal tool results
- Learn about multi-step calls