Computer Use with Claude
Learn how to integrate Claude’s Computer Use capabilities into your AI SDK applications, enabling AI to interact with computer interfaces.
What is Computer Use?
Computer Use lets AI models operate a computer the way a person does:
- Moving the cursor
- Clicking buttons
- Typing text
- Taking screenshots
- Reading screen content
This opens up possibilities for automating complex tasks while leveraging Claude’s reasoning abilities.
Prerequisites
- Node.js 18+
- Anthropic API key or Vercel AI Gateway access
- A controlled environment for execution (VM or container recommended)
- Understanding of AI safety considerations
Safety First
Computer Use is a beta feature with important safety considerations:
- Use a dedicated virtual machine or container
- Limit access to sensitive data
- Implement human oversight for critical actions
- Restrict internet access to allowlisted domains
- Start with low-risk tasks
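The allowlist recommendation above can be enforced in code before any navigation or network request runs. The following is a minimal sketch of one way to do that; `isAllowedUrl` and the example domains are hypothetical and not part of the AI SDK or Anthropic's API.

```typescript
// Hypothetical helper: not part of the AI SDK or Anthropic's API.
// Returns true only for http(s) URLs whose host is on the allowlist
// (an exact match or a subdomain of an allowlisted domain).
export function isAllowedUrl(rawUrl: string, allowlist: readonly string[]): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // Not a parseable absolute URL
  }
  if (url.protocol !== 'https:' && url.protocol !== 'http:') return false;

  const host = url.hostname.toLowerCase();
  return allowlist.some((domain) => {
    const d = domain.toLowerCase();
    // The leading dot prevents "evilvercel.com" from matching "vercel.com"
    return host === d || host.endsWith(`.${d}`);
  });
}
```

A check like this only covers requests that pass through your own code. Network-level restrictions on the VM or container are still needed for anything the model triggers indirectly.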
Installation
pnpm add ai @ai-sdk/anthropic
How It Works
- Provide tools: Define Computer Use tools (computer, bash, text editor)
- Model selects tool: Claude determines which tool to use
- Execute action: Your code runs the tool (screenshot, click, etc.)
- Return results: Results are sent back to Claude
- Iterate: Claude continues until task is complete
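To make the loop above concrete, here is a self-contained simulation that involves no real model, SDK, or operating-system calls. `ScriptedModel`, `ToolSet`, and `runToolLoop` are invented for illustration. In practice the AI SDK drives this loop for you when tools define an `execute` function.

```typescript
// Conceptual simulation only: no real model, SDK, or OS calls.
type ToolCall = { tool: string; input: string };
type ModelTurn = { toolCall: ToolCall } | { text: string };

// A stand-in "model": receives prior tool results and returns the next turn.
type ScriptedModel = (toolResults: string[]) => ModelTurn;
type ToolSet = Record<string, (input: string) => string>;

export function runToolLoop(model: ScriptedModel, tools: ToolSet, maxSteps: number): string {
  const toolResults: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const turn = model(toolResults); // The model selects a tool or answers
    if ('text' in turn) return turn.text; // Task complete

    const execute = tools[turn.toolCall.tool];
    if (!execute) throw new Error(`Unknown tool: ${turn.toolCall.tool}`);

    // Your code runs the tool; the result goes back to the model
    toolResults.push(execute(turn.toolCall.input));
  }
  return 'Stopped: step limit reached';
}
```

The `maxSteps` argument plays the same role as a step limit in a real agent: it bounds how long the model can keep calling tools before control returns to your code.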
Computer Tool
Enables mouse and keyboard control:
import { anthropic } from '@ai-sdk/anthropic';

const computerTool = anthropic.tools.computer_20250124({
  displayWidthPx: 1920,
  displayHeightPx: 1080,
  execute: async ({ action, coordinate, text }) => {
    switch (action) {
      case 'screenshot': {
        return {
          type: 'image',
          data: await getScreenshot(), // Your implementation
        };
      }
      case 'mouse_move':
      case 'left_click':
      case 'right_click':
      case 'middle_click':
      case 'double_click':
      case 'type':
      case 'key':
      case 'cursor_position': {
        return await executeComputerAction(action, coordinate, text);
      }
      default: {
        // Fail loudly instead of returning undefined to toModelOutput
        throw new Error(`Unsupported action: ${action}`);
      }
    }
  },
  toModelOutput({ output }) {
    return typeof output === 'string'
      ? [{ type: 'text', text: output }]
      : [{ type: 'image', data: output.data, mediaType: 'image/png' }];
  },
});
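Coordinates come from the model, so it can be worth checking them against the display size you declared before handing them to an automation library. The guard below is a sketch under that assumption; `assertInBounds` is a hypothetical helper, not an AI SDK function.

```typescript
// Hypothetical guard, not part of the AI SDK: rejects coordinates that
// fall outside the display size declared to the model.
export function assertInBounds(
  coordinate: [number, number],
  displayWidthPx: number,
  displayHeightPx: number,
): [number, number] {
  const [x, y] = coordinate;
  const valid =
    Number.isInteger(x) &&
    Number.isInteger(y) &&
    x >= 0 &&
    y >= 0 &&
    x < displayWidthPx &&
    y < displayHeightPx;
  if (!valid) {
    throw new Error(`Coordinate (${x}, ${y}) is outside ${displayWidthPx}x${displayHeightPx}`);
  }
  return [x, y];
}
```

Throwing here surfaces the problem as a tool error rather than clicking somewhere unintended.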
Bash Tool
Executes shell commands:
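import { execSync } from 'node:child_process';

const bashTool = anthropic.tools.bash_20250124({
  execute: async ({ command, restart }) => {
    // Your implementation. execSync starts a fresh shell for every call,
    // so there is no persistent session for `restart` to reset.
    if (restart) return 'Shell restarted';
    return execSync(command).toString();
  },
});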
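A bare `execSync` call can hang on a long-running command or return very large output. The sketch below shows one way to bound both, using the `timeout`, `maxBuffer`, and `encoding` options of Node's `execSync`. The names `runCommand` and `truncateOutput` and the default limits are assumptions chosen for illustration.

```typescript
import { execSync } from 'node:child_process';

// Hypothetical helper: keep only the tail of long output, where errors usually appear.
export function truncateOutput(output: string, maxChars: number): string {
  if (output.length <= maxChars) return output;
  const omitted = output.length - maxChars;
  return `[${omitted} characters omitted]\n${output.slice(output.length - maxChars)}`;
}

// Hypothetical wrapper around execSync with a timeout and bounded output.
export function runCommand(command: string, timeoutMs = 10_000, maxChars = 4_000): string {
  try {
    const stdout = execSync(command, {
      encoding: 'utf8', // Return a string instead of a Buffer
      timeout: timeoutMs, // Kill the process if it runs too long
      maxBuffer: 1024 * 1024, // Cap captured output at 1 MiB
    });
    return truncateOutput(stdout, maxChars);
  } catch (error) {
    // Non-zero exit codes, timeouts, and buffer overflows all land here
    const message = error instanceof Error ? error.message : String(error);
    return truncateOutput(`Command failed: ${message}`, maxChars);
  }
}
```

Returning the failure as text, rather than rethrowing, lets Claude read the error and decide how to proceed.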
Text Editor Tool
Handles file operations:
const textEditorTool = anthropic.tools.textEditor_20250124({
  execute: async ({
    command,
    path,
    file_text,
    insert_line,
    new_str,
    insert_text,
    old_str,
    view_range,
  }) => {
    // Your implementation
    return executeTextEditorFunction({
      command,
      path,
      fileText: file_text,
      insertLine: insert_line,
      newStr: new_str,
      insertText: insert_text,
      oldStr: old_str,
      viewRange: view_range,
    });
  },
});
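The body of `executeTextEditorFunction` is left to you. As a starting point, the sketch below shows how two of its edits could work as pure functions over file contents. `applyStrReplace` and `applyInsert` are hypothetical helpers. The exactly-one-match rule and the "0 inserts at the top" convention are assumptions about the behavior you want, so check them against the tool's documentation.

```typescript
// Hypothetical pure helpers for two text editor edits. They operate on file
// contents as strings; reading and writing the file is left to the caller.

// Replace old_str with new_str, refusing ambiguous edits by requiring one match.
export function applyStrReplace(content: string, oldStr: string, newStr: string): string {
  if (oldStr === '') throw new Error('old_str must not be empty');
  const matches = content.split(oldStr).length - 1;
  if (matches !== 1) {
    throw new Error(`Expected exactly one match for old_str, found ${matches}`);
  }
  // A replacer function avoids special handling of "$" sequences in newStr
  return content.replace(oldStr, () => newStr);
}

// Insert text after the given 1-based line number (0 inserts at the top).
export function applyInsert(content: string, insertLine: number, text: string): string {
  const lines = content.split('\n');
  if (!Number.isInteger(insertLine) || insertLine < 0 || insertLine > lines.length) {
    throw new Error(`insert_line ${insertLine} is out of range`);
  }
  lines.splice(insertLine, 0, ...text.split('\n'));
  return lines.join('\n');
}
```

Keeping the edits pure makes them easy to unit test before wiring them to the file system.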
Basic Example
One-Shot Generation
import { generateText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';

const result = await generateText({
  model: 'anthropic/claude-sonnet-4-20250514',
  prompt: 'Move the cursor to the center of the screen and take a screenshot',
  tools: { computer: computerTool },
});

console.log(result.text);
Streaming Generation
import { streamText } from 'ai';

const result = streamText({
  model: 'anthropic/claude-sonnet-4-20250514',
  prompt: 'Open the browser and navigate to vercel.com',
  tools: { computer: computerTool },
});

for await (const chunk of result.textStream) {
  console.log(chunk);
}
Multi-Step (Agentic) Usage
Enable autonomous multi-step execution:
import { streamText, stepCountIs } from 'ai';

const result = streamText({
  model: 'anthropic/claude-sonnet-4-20250514',
  prompt: 'Find the search bar, search for "AI SDK", and take a screenshot of the results',
  tools: { computer: computerTool },
  stopWhen: stepCountIs(10), // Allow up to 10 steps
});
The stopWhen parameter lets Claude chain several tool calls in a single run. For the prompt above, that could mean:
- Take a screenshot to see the screen
- Move the cursor to the search bar
- Click the search bar
- Type the search query
- Press Enter
- Wait for results to load
- Take a final screenshot
- Respond with findings
Combining Multiple Tools
Use all three tools together for complex workflows:
import { generateText, stepCountIs } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';

const result = await generateText({
  model: 'anthropic/claude-sonnet-4-20250514',
  prompt: `Create a file called example.txt, write 'Hello World' to it,
and run 'cat example.txt' to verify`,
  tools: {
    computer: computerTool,
    bash: bashTool,
    str_replace_editor: textEditorTool,
  },
  stopWhen: stepCountIs(15),
});

console.log(result.text);
Implementation Example
Here’s a more complete implementation:
import { exec } from 'child_process';
import { promisify } from 'util';
import screenshot from 'screenshot-desktop';

const execAsync = promisify(exec);

export async function getScreenshot(): Promise<string> {
  // Request PNG explicitly so the data matches the 'image/png' media type
  const img = await screenshot({ format: 'png' });
  return img.toString('base64');
}

export async function executeComputerAction(
  action: string,
  coordinate?: [number, number],
  text?: string,
): Promise<string> {
  switch (action) {
    case 'mouse_move':
      if (!coordinate) throw new Error('Coordinate required');
      // moveMouse, click, and typeText are placeholders for your preferred
      // automation library (e.g., robotjs, nut.js)
      await moveMouse(coordinate[0], coordinate[1]);
      return 'Mouse moved';
    case 'left_click':
      // Move to the target first when the model supplies a coordinate
      if (coordinate) await moveMouse(coordinate[0], coordinate[1]);
      await click('left');
      return 'Left click executed';
    case 'type':
      if (!text) throw new Error('Text required');
      await typeText(text);
      return 'Text typed';
    // Implement other actions...
    default:
      throw new Error(`Unknown action: ${action}`);
  }
}
Best Practices
Clear Instructions
const result = await generateText({
  model: 'anthropic/claude-sonnet-4-20250514',
  prompt: `
    1. Take a screenshot to see the current state
    2. Find the blue "Submit" button
    3. Click it
    4. Wait 2 seconds
    5. Take another screenshot to verify
  `,
  tools: { computer: computerTool },
});
Verify Actions with Screenshots
prompt: `After each action, take a screenshot to verify it worked correctly`
Use Keyboard Shortcuts
prompt: `Use Cmd+C to copy instead of right-clicking the context menu`
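When Claude follows this advice, your executor receives a `key` action and has to translate the shortcut into whatever your automation library expects. The sketch below assumes the shortcut arrives as a plus-separated string such as `cmd+c`. That format, the modifier list, and the name `parseKeyCombo` are assumptions to adapt to what you observe from the tool.

```typescript
// Hypothetical parser: the plus-separated format and the modifier names
// are assumptions, not a specification of the tool's output.
const MODIFIERS = new Set(['ctrl', 'alt', 'shift', 'cmd', 'super']);

export function parseKeyCombo(combo: string): { modifiers: string[]; key: string } {
  const parts = combo
    .split('+')
    .map((part) => part.trim())
    .filter((part) => part.length > 0);

  // The last segment is the key; everything before it is a modifier
  const key = parts.pop();
  if (key === undefined) throw new Error('Empty key combination');

  const modifiers = parts.map((modifier) => modifier.toLowerCase());
  const unknown = modifiers.find((modifier) => !MODIFIERS.has(modifier));
  if (unknown !== undefined) throw new Error(`Unknown modifier: ${unknown}`);

  return { modifiers, key };
}
```

Rejecting unknown modifiers keeps a malformed shortcut from being typed as literal text.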
Provide Context
system: `You are automating a checkout process.
The "Checkout" button is typically in the top-right corner.
Wait for page loads before taking actions.`
Next.js Integration
For a complete Next.js example:
import { streamText, stepCountIs } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { computerTool } from '@/lib/computer-tools';

export async function POST(req: Request) {
  const { prompt } = await req.json();

  const result = streamText({
    model: 'anthropic/claude-sonnet-4-20250514',
    prompt,
    tools: { computer: computerTool },
    stopWhen: stepCountIs(20),
  });

  return result.toUIMessageStreamResponse();
}
Security Checklist
Limitations
Be aware of current limitations:
- May struggle with complex UI interactions
- Can be slow for multi-step tasks
- Screenshot quality affects performance
- Some actions may fail and require retry logic
- Best for structured, repeatable tasks
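For the retry logic mentioned above, a small wrapper with exponential backoff is often enough. The following is a sketch rather than a recommended policy. `withRetry`, `backoffDelayMs`, and the default delays are hypothetical choices.

```typescript
// Hypothetical helpers: names and default values are illustrative only.

// Delay doubles with each attempt and is capped at maxMs.
export function backoffDelayMs(attempt: number, baseMs = 250, maxMs = 4_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Runs the action up to maxAttempts times, waiting between failed attempts.
export async function withRetry<T>(
  action: () => Promise<T>,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<T> {
  if (maxAttempts < 1) throw new Error('maxAttempts must be at least 1');
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await action();
    } catch (error) {
      lastError = error;
      // Do not wait after the final attempt
      if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

Retrying suits idempotent actions such as taking a screenshot. For clicks or typed text, verify the screen state first so a retry does not repeat an action that already succeeded.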
Example Use Cases
- Browser Automation: Navigate websites and extract information
- Testing: Automated UI testing of web applications
- Data Entry: Fill forms based on structured data
- Documentation: Generate screenshots for guides
- Monitoring: Check application states periodically