What is Claude’s new computer use API all about?
This blog is written by Akshat Virmani at KushoAI. We're building the fastest way to test your APIs. It's completely free and you can sign up here.
While surfing through the internet, I came across a video on computer use, which I found really interesting and fascinating. To summarise: Computer use will access your computer and do all the tasks automatically without the user interfering with the mouse or keyboard.
What is Claude's computer use?
"Computer Use" API by Anthropic, introduced with their updated Claude 3.5 model, enables Claude to control and interact with a user’s computer. Through this API (in beta right now), developers can command Claude to perform a wide range of tasks typically requiring human input, like clicking buttons and typing. The API allows Claude to perform repetitive tasks, test applications, and conduct online research by simulating human interaction with software. Anthropic has implemented safeguards to prevent misuse, such as classifiers to detect suspicious activity that might arise from automated actions.
The above video from Anthropic’s YouTube channel shows how Claude and Computers can create a website from scratch with minimal user input or feedback.
Computer use is not just limited to writing code; it can do all your spreadsheet tasks, math problem solving, graduate-level reasoning, visual Q&A, and much more, even better than Gemini and ChatGPT. It also has the highest performance in the Software Engineering benchmark and can almost solve 49-50% of the GitHub issues it encounters.
How does Computer Use work?
Whenever Claude’s computer use is given a task, it prompts itself in an infinite loop, performing different actions, forming results, and performing other actions over it until it solves the original problem.
Other products with a similar concept:
Microsoft Copilot: Integrated into Microsoft 365, this tool uses GPT-4 to automate tasks in apps like Word, Excel, and Teams. It can help with drafting documents, analyzing data, and even conducting internet searches, all within the Microsoft ecosystem, making it suitable for Microsoft users.
Google Gemini: Google’s language models also offer conversational and task-oriented assistance. While Gemini can interact with Google apps, it's still limited in its ability to access third-party applications or broader desktop controls directly.While the tools and platforms mentioned above are advanced and beneficial, they cannot be compared to computer use because these products are mainly limited to their products and ecosystems. Claude’s computer can access most third-party applications. However, a highlight point is that Anthropic compares their computer use to models like GPT4o, not the o1 model, which could be more advanced.
To Conclude
Computer use is not perfect; there are many crashes, and sometimes, the results are not what the user is looking for, but they will get better in the upcoming time. Computer use should not be used for tasks like investments or tasks that can misuse a user’s private information, such as financial or health details, and try to run it in a safe sandboxed environment like Docker. All the user needs is a Computer use Anthropic API key and Docker installed.
This blog is written by Akshat Virmani at KushoAI. We're building an AI agent that tests your APIs for you. Bring in API information and watch KushoAI turn it into fully functional and exhaustive test suites in minutes.
Subscribe to my newsletter
Read articles from Sakshi from KushoAI directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by