Trying Out AI Agents: Less Cowboy, More Safety Goggles

Over the past several weeks, many AI providers have released or announced AI agents that can take control of your computer and carry out complex tasks across multiple tools and systems.

Here are some examples of what is available or coming soon:
– Claude’s Computer Use by Anthropic
– Copilot Studio by Microsoft
– Operator by OpenAI
– Project Jarvis by Google
– Agentforce by Salesforce

As an example, say you want to see what trends, if any, are driving the orders recently placed with your company. Understanding them could help you position your company to take advantage of those trends. An agent could gather the information you need quickly.

Your request may look something like this: “Using our orders summary report for last month, find the 10 organizations that placed the largest orders. Search the web using Chrome and Google for any recent news on each of those customers. Then put the results into a spreadsheet listing the customers, the order summary for each customer, and links to the news articles found for each customer.”

As agents become more capable, you can refine your request so that the agent only includes articles highlighting trends driving your customers’ business.

This simple example gives you an idea of the powerful things these agents will soon be able to do. However, there is a downside to keep in mind as you experiment: handing control of your computer to an unknown, untested outside agent can create problems, including privacy and security issues.

While I highly encourage you to explore what these tools can do, I urge you to do so safely. I suggest you create a “safe sandbox” to contain the agent. This sandbox could be as simple as a non-administrative account on your computer that grants minimal access to the core system and data. Give the agent access only to demo or test data if possible; if it must work with live data, do not grant it the ability to update or delete that data. Any browser the agent uses should be free of personal search history and cookies, and any e-mail account should be a test account with limits on sending messages.

A more flexible approach is to run the agent in a container using a tool like Docker, optionally managed by an orchestrator like Kubernetes. A container gives the agent a stripped-down environment on your computer that you can restrict further. Containers can be recreated quickly and are portable to other computers for increased testing flexibility.
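To make the container idea concrete, here is a minimal Python sketch that assembles a locked-down `docker run` command. It is an illustration under stated assumptions, not a vetted security configuration: the image name `agent-sandbox:latest`, the `./test-data` folder, and the helper `build_sandbox_command` are all hypothetical, and the resource limits are arbitrary starting points. The script only prints the command so you can review it before running anything.

```python
import shlex
from pathlib import Path


def build_sandbox_command(
    image: str,
    test_data_dir: str,
    allow_network: bool = False,
) -> list[str]:
    """Assemble a `docker run` command with restrictive defaults.

    `image` and `test_data_dir` are placeholders you supply; nothing here
    is specific to any particular agent product.
    """
    data_path = Path(test_data_dir).resolve()
    cmd = [
        "docker", "run",
        "--rm",                                 # discard the container when it exits
        "--read-only",                          # make the root filesystem read-only
        "--tmpfs", "/tmp",                      # scratch space that vanishes on exit
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block privilege escalation
        "--user", "1000:1000",                  # run as a non-root user
        "--memory", "1g",                       # cap memory use (arbitrary limit)
        "--pids-limit", "256",                  # cap process count (arbitrary limit)
        "--volume", f"{data_path}:/data:ro",    # test data only, mounted read-only
    ]
    if not allow_network:
        cmd += ["--network", "none"]            # no network access at all
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    # Hypothetical image and folder names, for illustration only.
    command = build_sandbox_command("agent-sandbox:latest", "./test-data")
    print(shlex.join(command))
```

An agent that has to search the web needs network access, so you would pass `allow_network=True` for a task like the one above. That is a real trade-off: the container then limits what the agent can touch on your machine, but not what it can reach online.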

Always assume that the agent will make mistakes and possibly break something, so move the fine china out of reach first. Happy and safe exploring!