Agent Skills
Vita AI agents operate through a skill-based architecture. Each agent type is configured with a specific set of skills, and the platform dynamically assembles the corresponding tools at runtime. This enables agents to perform real-world work — writing and executing code, interacting with web applications, generating documents, and communicating via email — all within isolated sandbox environments.
Built-in Skills
Full desktop interaction within isolated sandbox environments:
- Screenshot — Capture the current desktop view for vision-based reasoning
- Click & Type — Interact with UI elements by clicking at coordinates and typing text
- Keyboard — Press keys and key combinations for shortcuts and navigation
- Mouse Move & Drag — Move the cursor and perform drag operations
- Scroll — Scroll in any direction within applications
Screenshots are streamed to the UI in real-time, and the agent uses multi-modal vision to reason about what it sees on screen.
Natural-language-driven web interaction powered by the Model Context Protocol (MCP):
- Intent-Based Actions — Agents interact with web pages using natural language descriptions (e.g., "click the submit button") rather than brittle CSS selectors or XPath queries, making automations resilient to UI changes
- Dynamic Tool Discovery — Browser tools are discovered at runtime from the MCP server, so new capabilities become available without code changes
- Multi-Modal — Browser screenshots are returned directly to the agent for visual reasoning, enabling it to understand and react to page state
A pluggable artifact system for creating and editing documents:
- Code — Generate and execute Python, JavaScript, or TypeScript with live output
- Text — Create Markdown documents with real-time streaming preview
- Spreadsheet — Generate structured CSV data with meaningful headers
Documents are streamed to the UI as they're generated and persisted for later use.
Agents can send and receive emails with a unique identity per task:
- Send — Compose and send emails to external recipients
- List — Check the inbox for incoming messages
- Read — Read full email content and attachments
Each agent task is assigned a dedicated email address, enabling agents to communicate with stakeholders, send reports, and receive replies autonomously.
Isolated Linux environments for safe code execution:
- Bash — Execute terminal commands with full shell access
- File System — Read, write, and manage files within the sandbox
- Network — Access external APIs and services
- Code Execution — Run Python, Node.js, and shell scripts
Sandboxes are created on-demand and isolated per task, ensuring agents can execute code safely without affecting other workloads.
How Agents Use Skills
Each built-in agent type comes with a curated set of skills pre-installed and ready to use. Different agent types are suited to different workflows — for example, a QA agent has browser and computer skills for testing web applications, while a general-purpose agent has computer and email skills for broad task automation.
At runtime, the platform assembles only the tools that match the agent's configured skills. This keeps each agent focused and efficient — agents only load the capabilities they need for their role.
Extensibility & Roadmap
Vita's architecture is designed for extensibility at multiple levels:
- MCP Protocol — The platform uses the Model Context Protocol standard for pluggable tool integration. Currently powering browser automation, the same pattern extends to any MCP-compliant server — databases, monitoring tools, APIs, and more.
- Custom Agents (planned) — Users will be able to create custom agent types and configure which skills are installed, tailoring agents to their specific workflows.
- 3rd Party Skills via OpenSkills (planned) — Integration with OpenSkills, a universal skills loader for AI agents, will let users install skills from GitHub repositories or other sources. Skills are simple markdown-based packages that can be installed per-project or globally — no server required.