AgentBay Dify Plugin
Introduction
AgentBay is a cloud sandbox service from Alibaba Cloud WuYing, providing isolated computing environments (Linux, Windows, Browser, etc.) that enable AI Agents to safely execute code, manipulate files, automate browsers, and perform complex tasks.
For detailed information and usage instructions about AgentBay, please refer to the official website: https://www.aliyun.com/product/agentbay
Why This Plugin:
Traditional AI Agents are limited to text-based interactions and cannot execute actual computational tasks. This plugin integrates AgentBay to give your Dify applications "real action capabilities":
- 🔧 Code Execution: Run Python and Shell scripts in isolated environments for data analysis, file conversion, etc.
- 🌐 Browser Automation: Automate web browsing, form filling, data scraping, and other web operations
- 📁 File Operations: Read/write cloud files, process documents and logs
- 🖥️ Desktop Automation: Control Windows applications for RPA workflows
- ☁️ Cloud-Based: All operations run in the cloud with security isolation, no local environment needed
Typical Use Cases:
- Data Analysis Agent: Run Python scripts to analyze user-uploaded data
- Web Scraping Agent: Automatically visit websites to extract information
- Automated Testing Agent: Execute web application automated tests
- Document Processing Agent: Batch convert and process file formats
- RPA Office Agent: Automate repetitive desktop operations
Features
Session Management
- Create various cloud environments (Linux, Browser, Code, Windows, Mobile)
- View and manage all active sessions
- Safely delete sessions and clean up resources
Command & Code Execution
- Execute Shell commands in cloud environments
- Run code in Python and other programming languages
- Configurable execution timeout
File Operations
- Read and write files
- List directory contents
Browser Automation
- Web navigation, element interaction (click, type, scroll)
- Page screenshots, content extraction
- Element analysis, waiting for elements to load
Desktop UI Automation
- Desktop screenshots
- Mouse clicks, keyboard input
Quick Start
Get API Key
Visit the AgentBay Console to get your API key.
Configure Plugin
After installing the plugin in Dify, configure the API Key parameter.
Basic Usage
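1. Create Session
Use session_create to create a cloud environment. It returns a session_id for subsequent operations.
2. Perform Operations
- Execute commands: use command_execute for Shell commands, or code_execute for Python and other languages.
- File operations: use file_operations to read, write, or list files.
- Browser automation: use browser_automation in a browser session.
3. Clean Up Resources
Use session_delete to delete the session and release its cloud resources.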
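The create, operate, clean up pattern above can be sketched in Python. This is only an illustration of the ordering, not the plugin's code: `FakeToolRunner` is a hypothetical stand-in that records tool calls instead of contacting AgentBay, and the parameter names (`environment`, `command`, `operation`, `path`) are assumptions. Only the tool names and `session_id` come from this document.

```python
from contextlib import contextmanager


class FakeToolRunner:
    """Hypothetical stand-in: records tool calls instead of calling AgentBay."""

    def __init__(self):
        self.calls = []
        self._next_id = 1

    def call(self, tool, **params):
        self.calls.append((tool, params))
        if tool == "session_create":
            session_id = f"session-{self._next_id}"
            self._next_id += 1
            return {"session_id": session_id}
        return {"ok": True}


@contextmanager
def session(runner, environment):
    """Create a session and always delete it, even if an operation fails."""
    created = runner.call("session_create", environment=environment)
    session_id = created["session_id"]
    try:
        yield session_id
    finally:
        runner.call("session_delete", session_id=session_id)


runner = FakeToolRunner()
with session(runner, "linux_latest") as sid:
    # Parameter names below are illustrative assumptions.
    runner.call("command_execute", session_id=sid, command="ls -la")
    runner.call("file_operations", session_id=sid, operation="read", path="/tmp/app.log")

print([tool for tool, _ in runner.calls])
```

Wrapping the session in a context manager keeps the cleanup step from being skipped when an intermediate operation raises, which matters because a session that is never deleted keeps holding cloud resources.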
Demo Examples
We provide ready-to-use demo examples that showcase the plugin's capabilities. These examples are available in the demo directory.
You can quickly get started by importing these examples into Dify:
- Download the demo file (in Dify DSL format) from the demo directory
- In Dify, go to "Studio" and click "Import DSL File"
- Select the downloaded demo file to import
- Configure the AgentBay API Key in the imported workflow
- Run the demo to see how the plugin works
Tools Reference
session_create
Create a new cloud environment session. Supports five environment types: Linux (default), browser, code (development), Windows, and mobile. Returns a session_id for subsequent operations.
session_list
List all active sessions created by this plugin under the current account, including session ID, environment type, etc.
session_delete
Delete the specified session and release its cloud resources. Clean up promptly once operations are finished.
command_execute
Execute Shell commands in a session, with an optional custom working directory and timeout. Suitable for command-line operations in Linux and Windows environments.
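To make the working-directory and timeout options concrete, the sketch below reproduces the same semantics locally with Python's standard `subprocess` module. It is not how the plugin is implemented, and it runs on the local machine rather than in an AgentBay session. The function name `run_shell` and the result keys are hypothetical.

```python
import subprocess
import tempfile


def run_shell(command, cwd=None, timeout_s=30):
    """Run a shell command locally with a working directory and a timeout."""
    try:
        done = subprocess.run(
            command,
            shell=True,           # interpret the string as a shell command
            cwd=cwd,              # custom working directory
            capture_output=True,  # collect stdout and stderr
            text=True,            # decode output as str
            timeout=timeout_s,    # abort if the command runs too long
        )
    except subprocess.TimeoutExpired:
        return {"exit_code": None, "stdout": "", "stderr": "", "timed_out": True}
    return {
        "exit_code": done.returncode,
        "stdout": done.stdout,
        "stderr": done.stderr,
        "timed_out": False,
    }


with tempfile.TemporaryDirectory() as workdir:
    result = run_shell("echo hello", cwd=workdir, timeout_s=5)
    print(result["exit_code"], result["stdout"].strip())
```

A bounded timeout is worth setting for any command that might block, since a hung command would otherwise keep the session busy.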
code_execute
Run code in a session, supporting Python and other programming languages. A working directory can be specified. Suitable for data processing, automation scripts, etc.
file_operations
Unified file operation tool. Specify the operation type via a parameter:
- read: Read file content
- write: Write to a file (the file is created automatically if it does not exist)
- list: List directory contents
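The single-tool, operation-parameter design can be illustrated with a small local dispatcher. This is a sketch under assumptions, not the plugin's implementation: it works on the local filesystem through `pathlib`, and the function name `file_operation` and its parameters (`operation`, `path`, `content`) are hypothetical.

```python
import tempfile
from pathlib import Path


def file_operation(operation, path, content=None):
    """Dispatch on the operation name, mirroring read / write / list."""
    target = Path(path)
    if operation == "read":
        return target.read_text(encoding="utf-8")
    if operation == "write":
        # write_text creates the file if it does not exist yet
        target.write_text(content or "", encoding="utf-8")
        return None
    if operation == "list":
        return sorted(entry.name for entry in target.iterdir())
    raise ValueError(f"unsupported operation: {operation!r}")


with tempfile.TemporaryDirectory() as workdir:
    report = Path(workdir) / "report.txt"
    file_operation("write", report, "rows=3\n")
    print(file_operation("read", report))
    print(file_operation("list", workdir))
```

Rejecting unknown operation names explicitly makes a mistyped operation fail fast instead of silently doing nothing.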
browser_automation
Full-featured browser automation tool. Requires a browser environment. Supports multiple operations, selected via a parameter:
- navigate: Visit web pages
- click/type: Interact with page elements (locate using CSS selectors)
- scroll: Page scrolling
- screenshot: Capture page screenshots (supports full page or viewport)
- get_content: Get page HTML content
- analyze_elements: Analyze page structure to help find correct selectors
- wait_element/wait: Wait for an element to appear, or pause for a fixed delay
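A browser task is usually a sequence of these operations. The sketch below describes such a sequence as plain data and checks each step against the operation names listed above. The key names (`action`, `url`, `selector`, `text`), the example URL, and the selectors are illustrative assumptions, not the plugin's actual parameters.

```python
# Operation names taken from the list above.
SUPPORTED_ACTIONS = {
    "navigate", "click", "type", "scroll", "screenshot",
    "get_content", "analyze_elements", "wait_element", "wait",
}


def validate_steps(steps):
    """Return the steps unchanged, or raise if one uses an unknown action."""
    for index, step in enumerate(steps):
        action = step.get("action")
        if action not in SUPPORTED_ACTIONS:
            raise ValueError(f"step {index}: unsupported action {action!r}")
    return steps


# Hypothetical login flow: each step would map to one browser_automation call.
login_flow = validate_steps([
    {"action": "navigate", "url": "https://example.com/login"},
    {"action": "wait_element", "selector": "#username"},
    {"action": "type", "selector": "#username", "text": "demo"},
    {"action": "click", "selector": "button[type=submit]"},
    {"action": "screenshot"},
])

print([step["action"] for step in login_flow])
```

Waiting for an element before typing into it, as in the second step, avoids acting on a page that has not finished loading.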
ui_operations
Desktop UI automation tool for Windows, browser, and other graphical environments. Select the operation via a parameter:
- screenshot: Capture desktop screenshots
- click: Click at specified coordinates
- type: Type text
- key: Press specified keys (e.g., Enter, Tab)
Use Cases
Web Automation Testing
- Create browser_latest session
- Use browser_automation to navigate and interact
- Screenshot to verify results
- Delete session
Data Processing
- Create code_latest session
- Use file_operations (write) to upload data files
- Use code_execute to run analysis scripts
- Use file_operations (read) to retrieve the results
- Delete session
System Operations
- Create linux_latest session
- Use command_execute to run check commands
- Use file_operations to read logs
- Delete session
Links
Contact
Disclaimer
This plugin is provided as a technical tool. Please note the following when using:
- Legal Use: Ensure all operations comply with applicable laws and regulations, and do not use the plugin for illegal purposes
- Data Responsibility: Users are responsible for the data they process and should avoid processing sensitive information
- Service Dependency: Use of this plugin must comply with the Alibaba Cloud WuYing AgentBay service terms
- Risk Assumption: Usage involves risks such as data loss and service interruption, which users assume themselves
- Liability Limitation: Developers are not responsible for any losses caused by use of this plugin
Using this plugin indicates agreement to the above terms.