Project Source Code:
EdgeTTS Dify Plugin
Description
EdgeTTS is a text-to-speech Dify plugin based on the EdgeTTS API, compatible with the OpenAI API format. It supports multiple Chinese voices, speed control, and audio format output. Generated audio files are saved to the local temporary directory.
Core Features
- Supports multiple Chinese voices (Xiaoxiao, Yunxi, Xiaoyi, Yunjian, etc.)
- Speed control (0.25x - 4.0x)
- Multiple audio formats (MP3, WAV, FLAC)
- Local file storage (saved to system temporary directory)
- Secure API key management
- OpenAI API format compatible
- Real-time processing progress display
- Complete parameter validation and error handling
Installation and Configuration
Requirements
- Python 3.12+
- dify_plugin >= 0.1.0, < 0.2.0
- openai >= 1.0.0
- requests >= 2.31.0
- pydantic >= 2.0.0
Tech Stack
- Dify Plugin Framework: Built on the Dify plugin framework
- OpenAI Compatible API: Uses OpenAI client library to call EdgeTTS API
- Asynchronous Processing: Supports generator-based streaming processing
- Data Validation: Uses Pydantic for parameter validation
- Error Handling: Complete exception handling and user-friendly error messages
EdgeTTS API Key Acquisition
- Visit the EdgeTTS service provider: https://edgettsapi.duckcloud.fun
- Register an account and obtain an API Key
- Ensure the API Key is compatible with the OpenAI API format
Plugin Installation
- Copy the plugin directory to the Dify plugins directory
- Enable the EdgeTTS plugin in the Dify management interface
- Configure the necessary authentication information
Configuration Instructions
Configure the following parameters in the Dify plugin management interface:
Required Configuration
- EdgeTTS API Key: API key obtained from the EdgeTTS service provider
  - Type: Encrypted input
  - Description: Authentication key compatible with OpenAI API format
Optional Configuration
- API Base URL: EdgeTTS API base address
Usage
Basic Usage
- Add the EdgeTTS plugin to your Dify workflow
- Enter the text content to be converted
- Select the voice model and parameters
- Obtain the generated audio file (saved to local temporary directory)
Detailed Parameter Description
Text Content (input_text)
- Type: String (Required)
- Description: Text content to be converted to speech
- Limit: Maximum 5000 characters
- Support: Chinese and other supported languages
Voice Model (voice)
- Type: Dropdown selection (Optional)
- Default: zh-CN-XiaoxiaoNeural
- Options:
  - Xiaoxiao (Chinese female voice)
  - Yunxi (Chinese male voice)
  - Xiaoyi (Chinese female voice)
  - Yunjian (Chinese male voice)
TTS Model (model)
- Type: Dropdown selection (Optional)
- Default: tts-1
- Options:
  - Standard quality, fast processing
  - High quality, better audio effect
Speech Speed (speed)
- Type: Numeric (Optional)
- Default: 1.0
- Range: 0.25 - 4.0
- Description: 1.0 is normal speed, 0.25 is slowest, 4.0 is fastest
Audio Format (response_format)
- Type: Dropdown selection (Optional)
- Default: mp3
- Options:
  - MP3 format (recommended, good compatibility)
  - WAV format (lossless quality)
  - FLAC format (lossless compression)
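The limits listed above (5000 characters, speed between 0.25 and 4.0, three audio formats) can be checked before any API call is made. The following is a minimal sketch of such a check, not the plugin's actual validation code. The function name `validate_tts_params` and the lowercase format identifiers `wav` and `flac` are assumptions made for illustration; only `mp3` is confirmed as a value by the documented default.

```python
MAX_TEXT_LENGTH = 5000      # limit stated for input_text
SPEED_RANGE = (0.25, 4.0)   # range stated for speed
# Assumed identifiers: lowercase names for the three documented formats.
ALLOWED_FORMATS = {"mp3", "wav", "flac"}


def validate_tts_params(input_text, speed=1.0, response_format="mp3"):
    """Return a list of error strings; an empty list means the input is valid."""
    errors = []
    if not isinstance(input_text, str) or not input_text.strip():
        errors.append("input_text is required")
    elif len(input_text) > MAX_TEXT_LENGTH:
        errors.append(f"input_text exceeds {MAX_TEXT_LENGTH} characters")

    low, high = SPEED_RANGE
    if not low <= speed <= high:
        errors.append(f"speed must be between {low} and {high}")

    if response_format not in ALLOWED_FORMATS:
        errors.append(f"response_format must be one of {sorted(ALLOWED_FORMATS)}")
    return errors
```

Returning a list instead of raising on the first problem lets a caller report every invalid parameter in one message.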
Usage Example
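Because the service is described as compatible with the OpenAI API format, a request outside Dify can be sketched with the `openai` client library. This is an illustration under stated assumptions, not the plugin's own code: the helper names are hypothetical, the `base_url` must be whatever API Base URL your provider documents, and the environment variable name in the usage comment is invented for the example.

```python
import os
import tempfile


def build_speech_request(text, voice="zh-CN-XiaoxiaoNeural", model="tts-1",
                         speed=1.0, response_format="mp3"):
    """Collect the documented parameters into keyword arguments for the client."""
    return {
        "model": model,
        "voice": voice,
        "input": text,
        "speed": speed,
        "response_format": response_format,
    }


def synthesize_to_temp_file(api_key, base_url, text, **options):
    """Call the speech endpoint and save the audio in the system temp directory."""
    # Imported lazily so build_speech_request has no third-party dependency.
    from openai import OpenAI

    request = build_speech_request(text, **options)
    client = OpenAI(api_key=api_key, base_url=base_url)
    response = client.audio.speech.create(**request)

    # mkstemp creates a uniquely named file in the system temporary directory.
    fd, path = tempfile.mkstemp(
        suffix="." + request["response_format"], prefix="edgetts_"
    )
    with os.fdopen(fd, "wb") as audio_file:
        audio_file.write(response.read())
    return path


# Example call (requires network access and a valid key):
# path = synthesize_to_temp_file(
#     os.environ["EDGETTS_API_KEY"],   # hypothetical variable name
#     "<your API Base URL>",           # placeholder, not a real endpoint
#     "Hello from EdgeTTS",
#     speed=1.2,
# )
```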
Processing Flow
The plugin displays detailed processing progress during execution:
- Starting voice generation...
- Text length validation
- Voice model confirmation
- Speed setting confirmation
- Calling EdgeTTS API...
- Voice generation successful
- Audio size statistics
- Saving audio file to local...
- Voice conversion completed!
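The progress steps above fit naturally into a Python generator, which is how a tool can report status while work is still in flight. The sketch below is a simplified stand-in, not the plugin's `_invoke` implementation: it yields plain strings, and the `synthesize` and `save` callables are hypothetical placeholders for the API call and the file write.

```python
def run_tts_with_progress(text, synthesize, save,
                          voice="zh-CN-XiaoxiaoNeural", speed=1.0):
    """Yield one progress message per step of the conversion.

    synthesize(text, voice, speed) must return audio bytes;
    save(audio) must return the path of the written file.
    """
    yield "Starting voice generation..."
    yield f"Text length: {len(text)} characters"
    yield f"Voice model: {voice}"
    yield f"Speed: {speed}x"
    yield "Calling EdgeTTS API..."
    audio = synthesize(text, voice, speed)
    yield "Voice generation successful"
    yield f"Audio size: {len(audio)} bytes"
    yield "Saving audio file to local..."
    path = save(audio)
    yield f"Voice conversion completed! Saved to {path}"
```

Each message is produced only when the consumer asks for the next item, so the caller sees "Calling EdgeTTS API..." before the request actually runs.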
Troubleshooting
Common Issues
- Invalid API Key: Check if the EdgeTTS API Key is correct
- Connection timeout: Check network connection and API Base URL
- Text too long: Ensure text length does not exceed 5000 characters
- Local save failure: Check local disk space and permissions
Error Codes
- 401: API Key invalid or expired
- 403: API Key has insufficient permissions
- 404: API endpoint not found
- 429: API call rate too high
- 500: Internal server error
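One way to turn these status codes into user-facing messages is a small lookup table. This is a sketch only; the name `describe_error` and the fallback wording are assumptions, not part of the plugin.

```python
# Status codes and meanings as listed in the Error Codes section.
ERROR_MESSAGES = {
    401: "API Key invalid or expired",
    403: "API Key has insufficient permissions",
    404: "API endpoint not found",
    429: "API call rate too high",
    500: "Internal server error",
}


def describe_error(status_code):
    """Return a readable message for an HTTP status code from the API."""
    return ERROR_MESSAGES.get(
        status_code, f"Unexpected HTTP status {status_code}"
    )
```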
Project Structure
Core File Description
manifest.yaml
- Defines plugin basic information (name, version, author)
- Configures runtime environment (Python 3.12, 2GB memory allocation)
- Specifies tool providers and permission settings
provider/edgetts_provider.py
- Implements class, inheriting from
- Provides credential validation functionality ()
- Tests EdgeTTS API connection availability
tools/text_to_speech.py
- Implements class, inheriting from
- Core TTS conversion logic ( method)
- Parameter validation, API calls, audio file saving
- Complete error handling and user feedback
Development and Testing
Local Development Environment Setup
- Environment Requirements
- Install Dependencies
- Local Testing
Testing Instructions
- The test file contains EdgeTTS API connection and functionality tests
- Tests cover: parameter validation, API calls, audio generation, error handling
- It is recommended to run tests after code modifications to ensure functionality
Debugging Tips
- Log Output: The plugin displays detailed processing status during runtime
- Parameter Validation: Check if input parameters meet requirements
- API Connection: Verify EdgeTTS API Key and Base URL configuration
- Local Storage: Check write permissions for the system temporary directory
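For the local storage check, a short probe using the standard `tempfile` module can confirm that the system temporary directory accepts writes. This is a hypothetical helper for debugging, not something shipped with the plugin.

```python
import os
import tempfile


def temp_dir_is_writable():
    """Try to write a probe file; return (ok, directory or error detail)."""
    directory = tempfile.gettempdir()
    try:
        # The probe file is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(
            dir=directory, prefix="edgetts_check_"
        ) as probe:
            probe.write(b"ok")
        return True, directory
    except OSError as exc:
        return False, f"{directory}: {exc}"
```

If the first element of the result is `False`, the second element names the directory and the underlying operating system error, which usually points to a permissions or disk space problem.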
Plugin Configuration Files
- manifest.yaml: Plugin metadata and runtime configuration
- Provider configuration file: Authentication parameters and tool list definition
- Tool configuration file: Tool parameter configuration and user interface definition
Version Information
- Current Version: v0.0.1
- Author: wwwzhouhui
- Supported Architectures: AMD64, ARM64
- Runtime Environment: Python 3.12
- Plugin Type: Dify Tool Plugin
- Category: Utilities
Changelog
v0.0.1 (2025-08-26)
Initial Release
- Complete EdgeTTS text-to-speech functionality
- OpenAI API format compatible
- Support for multiple Chinese voice models (Xiaoxiao, Yunxi, Xiaoyi, Yunjian)
- Speed control (0.25x - 4.0x)
- Multi-format audio output (MP3, WAV, FLAC)
- Local temporary directory file storage
- Secure API key management
- Complete parameter validation and error handling
- Real-time processing progress display
- Includes test cases and development documentation
Technical Features
- Built on Dify Plugin Framework
- Uses generator pattern to support streaming processing
- Complete exception handling mechanism
- 2GB memory allocation for audio processing
- Supports maximum 5000 character text input
License
This project follows an open-source license. See the project root directory for specific license information.
Contributing
Issues and pull requests that improve this project are welcome.
Contact