
EdgeTTS Dify Plugin

Description

EdgeTTS is a text-to-speech plugin for Dify built on the EdgeTTS API, which is compatible with the OpenAI API format. It supports multiple Chinese voices, speed control, and several audio output formats. Generated audio files are saved to the local system temporary directory.

Core Features

🎵 Supports multiple Chinese voices (Xiaoxiao, Yunxi, Xiaoyi, Yunjian, etc.)
⚡ Speed control (0.25x - 4.0x)
📁 Multiple audio formats (MP3, WAV, FLAC)
💾 Local file storage (saved to system temporary directory)
🔒 Secure API key management
🚀 OpenAI API format compatible
📊 Real-time processing progress display
✅ Complete parameter validation and error handling

Installation and Configuration

Requirements

Python 3.12+
dify_plugin >= 0.1.0, < 0.2.0
openai >= 1.0.0
requests >= 2.31.0
pydantic >= 2.0.0

Tech Stack

Dify Plugin Framework: Built on the Dify plugin framework
OpenAI-Compatible API: Uses the OpenAI client library to call the EdgeTTS API
Asynchronous Processing: Supports generator-based streaming processing
Data Validation: Uses Pydantic for parameter validation
Error Handling: Complete exception handling and user-friendly error messages

EdgeTTS API Key Acquisition

Visit the EdgeTTS service provider: https://edgettsapi.duckcloud.fun
Register an account and obtain an API Key
Ensure the API Key is compatible with the OpenAI API format

Plugin Installation

Copy the plugin directory to the Dify plugins directory
Enable the EdgeTTS plugin in the Dify management interface
Configure the necessary authentication information

Configuration Instructions

Configure the following parameters in the Dify plugin management interface:

Required Configuration

EdgeTTS API Key: API key obtained from the EdgeTTS service provider
- Type: Encrypted input
- Description: Authentication key compatible with OpenAI API format

Optional Configuration

API Base URL: EdgeTTS API base address
- Default: https://edgettsapi.duckcloud.fun/v1
- Type: Text input
- Description: Customizable EdgeTTS API server address
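
As a minimal sketch of how these two settings could be combined into a request target, the snippet below normalizes the base URL and builds the authorization header. The function name `build_speech_request` is hypothetical, and the `/audio/speech` route is an assumption based on the OpenAI API format; the plugin itself may resolve these values differently.

```python
from typing import Optional
from urllib.parse import urlsplit

# Default taken from the optional configuration above.
DEFAULT_BASE_URL = "https://edgettsapi.duckcloud.fun/v1"


def build_speech_request(api_key: str, base_url: Optional[str] = None) -> tuple[str, dict[str, str]]:
    """Return (endpoint_url, headers) for an OpenAI-style speech request."""
    if not api_key or not api_key.strip():
        raise ValueError("EdgeTTS API Key is required")

    # Fall back to the default and drop any trailing slash before joining.
    base = (base_url or DEFAULT_BASE_URL).strip().rstrip("/")
    parts = urlsplit(base)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"API Base URL is not a valid http(s) URL: {base!r}")

    # Assumption: the service mirrors the OpenAI /audio/speech route.
    endpoint = f"{base}/audio/speech"
    headers = {
        "Authorization": f"Bearer {api_key.strip()}",
        "Content-Type": "application/json",
    }
    return endpoint, headers
```

Calling `build_speech_request("your-key")` with no second argument uses the default base URL, which matches the behavior described for the optional setting.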

Usage

Basic Usage

Add the EdgeTTS plugin to your Dify workflow
Enter the text content to be converted
Select the voice model and parameters
Obtain the generated audio file (saved to local temporary directory)

Detailed Parameter Description

Text Content (input_text)

Type: String (Required)
Description: Text content to be converted to speech
Limit: Maximum 5000 characters
Languages: Chinese, plus other languages supported by the EdgeTTS service

Voice Model (voice)

Type: Dropdown selection (Optional)
Default: zh-CN-XiaoxiaoNeural
Options:
- Xiaoxiao (Chinese female voice; the default, zh-CN-XiaoxiaoNeural)
- Yunxi (Chinese male voice)
- Xiaoyi (Chinese female voice)
- Yunjian (Chinese male voice)

TTS Model (model)

Type: Dropdown selection (Optional)
Default: tts-1
Options:
- Standard quality, fast processing
- High quality, better audio

Speech Speed (speed)

Type: Numeric (Optional)
Default: 1.0
Range: 0.25 - 4.0
Description: 1.0 is normal speed, 0.25 is slowest, 4.0 is fastest

Audio Format (response_format)

Type: Dropdown selection (Optional)
Default: mp3
Options:
- MP3 (recommended, good compatibility)
- WAV (lossless quality)
- FLAC (lossless compression)

Usage Example
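
The parameter rules above can be sketched as a small validation helper that produces an OpenAI-style request body. This is an illustration only: `build_tts_payload` is a hypothetical name, and the `input` field name is an assumption taken from the OpenAI speech API format rather than from the plugin's source.

```python
# Limits and defaults taken from the parameter description above.
MAX_TEXT_LENGTH = 5000
MIN_SPEED, MAX_SPEED = 0.25, 4.0
AUDIO_FORMATS = ("mp3", "wav", "flac")


def build_tts_payload(
    input_text: str,
    voice: str = "zh-CN-XiaoxiaoNeural",
    model: str = "tts-1",
    speed: float = 1.0,
    response_format: str = "mp3",
) -> dict[str, object]:
    """Validate the tool parameters and return a request body."""
    if not input_text or not input_text.strip():
        raise ValueError("input_text is required")
    if len(input_text) > MAX_TEXT_LENGTH:
        raise ValueError(
            f"input_text has {len(input_text)} characters; the limit is {MAX_TEXT_LENGTH}"
        )
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    if response_format not in AUDIO_FORMATS:
        raise ValueError(f"response_format must be one of {AUDIO_FORMATS}")

    # Assumption: the text is sent under "input", as in the OpenAI format.
    return {
        "model": model,
        "input": input_text,
        "voice": voice,
        "speed": speed,
        "response_format": response_format,
    }
```

For example, `build_tts_payload("你好，世界", speed=1.25)` keeps the default voice, model, and format and only overrides the speed.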

Processing Flow

The plugin displays detailed processing progress during execution:

🚀 Starting voice generation...
📝 Text length validation
🎵 Voice model confirmation
⚡ Speed setting confirmation
🔄 Calling EdgeTTS API...
✅ Voice generation successful
📊 Audio size statistics
💾 Saving audio file to local...
🎉 Voice conversion completed!
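
The flow above can be approximated with a generator that yields one status message per step and writes the audio to the system temporary directory. This is a sketch under assumptions: `run_tts` and its `synthesize` callback are hypothetical stand-ins for the plugin's real API call, and the message wording is abbreviated.

```python
import tempfile
from collections.abc import Callable, Iterator
from pathlib import Path


def run_tts(text: str, synthesize: Callable[[str], bytes], suffix: str = ".mp3") -> Iterator[str]:
    """Yield progress messages while converting text and saving the audio."""
    yield "🚀 Starting voice generation..."
    yield f"📝 Text length: {len(text)} characters"
    yield "🔄 Calling EdgeTTS API..."

    # Hypothetical callback: takes the text, returns raw audio bytes.
    audio = synthesize(text)
    yield "✅ Voice generation successful"
    yield f"📊 Audio size: {len(audio)} bytes"

    yield "💾 Saving audio file to local..."
    # delete=False keeps the file after the handle is closed.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as handle:
        handle.write(audio)
        saved_path = Path(handle.name)
    yield f"🎉 Voice conversion completed! Saved to {saved_path}"
```

Because the function is a generator, a caller can display each message as soon as it is produced, for example with `for message in run_tts(text, synthesize): print(message)`.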

Troubleshooting

Common Issues

Invalid API Key: Check if the EdgeTTS API Key is correct
Connection timeout: Check network connection and API Base URL
Text too long: Ensure text length does not exceed 5000 characters
Local save failure: Check local disk space and permissions

Error Codes

401: API Key invalid or expired
403: API Key has insufficient permissions
404: API endpoint not found
429: API call rate too high
500: Internal server error
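
One simple way to turn these codes into user-facing messages is a lookup table with a fallback. The helper name `describe_error` and the fallback wording are assumptions for illustration, not the plugin's actual implementation.

```python
# Status codes and hints from the list above.
ERROR_HINTS = {
    401: "API Key invalid or expired",
    403: "API Key has insufficient permissions",
    404: "API endpoint not found",
    429: "API call rate too high",
    500: "Internal server error",
}


def describe_error(status_code: int) -> str:
    """Return a readable message for an HTTP status code."""
    # Codes outside the table fall back to a generic hint.
    hint = ERROR_HINTS.get(status_code, "Unexpected error")
    return f"{status_code}: {hint}"
```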

Project Structure

Core File Description

manifest.yaml

Defines plugin basic information (name, version, author)
Configures runtime environment (Python 3.12, 2GB memory allocation)
Specifies tool providers and permission settings

provider/edgetts_provider.py

Implements the provider class on top of the plugin framework's provider base class
Provides credential validation
Tests EdgeTTS API connection availability
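
A credential check of this kind could look roughly like the sketch below. It is framework-independent on purpose: the `CredentialError` class, the `api_key` and `base_url` dictionary keys, and the `probe` callback are all hypothetical, and the real provider uses the Dify plugin framework's own classes and exceptions.

```python
from collections.abc import Callable, Mapping

DEFAULT_BASE_URL = "https://edgettsapi.duckcloud.fun/v1"


class CredentialError(Exception):
    """Raised when the configured credentials cannot be used."""


def validate_credentials(
    credentials: Mapping[str, str],
    probe: Callable[[str, str], int],
) -> None:
    """Check the stored credentials, using probe(api_key, base_url) -> HTTP status."""
    # Hypothetical key names; the plugin's configuration may name them differently.
    api_key = (credentials.get("api_key") or "").strip()
    if not api_key:
        raise CredentialError("EdgeTTS API Key is missing")
    base_url = (credentials.get("base_url") or DEFAULT_BASE_URL).rstrip("/")

    # The probe performs one small request and reports the status code.
    status = probe(api_key, base_url)
    if status in (401, 403):
        raise CredentialError("EdgeTTS API Key was rejected by the server")
    if status >= 400:
        raise CredentialError(f"EdgeTTS API is not available (HTTP {status})")
```

Passing the network call in as `probe` keeps the validation logic testable without contacting the service.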

tools/text_to_speech.py

Implements the tool class on top of the plugin framework's tool base class
Contains the core TTS conversion logic
Parameter validation, API calls, audio file saving
Complete error handling and user feedback

Development and Testing

Local Development Environment Setup

Environment Requirements
Install Dependencies
Local Testing

Testing Instructions

The test script contains EdgeTTS API connection and functionality tests
Tests cover: parameter validation, API calls, audio generation, error handling
Run the tests after modifying code to confirm nothing has regressed

Debugging Tips

Log Output: The plugin displays detailed processing status during runtime
Parameter Validation: Check if input parameters meet requirements
API Connection: Verify EdgeTTS API Key and Base URL configuration
Local Storage: Check write permissions for the system temporary directory
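
For the local storage tip, a quick check is to create and delete a small file in the system temporary directory and read the free space there. This is a standalone diagnostic sketch; `check_temp_dir` is a hypothetical helper and is not part of the plugin.

```python
import os
import shutil
import tempfile


def check_temp_dir() -> tuple[str, int]:
    """Return (temp_dir, free_bytes) after confirming a file can be written there."""
    temp_dir = tempfile.gettempdir()

    # Writing a real file is more reliable than inspecting permission bits.
    fd, path = tempfile.mkstemp(prefix="edgetts_probe_", dir=temp_dir)
    try:
        os.write(fd, b"ok")
    finally:
        os.close(fd)
        os.remove(path)

    free_bytes = shutil.disk_usage(temp_dir).free
    return temp_dir, free_bytes
```

If the directory is not writable, `tempfile.mkstemp` raises an `OSError` (typically `PermissionError`), which points directly at the cause of a local save failure.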

Plugin Configuration Files

manifest.yaml: Plugin metadata and runtime configuration
Provider configuration file: Authentication parameters and tool list definition
Tool configuration file: Tool parameter configuration and user interface definition

Version Information

Current Version: v0.0.1
Author: wwwzhouhui
Supported Architectures: AMD64, ARM64
Runtime Environment: Python 3.12
Plugin Type: Dify Tool Plugin
Category: Utilities

Changelog

v0.0.1 (2025-08-26)

Initial Release

✨ Complete EdgeTTS text-to-speech functionality
🔧 OpenAI API format compatible
🎵 Support for multiple Chinese voice models (Xiaoxiao, Yunxi, Xiaoyi, Yunjian)
⚡ Speed control (0.25x - 4.0x)
📁 Multi-format audio output (MP3, WAV, FLAC)
💾 Local temporary directory file storage
🔒 Secure API key management
✅ Complete parameter validation and error handling
📊 Real-time processing progress display
🧪 Includes test cases and development documentation

Technical Features

Built on Dify Plugin Framework
Uses generator pattern to support streaming processing
Complete exception handling mechanism
2GB memory allocation for audio processing
Supports maximum 5000 character text input

License

This project follows an open-source license. See the project root directory for specific license information.

Contributing

Issues and Pull Requests that improve this project are welcome.

Contact

Author: wwwzhouhui
EdgeTTS API Service: https://edgettsapi.duckcloud.fun