Sber Salute Speech Plugin
Author: RaftDS
Version: 1.0.0
Type: Tool Plugin
Contact: GitHub
Description
The Sber Salute Speech Plugin is a text-to-speech and speech-to-text tool that integrates with Sber's Salute Speech API. It provides speech synthesis and recognition for Russian and English, which suits applications that need spoken output or transcription of recorded audio.
Features
Text-to-Speech (TTS)
- Multi-Voice Support: 6 Russian voices and 1 English voice (see Supported Voices)
- Quality Options: 24kHz and 8kHz audio quality options
- Character Limit: Supports up to 5000 characters per request
- Fast Synthesis: Audio for standard-length text is typically returned within a few seconds
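
Text longer than the 5000-character limit has to be split before it is sent. The following is a minimal sketch of one way to do that, not the plugin's own code: the helper name `split_for_tts` is hypothetical, and the sentence-boundary rule (split after `.`, `!` or `?`) is an assumption that may need tuning for your text.

```python
import re

MAX_CHARS = 5000  # per-request limit stated in the feature list above


def split_for_tts(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Prefers sentence boundaries; falls back to a hard cut for a single
    sentence that is longer than the limit.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # Hard-cut sentences that cannot fit into one request on their own.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be synthesized as a separate request and the resulting audio concatenated in order.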
Speech-to-Text (STT)
- Multi-language Recognition: Russian and English speech recognition
- Multiple Audio Formats: Support for PCM, OPUS, MP3, FLAC, A-Law, μ-Law
- Advanced Features:
  - Speaker separation and identification
  - Profanity filtering
  - Multiple recognition hypotheses
  - Custom timeout settings
  - Recognition hints for improved accuracy
- Call Center Optimization: Specialized models for call center applications
- Customer Satisfaction Analysis: Built-in CSI (Customer Satisfaction Index) models
Installation
Prerequisites
- Python 3.12 or higher
- Valid Sber Salute Speech API credentials
- Russian CA certificate (required for API access)
Step 1: Install Dependencies
Step 2: Get Russian CA Certificate
The Sber API serves HTTPS with a certificate signed by a Russian certificate authority, so the client needs that CA certificate to verify the connection. Download it using:
Step 3: Configure API Credentials
- Get your Salute Speech API key
- Enter the API key in the plugin configuration interface
Configuration
Environment Variables
- : Path to the Russian CA certificate file
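
A missing or mistyped certificate path is easier to diagnose when it is checked at startup. The sketch below is an illustration under assumptions: the variable name is not given in this README, so the hypothetical helper `resolve_ca_bundle` takes it as a parameter rather than guessing it.

```python
import os
from collections.abc import Mapping
from pathlib import Path


def resolve_ca_bundle(env_var: str, environ: Mapping[str, str] = os.environ) -> Path:
    """Return the CA certificate path stored in `env_var`, or raise early.

    `env_var` is supplied by the caller because the actual variable name
    is not shown in this document.
    """
    value = environ.get(env_var, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {env_var!r} is not set")
    path = Path(value).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"CA certificate not found at {path}")
    return path
```

Call it once at startup, for example `resolve_ca_bundle("YOUR_VARIABLE_NAME")`, substituting the variable name from your plugin configuration.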
API Credentials
- Authorization Key: Your Sber Salute Speech API key
Usage Examples
Text-to-Speech Example
Text Parameter:
Voice Parameter:
Speech-to-Text Example
Audio File Parameter:
Language Parameter:
Audio Encoding Parameter:
Sample Rate Parameter:
Supported Voices
Russian Voices
- Natalia (Nec_24000, Nec_8000): Female voice, clear pronunciation
- Boris (Bys_24000, Bys_8000): Male voice, professional tone
- Marfa (May_24000, May_8000): Female voice, warm tone
- Taras (Tur_24000, Tur_8000): Male voice, authoritative
- Alexandra (Ost_24000, Ost_8000): Female voice, friendly
- Sergey (Pon_24000, Pon_8000): Male voice, conversational
English Voices
- Kira (Kin_24000, Kin_8000): Female voice, clear English pronunciation
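
The voice identifiers above follow a `<code>_<sample rate>` pattern. As a small sketch, the mapping below is copied from the list in this section, while the helper name `voice_id` and its case-insensitive lookup are illustrative choices, not part of the plugin.

```python
# Voice name -> identifier prefix, taken from the Supported Voices list.
VOICE_CODES = {
    "Natalia": "Nec",
    "Boris": "Bys",
    "Marfa": "May",
    "Taras": "Tur",
    "Alexandra": "Ost",
    "Sergey": "Pon",
    "Kira": "Kin",  # English voice
}
SAMPLE_RATES = (24000, 8000)


def voice_id(name: str, sample_rate: int = 24000) -> str:
    """Build an identifier such as 'Nec_24000' from a voice name and rate."""
    key = name.strip().capitalize()
    if key not in VOICE_CODES:
        raise ValueError(f"Unknown voice: {name!r}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"Unsupported sample rate: {sample_rate}")
    return f"{VOICE_CODES[key]}_{sample_rate}"
```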
Audio Format Support
Input Formats (STT)
- PCM 16-bit Little Endian
- OPUS
- MP3
- FLAC
- A-Law
- μ-Law
Output Formats (TTS)
Advanced Features
Speaker Separation
- Identify and separate multiple speakers in audio
- Configure maximum number of speakers (1-10)
- Option to transcribe the main speaker only
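
The speaker limits above can be validated before a request is sent. The sketch below assumes hypothetical option names (`enabled`, `max_speakers`, `main_speaker_only`); the names the plugin or the API actually uses may differ.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SpeakerSeparationOptions:
    """Illustrative container for the speaker separation settings."""

    enabled: bool = False
    max_speakers: int = 2
    main_speaker_only: bool = False

    def __post_init__(self) -> None:
        # The documented range for the number of speakers is 1-10.
        if not 1 <= self.max_speakers <= 10:
            raise ValueError("max_speakers must be between 1 and 10")

    def to_dict(self) -> dict:
        """Return the options as a plain dictionary."""
        return asdict(self)
```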
Recognition Hints
- Provide context words to improve recognition accuracy
- Enable letter recognition for better short word processing
Customer Satisfaction Analysis
- Customer Satisfaction Index (CSI) assessment
- Call feature analysis
- Issue resolution tracking
Technical Details
API Integration
The plugin calls Sber's Salute Speech API over HTTPS, verifies the server against the Russian CA certificate, and applies a timeout to each request.
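
The plugin's HTTP client is not shown in this README. As a standard-library sketch of the same idea, the hypothetical helper below builds an HTTPS opener that trusts a given CA file; the timeout is then passed on each call.

```python
import ssl
import urllib.request


def make_https_opener(ca_file: str) -> urllib.request.OpenerDirector:
    """Build an opener whose TLS verification trusts the given CA file.

    Raises FileNotFoundError if `ca_file` does not exist.
    """
    # Loads the system defaults plus the extra CA certificate.
    context = ssl.create_default_context(cafile=ca_file)
    handler = urllib.request.HTTPSHandler(context=context)
    return urllib.request.build_opener(handler)
```

A request would then look like `make_https_opener(path).open(request, timeout=30)`, where `request` is a `urllib.request.Request` and the 30-second value is only an example.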
Performance
- TTS Response Time: Typically 2-5 seconds for standard text
- STT Processing: Real-time processing with configurable timeouts
- Concurrent Requests: Supports multiple simultaneous requests
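
Several texts can be synthesized in parallel from client code. The sketch below is an assumption about how a caller might do this, not the plugin's internals: `synthesize` stands for any function that turns one text into audio bytes, and the worker count is arbitrary.

```python
from collections.abc import Callable, Iterable
from concurrent.futures import ThreadPoolExecutor


def synthesize_many(
    texts: Iterable[str],
    synthesize: Callable[[str], bytes],
    max_workers: int = 4,
) -> list[bytes]:
    """Run `synthesize` on each text concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(synthesize, texts))
```

Threads fit here because the work is network-bound; keep `max_workers` modest to stay within any rate limits on your account.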
Error Handling
- Comprehensive error handling for API failures
- Graceful degradation for network issues
- Detailed error messages for debugging
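
Callers can add their own protection against transient network failures. The following is a generic retry-with-exponential-backoff sketch; the helper name, the retried exception types and the delay values are all assumptions and do not describe the plugin's built-in behavior.

```python
import time
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], object] = time.sleep,
) -> T:
    """Call `func`, retrying on the given exceptions with doubling delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of retries: surface the original error
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

The `sleep` parameter exists so the delay can be replaced in tests.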
Troubleshooting
Common Issues
- Russian CA Certificate Error
- API Authentication Error
- Audio Format Issues
Performance Optimization
- Use 24kHz voices for better quality when bandwidth allows
- Enable speaker separation only when needed
- Use recognition hints for domain-specific vocabulary
Privacy and Security
This plugin processes audio and text data through Sber's secure API. No data is permanently stored on our servers. For detailed information, see PRIVACY.md.
Support
For issues, feature requests, or questions:
License
This plugin is provided as-is for use with the Dify platform.
Changelog
Version 1.0.0
- Initial release
- Text-to-speech functionality
- Speech-to-text functionality
- Multi-language support (Russian/English)
- Speaker separation features
- Customer satisfaction analysis