Sber Salute Speech Plugin

Author: RaftDS
Version: 1.0.0
Type: Tool Plugin
Contact: GitHub

Description

The Sber Salute Speech Plugin is a text-to-speech and speech-to-text tool that integrates with Sber's Salute Speech API. It provides speech synthesis and recognition for Russian and English, and suits applications that need to convert between text and audio.

Features

Text-to-Speech (TTS)

Multi-Voice Support: 6 Russian voices and 1 English voice (see Supported Voices)
Quality Options: Each voice is available at 24 kHz and 8 kHz sample rates
Character Limit: Supports up to 5000 characters per request
Real-time Synthesis: Fast audio generation with minimal latency
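
Text longer than the 5000-character limit has to be split before it is sent for synthesis. The following is a minimal sketch of one way to do that, preferring sentence boundaries. The function name split_text and the splitting strategy are illustrative assumptions and are not part of the plugin.

```python
import re

MAX_CHARS = 5000  # per-request character limit stated in this README


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Hypothetical helper: prefers sentence boundaries and falls back to a
    hard cut only when a single sentence exceeds the limit.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be passed to the TTS tool as a separate request and the resulting audio concatenated by the caller.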

Speech-to-Text (STT)

Multi-language Recognition: Russian and English speech recognition
Multiple Audio Formats: Support for PCM, OPUS, MP3, FLAC, A-Law, μ-Law
Advanced Features:
- Speaker separation and identification
- Profanity filtering
- Multiple recognition hypotheses
- Custom timeout settings
- Recognition hints for improved accuracy
Call Center Optimization: Specialized models for call center applications
Customer Satisfaction Analysis: Built-in CSI (Customer Satisfaction Index) models

Installation

Prerequisites

Python 3.12 or higher
Valid Sber Salute Speech API credentials
Russian CA certificate (required for API access)

Step 1: Install Dependencies

Step 2: Get Russian CA Certificate

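The Sber API endpoints use a TLS certificate issued by a Russian certificate authority, so the Russian CA certificate must be installed for HTTPS connections to be trusted. Download it using: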

Step 3: Configure API Credentials

Get your Salute Speech API key
Enter the API key in the plugin configuration interface

Configuration

Environment Variables

: Path to the Russian CA certificate file

API Credentials

Authorization Key: Your Sber Salute Speech API key

Usage Examples

Text-to-Speech Example

Text Parameter:

Voice Parameter:

Speech-to-Text Example

Audio File Parameter:

Language Parameter:

Audio Encoding Parameter:

Sample Rate Parameter:

Supported Voices

Russian Voices

Natalia (Nec_24000, Nec_8000): Female voice, clear pronunciation
Boris (Bys_24000, Bys_8000): Male voice, professional tone
Marfa (May_24000, May_8000): Female voice, warm tone
Taras (Tur_24000, Tur_8000): Male voice, authoritative
Alexandra (Ost_24000, Ost_8000): Female voice, friendly
Sergey (Pon_24000, Pon_8000): Male voice, conversational

English Voices

Kira (Kin_24000, Kin_8000): Female voice, clear English pronunciation
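
The voice identifiers above follow a prefix plus sample-rate pattern. As a small sketch, they can be built from a lookup table rather than typed by hand. The helper name voice_id is hypothetical; the prefixes and sample rates are taken from the lists above.

```python
# Voice-name to identifier-prefix mapping, copied from the lists above.
VOICE_PREFIXES = {
    "Natalia": "Nec",
    "Boris": "Bys",
    "Marfa": "May",
    "Taras": "Tur",
    "Alexandra": "Ost",
    "Sergey": "Pon",
    "Kira": "Kin",  # the English voice
}
SAMPLE_RATES = (24000, 8000)


def voice_id(name: str, sample_rate: int = 24000) -> str:
    """Return an identifier such as 'Bys_8000' for a voice name and rate."""
    if name not in VOICE_PREFIXES:
        raise ValueError(f"unknown voice: {name!r}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"sample rate must be one of {SAMPLE_RATES}")
    return f"{VOICE_PREFIXES[name]}_{sample_rate}"
```

For example, voice_id("Boris", 8000) gives "Bys_8000", which matches the entry for Boris above.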

Audio Format Support

Input Formats (STT)

PCM 16-bit Little Endian
OPUS
MP3
FLAC
A-Law
μ-Law

Output Formats (TTS)

WAV (PCM)
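
Because TTS output is WAV (PCM), it can be inspected with Python's standard wave module, for example to confirm that the sample rate matches the chosen voice. This sketch assumes the returned bytes carry a standard RIFF/WAV header; describe_wav is a hypothetical helper, not a plugin function.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Read basic properties from WAV bytes (assumes a standard WAV header)."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bits": wav.getsampwidth() * 8,
            "sample_rate": rate,
            "duration_seconds": frames / rate,
        }
```

A 24 kHz voice would be expected to report a sample_rate of 24000 and an 8 kHz voice 8000, if the header reflects the synthesis rate.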

Advanced Features

Speaker Separation

Identify and separate multiple speakers in audio
Configure maximum number of speakers (1-10)
Optionally transcribe only the main speaker
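
Since the maximum number of speakers must fall between 1 and 10, it can help to validate the value before submitting a recognition request. The check below is a sketch only; validate_speaker_count is a hypothetical name, and the plugin may enforce this range differently.

```python
MIN_SPEAKERS = 1
MAX_SPEAKERS = 10  # range stated in this README


def validate_speaker_count(count: int) -> int:
    """Return count unchanged if it is an integer in the range 1-10."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(count, bool) or not isinstance(count, int):
        raise TypeError("speaker count must be an integer")
    if not MIN_SPEAKERS <= count <= MAX_SPEAKERS:
        raise ValueError(
            f"speaker count must be between {MIN_SPEAKERS} and {MAX_SPEAKERS}"
        )
    return count
```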

Recognition Hints

Provide context words to improve recognition accuracy
Enable letter recognition to improve handling of short words

Customer Satisfaction Analysis

Customer Satisfaction Index (CSI) assessment
Call feature analysis
Issue resolution tracking

Technical Details

API Integration

The plugin calls Sber's Salute Speech API over HTTPS, using the Russian CA certificate to verify the connection, and applies a timeout to each request.
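
As an illustration of how a client can trust an extra CA certificate over HTTPS, the sketch below builds an SSL context with Python's standard ssl module. It is not the plugin's actual code: build_ssl_context and its ca_path argument are hypothetical, and the certificate path is supplied by the caller.

```python
import ssl
from typing import Optional


def build_ssl_context(ca_path: Optional[str] = None) -> ssl.SSLContext:
    """Create a verifying TLS context, optionally trusting an extra CA file.

    ca_path is a hypothetical parameter: the path to a PEM-encoded CA
    certificate, such as the Russian CA certificate from the Installation
    section.
    """
    # Starts from the system trust store with hostname checking enabled.
    context = ssl.create_default_context()
    if ca_path:
        # Add the extra CA on top of the system defaults.
        context.load_verify_locations(cafile=ca_path)
    return context
```

The resulting context can be passed to urllib.request.urlopen(url, context=context, timeout=30) or to an HTTP client that accepts an ssl.SSLContext. Keeping verification enabled and adding the CA is generally safer than disabling certificate checks.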

Performance

TTS Response Time: Typically 2-5 seconds for standard text
STT Processing: Real-time processing with configurable timeouts
Concurrent Requests: Supports multiple simultaneous requests

Error Handling

Comprehensive error handling for API failures
Graceful degradation for network issues
Detailed error messages for debugging
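
The README does not describe how the plugin handles network failures internally. For callers that wrap the plugin or the API themselves, one common pattern is a retry with exponential backoff, sketched below. The name call_with_retry and its parameters are assumptions for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on connection and timeout errors.

    Waits base_delay, then 2 * base_delay, and so on between attempts,
    and re-raises the last error once the attempts are exhausted.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Only transient errors are retried here; authentication or audio format errors should be surfaced immediately, since repeating the request will not fix them.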

Troubleshooting

Common Issues

Russian CA Certificate Error
API Authentication Error
Audio Format Issues

Performance Optimization

Use 24kHz voices for better quality when bandwidth allows
Enable speaker separation only when needed
Use recognition hints for domain-specific vocabulary

Privacy and Security

This plugin processes audio and text data through Sber's secure API. No data is permanently stored on our servers. For detailed information, see PRIVACY.md.

Support

For issues, feature requests, or questions:

GitHub Issues: Create an issue
Documentation: Sber Salute Speech API Docs

License

This plugin is provided as-is for use with the Dify platform.

Changelog

Version 1.0.0

Initial release
Text-to-speech functionality
Speech-to-text functionality
Multi-language support (Russian/English)
Speaker separation features
Customer satisfaction analysis