SBER Salute Speech
1.0.0

Comprehensive text-to-speech and speech-to-text conversion tool using SBER's Salute Speech API. Supports Russian and English languages with multiple voices, speaker separation, and advanced audio processing features.

raftds/sber-salute-speech · 627 installs

Sber Salute Speech Plugin

Author: RaftDS
Version: 1.0.0
Type: Tool Plugin
Contact: GitHub

Description

The SBER Salute Speech Plugin converts text to speech and speech to text through SBER's Salute Speech API. It provides speech synthesis and recognition for Russian and English, which suits applications that need spoken output or audio transcription.

Features

Text-to-Speech (TTS)

  • Multi-Voice Support: 6 Russian voices and 1 English voice, each available at two sample rates (see Supported Voices)
  • Quality Options: 24 kHz and 8 kHz sample rates
  • Character Limit: Supports up to 5000 characters per request
  • Real-time Synthesis: Low-latency audio generation, typically 2-5 seconds for standard text
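
Because each request accepts at most 5000 characters, longer texts have to be split before synthesis. The following is a minimal sketch of one way to do that; `split_text` and `MAX_CHARS` are illustrative names and are not part of the plugin.

```python
MAX_CHARS = 5000  # per-request limit stated in the feature list above


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Prefers to cut after sentence-ending punctuation, then at a space,
    and falls back to a hard cut when neither is available.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    chunks: list[str] = []
    rest = text.strip()
    while len(rest) > limit:
        window = rest[:limit]
        # Position of the last sentence end inside the window, or -1.
        cut = max(window.rfind(". "), window.rfind("! "), window.rfind("? "))
        if cut != -1:
            cut += 1  # keep the punctuation mark with the chunk
        else:
            cut = window.rfind(" ")
            if cut <= 0:
                cut = limit  # no natural break: hard cut
        chunks.append(rest[:cut].strip())
        rest = rest[cut:].strip()
    if rest:
        chunks.append(rest)
    return chunks
```

Each returned chunk can then be sent as a separate synthesis request and the resulting audio segments joined in order.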

Speech-to-Text (STT)

  • Multi-language Recognition: Russian and English speech recognition
  • Multiple Audio Formats: Support for PCM, OPUS, MP3, FLAC, A-Law, μ-Law
  • Advanced Features:
    • Speaker separation and identification
    • Profanity filtering
    • Multiple recognition hypotheses
    • Custom timeout settings
    • Recognition hints for improved accuracy
  • Call Center Optimization: Specialized models for call center applications
  • Customer Satisfaction Analysis: Built-in CSI (Customer Satisfaction Index) models

Installation

Prerequisites

  • Python 3.12 or higher
  • Valid Sber Salute Speech API credentials
  • Russian CA certificate (required for API access)

Step 1: Install Dependencies

Step 2: Get Russian CA Certificate

The SBER API serves HTTPS with a certificate issued by a Russian certificate authority, so the client must trust the Russian CA certificate before it can connect. Download it using:

Step 3: Configure API Credentials

  1. Get your Salute Speech API key
  2. Enter the API key in the plugin configuration interface

Configuration

Environment Variables

  • : Path to the Russian CA certificate file

API Credentials

  • Authorization Key: Your Sber Salute Speech API key

Usage Examples

Text-to-Speech Example

Text Parameter:

Voice Parameter:

Speech-to-Text Example

Audio File Parameter:

Language Parameter:

Audio Encoding Parameter:

Sample Rate Parameter:

Supported Voices

Russian Voices

  • Natalia (Nec_24000, Nec_8000): Female voice, clear pronunciation
  • Boris (Bys_24000, Bys_8000): Male voice, professional tone
  • Marfa (May_24000, May_8000): Female voice, warm tone
  • Taras (Tur_24000, Tur_8000): Male voice, authoritative
  • Alexandra (Ost_24000, Ost_8000): Female voice, friendly
  • Sergey (Pon_24000, Pon_8000): Male voice, conversational

English Voices

  • Kira (Kin_24000, Kin_8000): Female voice, clear English pronunciation
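
The voice identifiers above follow a `<prefix>_<sample rate>` pattern. As a small sketch, a lookup like the one below could build the identifier from a voice name and sample rate. The helper `voice_id` is a hypothetical name; only the prefixes and sample rates come from the lists above.

```python
# Voice prefixes copied from the Supported Voices lists.
VOICE_PREFIXES = {
    "Natalia": "Nec",
    "Boris": "Bys",
    "Marfa": "May",
    "Taras": "Tur",
    "Alexandra": "Ost",
    "Sergey": "Pon",
    "Kira": "Kin",  # the English voice
}
SAMPLE_RATES = (24000, 8000)


def voice_id(name: str, sample_rate: int = 24000) -> str:
    """Return an identifier such as 'Nec_24000' for a listed voice."""
    if name not in VOICE_PREFIXES:
        raise ValueError(f"unknown voice: {name!r}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    return f"{VOICE_PREFIXES[name]}_{sample_rate}"
```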

Audio Format Support

Input Formats (STT)

  • PCM 16-bit Little Endian
  • OPUS
  • MP3
  • FLAC
  • A-Law
  • μ-Law
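
When a recognition request fails because the declared encoding does not match the file, checking the first bytes of the audio can help. The sketch below is not part of the plugin; it inspects well-known container signatures only, and the function name and return labels are illustrative. Raw PCM, A-Law, and μ-Law streams carry no header, so they cannot be detected this way and the encoding and sample rate must be supplied explicitly.

```python
def sniff_audio_container(header: bytes) -> str:
    """Guess the container of an audio file from its leading bytes."""
    if header.startswith(b"fLaC"):
        return "FLAC"
    if header.startswith(b"OggS"):
        return "OGG"  # Opus audio is commonly stored in an Ogg container
    if header.startswith(b"RIFF") and header[8:12] == b"WAVE":
        return "WAV"
    if header.startswith(b"ID3"):
        return "MP3"  # MP3 file that begins with an ID3v2 tag
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "MP3"  # MPEG audio frame sync (11 set bits)
    return "UNKNOWN"  # headerless data (raw PCM, A-Law, μ-Law) or another format
```

For a file on disk, reading the first 12 bytes is enough: `sniff_audio_container(open(path, "rb").read(12))`.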

Output Formats (TTS)

  • WAV (PCM)

Advanced Features

Speaker Separation

  • Identify and separate multiple speakers in audio
  • Configure maximum number of speakers (1-10)
  • Option to transcribe only the main speaker

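
As an illustration of the constraints above, the options could be validated before a request is sent. This is a sketch only: the class and field names are hypothetical and do not reflect the plugin's actual parameter names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeakerSeparationOptions:
    """Hypothetical container for the speaker separation settings."""

    enabled: bool = False
    max_speakers: int = 2            # allowed range per the list above: 1-10
    main_speaker_only: bool = False  # transcribe only the main speaker

    def __post_init__(self) -> None:
        if not 1 <= self.max_speakers <= 10:
            raise ValueError("max_speakers must be between 1 and 10")
```

Validating locally gives an immediate, readable error instead of a rejected API call.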
Recognition Hints

  • Provide context words to improve recognition accuracy
  • Enable letter recognition to improve handling of short words

Customer Satisfaction Analysis

  • Customer Satisfaction Index (CSI) assessment
  • Call feature analysis
  • Issue resolution tracking

Technical Details

API Integration

The plugin calls SBER's Salute Speech API over HTTPS and verifies each connection against the Russian CA certificate. Requests are made with timeout handling.

Performance

  • TTS Response Time: Typically 2-5 seconds for standard text
  • STT Processing: Real-time processing with configurable timeouts
  • Concurrent Requests: Supports multiple simultaneous requests

Error Handling

  • Comprehensive error handling for API failures
  • Graceful degradation for network issues
  • Detailed error messages for debugging
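
One common way to cope with transient network failures is to retry with exponential backoff. The sketch below illustrates that general pattern; it is an assumption about how a caller might wrap requests, not a description of the plugin's internal error handling, and `call_with_retries` is an illustrative name.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func`, retrying on connection errors and timeouts.

    The wait doubles after each failure: base_delay, 2 * base_delay, ...
    The last failure is re-raised so the caller sees the original error.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Errors that are not transient, such as an invalid API key or an unsupported audio format, are deliberately not retried here.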

Troubleshooting

Common Issues

  1. Russian CA Certificate Error

  2. API Authentication Error

  3. Audio Format Issues

Performance Optimization

  • Use 24kHz voices for better quality when bandwidth allows
  • Enable speaker separation only when needed
  • Use recognition hints for domain-specific vocabulary

Privacy and Security

This plugin processes audio and text data through SBER's secure API. No data is permanently stored on our servers. For detailed information, see PRIVACY.md.

Support

For issues, feature requests, or questions:

License

This plugin is provided as-is for use with the Dify platform.

Changelog

Version 1.0.0

  • Initial release
  • Text-to-speech functionality
  • Speech-to-text functionality
  • Multi-language support (Russian/English)
  • Speaker separation features
  • Customer satisfaction analysis

Category: Tool
Version: 1.0.0
raftds · 08/13/2025 04:27 AM
Requirements: Tool invocation
Maximum memory: 256MB
Maximum storage: 1MB