Convert audio to text with OpenAI's Audio API, supporting transcription and translation across multiple languages with high accuracy.
Author: lysonober
Version: 0.0.4
Type: Tool
The OpenAI Audio tool converts speech to text through OpenAI's Audio API, producing both transcriptions and translations. It accepts mp3, mp4, mpeg, mpga, m4a, wav, and webm files up to 25 MB in size. Transcription keeps the original language, while translation converts the speech to English; both work across a wide range of languages.
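A quick local check can catch unsupported files before any upload. The following is a minimal sketch, not the plugin's own code: `check_audio_file` is a hypothetical helper, the extension list is taken from the formats named above, and the limit is assumed to mean 25 MiB.

```python
from pathlib import Path

# Formats as listed in the description above.
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}
# Assumption: the stated 25 MB limit is interpreted as 25 MiB.
MAX_BYTES = 25 * 1024 * 1024


def check_audio_file(path: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    p = Path(path)
    if not p.is_file():
        return [f"not a file: {path}"]

    problems = []
    # Compare case-insensitively so "talk.MP3" is accepted.
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {p.suffix or '(none)'}")

    size = p.stat().st_size
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes, above the {MAX_BYTES}-byte limit")
    return problems
```

The check looks only at the file name and size; it does not inspect the audio container itself, so a mislabeled file would still pass.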
The tool offers three models: GPT-4o Transcribe for high-quality transcription, GPT-4o Mini Transcribe for faster processing, and Whisper-1 for legacy support with additional formatting options. Advanced features include streaming output for real-time transcription (GPT-4o models), timestamps at segment or word level (Whisper-1), and output as plain text, JSON, or SRT and VTT subtitles.
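To illustrate how segment-level timestamps relate to subtitle output, here is a small sketch that renders segments as SRT cues. It is illustrative only: the API can return SRT directly, and the segment shape used here (`start` and `end` in seconds plus `text`) is an assumption about the input, not a description of the plugin's internals. `srt_timestamp` and `segments_to_srt` are hypothetical helper names.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render segments as numbered SRT cues separated by blank lines.

    Assumed segment shape: {"start": float, "end": float, "text": str}.
    """
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(cues)


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 2.5, "text": " Hello "},
        {"start": 2.5, "end": 4.0, "text": "World"},
    ]
    print(segments_to_srt(demo))
```

VTT differs mainly in its `WEBVTT` header line and in using a period rather than a comma before the milliseconds.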
1️⃣ Today, for content creators and video producers,
2️⃣ when working with hours of interview footage or multilingual content,
3️⃣ they are forced to spend excessive time manually transcribing audio or hiring expensive transcription services,
4️⃣ therefore, the customer needs a way to quickly and accurately convert speech to text while preserving timestamps and supporting multiple languages.
1️⃣ Today, for accessibility specialists and educational institutions,
2️⃣ when creating accessible content for diverse audiences with hearing impairments,
3️⃣ they are forced to navigate complex subtitle creation tools or outsource caption generation,
4️⃣ therefore, the customer needs a way to efficiently generate accurate subtitles in various formats (SRT, VTT) with precise timestamps.
The OpenAI Audio tool has several settings that affect each other. Understanding these relationships will help you get the results you want:
When You Choose Translation Mode
When You Choose Special Output Formats
When You Request Timestamps
When You Enable Streaming
When You Enable Timestamps with Whisper-1
The OpenAI Audio tool communicates with OpenAI's Audio API endpoints:
The tool accepts audio through several file input methods, creates temporary files for processing, and manages API communication, including streaming responses. It automatically validates parameters and checks model compatibility, so incompatible combinations of settings are caught rather than passed through.
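The kind of compatibility checking described here could look roughly like the sketch below. The rules are inferred from the feature list earlier in this document and from OpenAI's public API documentation, not taken from the plugin's source; `AudioRequest` and `find_conflicts` are hypothetical names, and the model identifiers are assumed to match the API's.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed API identifiers for the GPT-4o transcription models.
GPT4O_MODELS = {"gpt-4o-transcribe", "gpt-4o-mini-transcribe"}
# Assumption: these output formats are available only with whisper-1.
WHISPER_ONLY_FORMATS = {"srt", "vtt", "verbose_json"}


@dataclass
class AudioRequest:
    """Hypothetical settings container, not the plugin's own type."""
    model: str = "gpt-4o-transcribe"
    mode: str = "transcribe"           # "transcribe" or "translate"
    response_format: str = "text"
    timestamps: Optional[str] = None   # None, "segment", or "word"
    stream: bool = False


def find_conflicts(req: AudioRequest) -> list[str]:
    """Return human-readable conflicts between the chosen settings."""
    conflicts = []
    is_gpt4o = req.model in GPT4O_MODELS

    if req.mode == "translate" and is_gpt4o:
        conflicts.append("translation requires whisper-1")
    if req.response_format in WHISPER_ONLY_FORMATS and is_gpt4o:
        conflicts.append(f"{req.response_format} output requires whisper-1")
    if req.timestamps and is_gpt4o:
        conflicts.append("timestamps require whisper-1")
    if req.timestamps and req.response_format != "verbose_json":
        conflicts.append("timestamps require verbose_json output")
    if req.stream and not is_gpt4o:
        conflicts.append("streaming requires a GPT-4o transcription model")
    return conflicts
```

Returning a list of messages instead of raising on the first problem lets a caller report every conflicting setting at once.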
Text Output:
JSON Output (simplified):
If you have any questions, please contact me at: [email protected]
Follow me on X (Twitter): https://x.com/lyson_ober
Please refer to the PRIVACY.md file for information about how your data is handled when using this plugin. This plugin does not collect any data directly, but your audio is processed through OpenAI's services subject to their privacy policies.