Volcengine AI
Volcengine (火山引擎) is a cloud service platform offering AI model development, deployment, and management tools for data processing and analysis.
To register an account, see Account Registration and Real-Name Authentication.
To generate API credentials, see Obtaining AccessKey and SecretKey.
The Volcengine AI tool is available in the Plugin Marketplace; please install it first.
On the Dify navigation page, click Tools > Volcengine AI > Authorize and fill in the Access Key ID and Access Key Secret.
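Under the hood, Volcengine authenticates OpenAPI requests by deriving a signing key from the Access Key Secret through an HMAC-SHA256 chain (similar in spirit to AWS Signature V4). The sketch below illustrates only that derivation; the default region, service name, and the shape of the string-to-sign are assumptions for illustration, and the plugin handles all of this for you. Consult Volcengine's official signing guide for the full canonical-request format.

```python
import hashlib
import hmac


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def signing_key(secret_key: str, date: str, region: str, service: str) -> bytes:
    # Key-derivation chain: secret -> date -> region -> service -> "request".
    # This mirrors Volcengine's documented HMAC-SHA256 scheme in spirit;
    # treat the details as illustrative, not authoritative.
    k_date = _hmac(secret_key.encode("utf-8"), date)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    return _hmac(k_service, "request")


def sign(secret_key: str, string_to_sign: str, date: str,
         region: str = "cn-north-1", service: str = "cv") -> str:
    # Final signature: hex-encoded HMAC of the string-to-sign under the
    # derived key. Region/service defaults here are assumptions.
    key = signing_key(secret_key, date, region, service)
    return hmac.new(key, string_to_sign.encode("utf-8"),
                    hashlib.sha256).hexdigest()
```

The signature is deterministic for a given secret, date, and string-to-sign, which is what lets the server recompute and verify it.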

You can use the Volcengine AI tool in the following application types.

Both Chatflow and Workflow applications support adding Volcengine AI tool nodes.

Seedream General 3.0 Text-to-Image Model is a next-generation text-to-image model built on ByteDance's visual large model, with significant improvements in comprehensive capabilities (image-text, structure, aesthetics). The V3.0 model has a larger parameter count, better semantic understanding, and more natural and realistic entity structures, supporting direct output at resolutions below 2048, with substantial enhancements across various scenarios.
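As a rough sketch of how a text-to-image request body for a model like Seedream 3.0 might be assembled: the `req_key` value and every field name below are hypothetical placeholders, not the documented API schema. In Dify the plugin node builds this for you; the authoritative field names live in the Volcengine visual service reference.

```python
def build_t2i_payload(prompt: str, width: int = 1024, height: int = 1024,
                      req_key: str = "seedream_t2i_v30") -> dict:
    """Assemble a text-to-image request body.

    `req_key` and all field names here are illustrative placeholders;
    the real schema is defined by the Volcengine visual service docs.
    """
    if not prompt:
        raise ValueError("prompt must not be empty")
    return {
        "req_key": req_key,   # selects the model variant (placeholder value)
        "prompt": prompt,     # the text description to render
        "width": width,
        "height": height,
        "return_url": True,   # ask for a downloadable URL instead of base64
    }
```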
Generate a variety of realistic portrait photos based on the facial features of a single real person in the input image, guided by a text prompt. This can be widely applied in interactive entertainment, photo effects, and e-commerce marketing scenarios. Pre-set style words can be used as templates, or new template effects can be adjusted based on examples.
Smart Drawing (Image-to-Image) SeedEdit 3.0 is a diffusion model developed by ByteDance for image editing that modifies a given image according to a text prompt. It has made significant advances in accurately following editing instructions while effectively preserving image content (such as character features and fine details). It excels in particular at real images, strikes an excellent balance across key editing metrics, and produces high-quality results that meet user expectations.
The Jimeng High AES General V2.1 text-to-image API accurately interprets text input to generate images that match its semantics and carry a distinctive aesthetic, seamlessly connecting creative ideas with visual presentation to achieve "what you see is what you think" in one click. This service focuses on supporting the rendering of Chinese and English text.
JimengVGFMT2VL20 is a video generation API that supports text-to-video and image-to-video, deeply understanding text and image instructions to generate videos with excellent visual fluency. It supports fine-tuning of multi-dimensional parameters, providing a professional-level video generation experience for creative expression.
JimengVGFMI2VL20 is a video generation API that supports image-to-video, deeply understanding image instructions to generate videos with excellent visual fluency. It supports fine-tuning of multi-dimensional parameters, providing a professional-level video generation experience for creative expression.
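Video generation typically runs asynchronously: a task is submitted, then polled until it completes. The helper below sketches that polling loop generically; the `status` and `video_url` fields are assumed names for illustration, since the real response schema is defined by the Volcengine video-generation API (and is handled for you by the plugin node in Dify).

```python
import time
from typing import Callable


def wait_for_video(poll: Callable[[], dict], timeout_s: float = 300.0,
                   interval_s: float = 5.0, _sleep=time.sleep) -> str:
    """Poll an async video-generation task until it finishes.

    `poll` is any callable returning the current task state as a dict.
    The 'status' / 'video_url' / 'message' keys are illustrative
    placeholders for whatever the real API returns.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        resp = poll()
        status = resp.get("status")
        if status == "done":
            return resp["video_url"]
        if status == "failed":
            raise RuntimeError(f"generation failed: {resp.get('message')}")
        _sleep(interval_s)  # injectable for testing
    raise TimeoutError("video generation did not finish in time")
```

Injecting the sleep function keeps the loop testable without real waiting; in production the defaults apply.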
Image-to-Image V3.0 is the image-to-image capability built on the same model as Jimeng. Compared with previous versions, it significantly improves text rendering accuracy, graphic layout, hierarchical aesthetics, and semantic understanding. Portrait textures are more realistic, and high-definition output is supported. Building on the more accurate text rendering, it also handles large and small type, a variety of artistic fonts, and different font weights.
Jimeng T2I V3.0 is the text-to-image capability built on the same model as Jimeng. Compared with previous versions, it significantly improves text rendering accuracy, graphic layout, hierarchical aesthetics, and semantic understanding. Portrait textures are more realistic, and high-definition output is supported. Building on the more accurate text rendering, it also handles large and small type, a variety of artistic fonts, and different font weights.
Jimeng T2I V3.1 is the text-to-image capability built on the same model as Jimeng. Compared with previous versions, it significantly improves text rendering accuracy, graphic layout, hierarchical aesthetics, and semantic understanding. Portrait textures are more realistic, and high-definition output is supported. Building on the more accurate text rendering, it also handles large and small type, a variety of artistic fonts, and different font weights.
Image-to-Image 3.0 - Character Feature Preservation (DreamO) is built on the self-developed Seedream 3.0 model and integrates proprietary subject-preservation capabilities. It keeps the subject appearance, facial features, and style of the output image consistent with a single input reference image, while still allowing adjustments through text prompts. During generation, these preserved characteristics work together with the input text to produce images that stay consistent with the reference subject yet conform to the text description, balancing feature preservation with creative generation.