TongYi AIGC
A Dify plugin for AI image and video generation using Alibaba Cloud Tongyi's latest Wanxiang, Qwen, and Z-Image models. It supports text-to-image, text-to-video, image-to-image, image-to-video, image translation, and more, with configurable models, sizes, durations, and watermarking.
Version Information
- Current Version: v0.0.3
- Release Date: 2026-04-26
- Compatibility: Dify Plugin Framework
- Python Version: 3.12
Version History
- v0.0.3 (2026-04-26):
- Added HappyHorse series models for video generation
- Added HappyHorse Text to Video
- Added HappyHorse Image to Video - First Frame
- Added HappyHorse Reference Video
- Added HappyHorse Video Editing
- v0.0.2 (2026-04-12):
- Added new Video Continuation tool (wan_video_continue) - Continue video from existing clips using wan2.7-i2v
- Added wan2.7-t2v model support for Text to Video
- Added wan2.7-i2v model support for Image to Video with multi-modal inputs (first-frame, first+last frame, video continuation)
- Added wan2.7-r2v model support for Reference Video with enhanced features
- Added wan2.7-image-pro/wan2.7-image models for Image to Image with sequential grouped output mode
- Added qwen-image2.0 series models: qwen-image-2.0-pro, qwen-image-2.0, qwen-image-edit-max, qwen-image-edit-plus
- Enhanced resolution control with new parameters for wan2.7 models
- Added tags for plugin marketplace (productivity, image, videos)
- v0.0.1 (2026-02-16): Initial release with image and video generation capabilities
Quick Start
- Install the plugin in your Dify environment
- Configure your Alibaba Cloud API credentials (API Key from DashScope)
- Start generating images and videos with AI
Key Features
- Multiple Generation Modes: Text-to-image, text-to-video, image-to-image, image-to-video, image translation
- Latest AI Models: Supports wan2.7, qwen-image2.0, Z-image, wan2.6, wan2.5, wan2.2 for images; wan2.7-t2v, wan2.6-t2v, wan2.5-t2v-preview, wan2.2-t2v-plus for videos
- Flexible Image Sizes: Multiple aspect ratios from 1:1 to 21:9 with various resolutions
- Video Generation: Create videos with customizable duration (2-15 seconds) and synchronized audio
- Image Translation: Translate text in images with AI-powered OCR and translation (14 languages)
- First-Last Frame Video: Create videos from first and last frame images
- Reference Video: Generate videos based on reference video style
- Video Continuation: Continue existing clips with wan2.7-i2v
- Batch Generation: Generate multiple images in a single request (up to 6 with Qwen Image to Image; up to 12 in wan2.7 sequential mode)
- Watermark Control: Optional watermark for content authenticity
Core Features
Image Generation
Wan Text to Image (wan_text_2_image)
Generate images from text descriptions using Wanxiang models.
- Supported Models: wan2.7-image-pro, wan2.7-image, wan2.6-t2i
- Features:
- Multiple aspect ratios (1:1, 3:2, 2:3, 16:9, 9:16)
- High resolution output (up to 4K for wan2.7-image-pro)
- Optional watermark
- Sequential grouped image output mode (wan2.7)
- Thinking mode for enhanced reasoning (wan2.7)
- Prompt intelligent rewriting
- Batch generation (1-4 images; up to 12 in wan2.7 sequential mode)
- Negative prompt support
Wan Image to Image (wan_image_2_image)
Generate images from text and reference images using Wanxiang models.
- Supported Models: wan2.7-image-pro, wan2.7-image, wan2.6-image (compatibility)
- Features:
- Reference image guided generation (1-9 images)
- Multiple aspect ratios (1:1, 2:3, 3:2, 3:4, 4:3, 9:16, 16:9, 21:9)
- 1K/2K presets for wan2.7 models
- Optional watermark
- Sequential grouped image output mode (wan2.7)
- Interleaved text and image output mode (wan2.6 compatibility)
- Prompt intelligent rewriting
- Batch generation (1-4 images; up to 12 in wan2.7 sequential mode)
Qwen Text to Image (qwen_text_2_image)
Generate images using Qwen image models.
- Supported Models: qwen-image-2.0-pro, qwen-image-2.0, qwen-image-2.0-pro-2026-03-03, qwen-image-2.0-2026-03-03, qwen-image-max, qwen-image-plus, qwen-image
- Features:
- High quality image generation
- Multiple aspect ratios (16:9, 4:3, 1:1, 3:4, 9:16)
- Optional watermark
- Prompt intelligent rewriting
- Negative prompt support
- Random seed for reproducibility
Qwen Image to Image (qwen_image_2_image)
Generate images from text and reference images using Qwen models.
- Supported Models: qwen-image-2.0-pro, qwen-image-2.0, qwen-image-edit-max, qwen-image-edit-max-2026-01-16, qwen-image-edit-plus, qwen-image-edit, qwen-image-edit-plus-2025-10-30
- Features:
- Reference image guided generation (1-3 images)
- Multiple aspect ratios (1:1, 2:3, 3:2, 3:4, 4:3, 9:16, 16:9, 21:9)
- Optional watermark
- Prompt intelligent rewriting
- Batch generation (1-6 images)
- High resolution output (up to 2048*872)
Z-Image Text to Image (z_image_text_2_image)
Generate images using Z-Image Turbo model.
- Supported Models: z-image-turbo
- Features:
- Fast image generation
- Extensive aspect ratio support (1:1, 2:3, 3:2, 3:4, 4:3, 7:9, 9:7, 9:16, 9:21, 16:9, 21:9)
- Multiple resolution options (1024 to 2048 pixels)
- Prompt intelligent rewriting
- Random seed for reproducibility
Image Translation (qwen_image_translate)
Translate text in images with AI-powered OCR and translation.
- Supported Models: qwen-mt-image
- Features:
- Automatic text detection
- Multi-language translation (14 languages)
- Supported languages: Chinese, English, Japanese, Korean, French, German, Spanish, Russian, Italian, Portuguese, Arabic, Thai, Vietnamese, Indonesian
- Domain hint for improved translation
- Sensitive words filtering
- Terminology support
Video Generation
Text to Video (wan_text_2_video)
Generate videos from text descriptions using Wanxiang models.
- Supported Models: wan2.7-t2v, wan2.6-t2v, wan2.5-t2v-preview, wan2.2-t2v-plus, wanx2.1-t2v-turbo, wanx2.1-t2v-plus
- Features:
- Duration: 2-15 seconds (model dependent)
- Resolution control: (wan2.7) or (wan2.6 and earlier)
- Synchronized audio generation
- Prompt intelligent rewriting
- Single/Multi shot support (shot_type for wan2.6, prompt-driven for wan2.7)
- Custom audio URL support
- Negative prompt support
Image to Video (wan_first_image_2_video)
Generate video from a single image with text description.
- Supported Models: wan2.7-i2v, wan2.6-i2v, wan2.5-i2v-preview, wan2.2-i2v-flash, wan2.2-i2v-plus, wanx2.1-i2v-turbo, wanx2.1-i2v-plus
- Features:
- Multi-modal inputs on wan2.7-i2v: first-frame, first+last frame, or video continuation
- Single image input as first frame (legacy models)
- Duration: 2-15 seconds (model dependent)
- Resolution: 480P, 720P, 1080P
- Synchronized audio generation
- Video effect templates
- Prompt intelligent rewriting
- Single/Multi shot support
First-Last Frame Video (wan_first_end_image_2_video)
Generate video from first and last frame images.
- Supported Models: wan2.2-kf2v-flash
- Features:
- First and last frame input
- Smooth transition generation
- Resolution: 480P, 720P, 1080P
- Video effect templates
- Prompt intelligent rewriting
Reference Video (wan_reference_video)
Generate videos based on reference video style.
- Supported Models: wan2.7-r2v, wan2.6-r2v, wan2.6-r2v-flash
- Features:
- Multi-modal references (images/videos) input (up to 5)
- Optional first-frame and reference voice support (wan2.7-r2v)
- Duration: 2-10 seconds (up to 15 seconds for wan2.7 without a reference video)
- Resolution: 720P or 1080P (multiple aspect ratios)
- Synchronized audio generation (wan2.6-r2v-flash only)
- Single/Multi shot support
- Prompt intelligent rewriting
Video Continuation (wan_video_continue)
Continue a video from an existing first clip.
- Supported Models: wan2.7-i2v
- Features:
- Supports and
- Duration: 2-15 seconds
- Resolution: 720P or 1080P
- Prompt intelligent rewriting and watermark control
Video Query (wan_video_query)
Query the status and results of video generation tasks.
- Features:
- Real-time task status
- Video download URL retrieval
- Optional automatic video download
Image Translation Query (qwen_image_translate_query)
Query image translation task status and results.
- Features:
- Real-time task status
- Translated image retrieval
Technical Advantages
- Latest AI Models: Access to Alibaba Cloud's newest Wanxiang, Qwen, and Z-Image models
- High Quality Output: Professional-grade image and video generation
- Flexible Configuration: Extensive parameter options for fine-tuning
- Async Processing: Efficient video generation with task-based workflow
- Multi-Format Support: Support for various image and video formats
- Audio Generation: Automatic synchronized audio for videos
- Batch Processing: Generate multiple images efficiently
- Image Translation: AI-powered text translation in images (14 languages)
Requirements
- Python 3.12
- Dify Platform access
- Alibaba Cloud API credentials (API Key from DashScope)
- Required Python packages (installed via requirements.txt):
- dify_plugin>=0.2.0
- requests>=2.31.0,<3.0.0
- pillow>=10.0.0,<11.0.0
Installation & Configuration
- Install the required dependencies listed in requirements.txt
- Configure your Alibaba Cloud API credentials in the plugin settings:
- API Key: Your Alibaba Cloud DashScope API key
- Install the plugin in your Dify environment
Usage
Image Generation Tools
1. Wan Text to Image
Generate images from text descriptions.
- Parameters:
- : Model version (default: wan2.7-image-pro)
- : Text description of the image (required, wan2.7 <=5000 chars, wan2.6 <=2100 chars)
- : Describe what you don't want (<=500 chars, wan2.6 compatibility)
- : Image size (default: 2K, supports 1K/2K/4K or specific resolutions like 1024*1024)
- : Number of images to generate (1-4, up to 12 with wan2.7 sequential mode)
- : Enable grouped image generation for wan2.7 (default: false)
- : Enable thinking mode for enhanced reasoning on wan2.7 (default: true)
- : Enable prompt intelligent rewriting (default: true)
- : Enable/disable watermark (default: true)
- : Random seed for reproducibility
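The prompt-length and image-count limits above can be checked before a request is sent. The following is a minimal sketch, not the plugin's actual validation code: the function name `validate_wan_t2i` and its argument names (`model`, `prompt`, `n`, `sequential`) are hypothetical, and only the numeric limits come from the parameter list.

```python
# Character limits per model family, taken from the parameter notes above.
PROMPT_LIMITS = {"wan2.7": 5000, "wan2.6": 2100}


def validate_wan_t2i(model: str, prompt: str, n: int = 1, sequential: bool = False) -> None:
    """Raise ValueError if a request breaks one of the documented limits.

    All argument names are hypothetical; the plugin's real parameter
    names are not shown in this document.
    """
    family = "wan2.7" if model.startswith("wan2.7") else "wan2.6"
    if not prompt:
        raise ValueError("prompt is required")
    if len(prompt) > PROMPT_LIMITS[family]:
        raise ValueError(f"prompt exceeds {PROMPT_LIMITS[family]} characters for {family}")
    # Sequential grouped output is documented for wan2.7 models only.
    if sequential and family != "wan2.7":
        raise ValueError("sequential mode requires a wan2.7 model")
    max_images = 12 if sequential else 4
    if not 1 <= n <= max_images:
        raise ValueError(f"n must be between 1 and {max_images}")


# A wan2.7 request in sequential mode may ask for up to 12 images.
validate_wan_t2i("wan2.7-image-pro", "a lighthouse at dusk", n=12, sequential=True)
```

Failing early like this avoids spending an API call on a request the service would reject anyway.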
2. Wan Image to Image
Generate images from text and reference images.
- Parameters:
- : Model version (default: wan2.7-image-pro)
- : Text description (required, <=5000 chars)
- : Reference image files (1-9 images, required)
- : Image size (default: 2K)
- : Number of images to generate (1-4, up to 12 when on wan2.7)
- : Enable grouped image generation for wan2.7 (default: false)
- : Enable interleaved output (default: false)
- : Enable prompt intelligent rewriting (default: true)
- : Enable/disable watermark (default: true)
3. Qwen Text to Image
Generate images using Qwen models.
- Parameters:
- : Text description (required)
- : Model version (default: qwen-image-2.0-pro)
- : Image size (default: 1664*928)
- : Describe what you don't want
- : Enable prompt intelligent rewriting
- : Enable/disable watermark (default: true)
- : Random seed for reproducibility
4. Qwen Image to Image
Generate images from text and reference images using Qwen models.
- Parameters:
- : Text description (required)
- : Reference image files (1-3 images, required)
- : Model version (default: qwen-image-2.0-pro)
- : Image size (default: 1024*1024)
- : Number of images to generate (1-6, default: 3)
- : Enable prompt intelligent rewriting
- : Enable/disable watermark (default: true)
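The two image-to-image tools accept different numbers of reference images and outputs. A small table-driven check, sketched below, keeps those limits in one place. The tool names are the ones used in this document; the `I2ILimits` class and `check_i2i_request` function are hypothetical, and the Wan entry ignores the 12-image sequential mode for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class I2ILimits:
    max_refs: int      # maximum number of reference images
    max_outputs: int   # maximum number of generated images


# Limits copied from the parameter lists above (non-sequential mode for Wan).
LIMITS = {
    "wan_image_2_image": I2ILimits(max_refs=9, max_outputs=4),
    "qwen_image_2_image": I2ILimits(max_refs=3, max_outputs=6),
}


def check_i2i_request(tool: str, ref_count: int, n: int) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    limits = LIMITS[tool]
    problems = []
    if not 1 <= ref_count <= limits.max_refs:
        problems.append(f"{tool} accepts 1-{limits.max_refs} reference images, got {ref_count}")
    if not 1 <= n <= limits.max_outputs:
        problems.append(f"{tool} generates 1-{limits.max_outputs} images, got {n}")
    return problems


print(check_i2i_request("qwen_image_2_image", ref_count=4, n=3))
```

Returning a list instead of raising on the first error lets a caller report every problem at once.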
5. Z-Image Text to Image
Generate images using Z-Image Turbo.
- Parameters:
- : Text description (required, <=800 chars)
- : Model version (default: z-image-turbo)
- : Image size (default: 1024*1536)
- : Enable prompt intelligent rewriting (default: false)
- : Random seed for reproducibility
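Sizes in this document are written as two integers joined by `*`, for example 1024*1536. The sketch below shows one way to map such a string to the closest named aspect ratio from the Z-Image feature list. It assumes the order is width*height, and the helper names `parse_size` and `nearest_ratio` are hypothetical.

```python
# Named ratios from the Z-Image feature list.
NAMED_RATIOS = ["1:1", "2:3", "3:2", "3:4", "4:3", "7:9", "9:7", "9:16", "9:21", "16:9", "21:9"]


def parse_size(size: str) -> tuple[int, int]:
    """Split a 'W*H' string into integers (width first is an assumption)."""
    width, height = size.split("*")
    return int(width), int(height)


def nearest_ratio(size: str) -> str:
    """Return the named aspect ratio closest to the given pixel size."""
    width, height = parse_size(size)
    target = width / height

    def distance(name: str) -> float:
        a, b = name.split(":")
        return abs(int(a) / int(b) - target)

    return min(NAMED_RATIOS, key=distance)


print(nearest_ratio("1024*1536"))  # 2:3
print(nearest_ratio("2048*872"))   # 21:9
```

Nearest-match is used rather than exact reduction because some documented sizes, such as 2048*872, only approximate their nominal ratio.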
6. Image Translation
Translate text in images.
- Parameters:
- : Image URL for translation (required)
- : Target language (required)
- : Source language (default: auto)
- : Domain hint for improved translation
- : Sensitive words (JSON array)
- : Terminologies (JSON array)
- : Model version (default: qwen-mt-image)
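The sensitive words and terminologies parameters are passed as JSON arrays. The sketch below shows one way to build and validate such strings with the standard library. The helper names are hypothetical, and the `src`/`tgt` shape of a terminology entry is an assumption that should be checked against the DashScope documentation.

```python
import json


def to_json_array(items) -> str:
    """Serialize a list or tuple as a JSON array string."""
    if not isinstance(items, (list, tuple)):
        raise TypeError("expected a list or tuple of items")
    # ensure_ascii=False keeps non-Latin text readable in the output.
    return json.dumps(list(items), ensure_ascii=False)


def parse_json_array(text: str) -> list:
    """Decode a JSON string and confirm that the top-level value is an array."""
    value = json.loads(text)
    if not isinstance(value, list):
        raise ValueError("expected a JSON array")
    return value


sensitive_words = to_json_array(["free of charge", "guaranteed"])
# Assumed entry shape: one source term mapped to one target term.
terminologies = to_json_array([{"src": "DashScope", "tgt": "DashScope"}])
print(sensitive_words)
print(terminologies)
```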
Video Generation Tools
7. Text to Video
Generate videos from text descriptions.
- Parameters:
- : Model version (default: wan2.6-t2v)
- : Text description (required)
- : Describe what you don't want
- : Video resolution (default: 1920*1080)
- : Duration in seconds (model dependent, default: 5)
- : Custom audio file URL
- : Auto generate audio (default: true)
- : Single or multi shot (default: single)
- : Enable prompt intelligent rewriting (default: true)
- : Enable/disable watermark (default: true)
- : Random seed for reproducibility
8. Image to Video
Generate video from an image or continue from a video clip.
- Parameters:
- : Model version (default: wan2.6-i2v, supports wan2.7-i2v)
- : Text description
- / : First frame image input
- / : Optional last frame image (wan2.7-i2v)
- : Optional first clip video URL for continuation (wan2.7-i2v)
- : Video resolution (default: 1080P)
- : Duration in seconds (model dependent, default: 5)
- : Video effect template
- : Custom audio file URL (wan2.7-i2v only in first-frame modes)
- : Auto generate audio (default: true)
- : Single or multi shot (default: single)
- : Enable prompt intelligent rewriting (default: true)
- : Enable/disable watermark (default: true)
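On wan2.7-i2v, the combination of inputs decides whether the request is a first-frame, first+last-frame, or video-continuation job. The sketch below illustrates that decision under stated assumptions: the function and argument names are hypothetical, and rejecting a clip combined with frame images is this sketch's choice, since the document does not say which input would win.

```python
from typing import Optional


def pick_i2v_mode(
    model: str,
    first_frame: Optional[str] = None,
    last_frame: Optional[str] = None,
    first_clip: Optional[str] = None,
) -> str:
    """Return 'first_frame', 'first_last_frame', or 'video_continuation'."""
    is_wan27 = model == "wan2.7-i2v"
    if first_clip:
        if not is_wan27:
            raise ValueError("video continuation requires wan2.7-i2v")
        if first_frame or last_frame:
            # Assumption: mixing a clip with frame images is ambiguous.
            raise ValueError("pass either a first clip or frame images, not both")
        return "video_continuation"
    if not first_frame:
        raise ValueError("a first frame image is required")
    if last_frame:
        if not is_wan27:
            raise ValueError("a last frame requires wan2.7-i2v")
        return "first_last_frame"
    return "first_frame"


print(pick_i2v_mode("wan2.6-i2v", first_frame="cat.png"))          # first_frame
print(pick_i2v_mode("wan2.7-i2v", first_clip="https://example.com/a.mp4"))  # video_continuation
```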
9. First-Last Frame Video
Generate video from first and last frame images.
- Parameters:
- : Model version (default: wan2.2-kf2v-flash)
- : Text description
- : First frame image (required)
- : Last frame image
- : Video resolution (default: 720P)
- : Video effect template
- : Enable prompt intelligent rewriting (default: true)
- : Enable/disable watermark (default: true)
10. Reference Video
Generate videos based on reference video style.
- Parameters:
- : Model version (default: wan2.7-r2v)
- : Text description (required, wan2.7 max 5000 / wan2.6 max 1500)
- : Reference URLs (videos or images, semicolon-separated, max 5)
- : Optional first frame image URL (wan2.7-r2v)
- : Optional reference voice URL (wan2.7-r2v)
- : Legacy resolution in (wan2.6; also mapped for wan2.7 compatibility)
- / : Native wan2.7 resolution tier and aspect ratio
- : Duration in seconds (wan2.6: 2-10; wan2.7: 2-10 or up to 15 without reference video)
- : Single or multi shot (wan2.6)
- : Auto generate audio (default: true, wan2.6-r2v-flash only)
- : Enable prompt intelligent rewriting (default: true)
- : Enable/disable watermark (default: true)
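The reference list is a semicolon-separated string of at most five URLs, and the maximum duration depends on the model and on whether a reference video is present. The sketch below captures both rules. The helper names are hypothetical, and detecting a video by file extension is an assumption made only for illustration.

```python
from urllib.parse import urlparse

# Assumption: a reference counts as a video if its path ends with one of these.
VIDEO_SUFFIXES = (".mp4", ".mov")


def split_reference_urls(raw: str, limit: int = 5) -> list[str]:
    """Split a semicolon-separated string, dropping blanks and enforcing the limit."""
    urls = [part.strip() for part in raw.split(";") if part.strip()]
    if len(urls) > limit:
        raise ValueError(f"at most {limit} references are allowed, got {len(urls)}")
    return urls


def max_duration(model: str, urls: list[str]) -> int:
    """Return the longest allowed duration in seconds for this request."""
    has_video = any(urlparse(u).path.lower().endswith(VIDEO_SUFFIXES) for u in urls)
    if model.startswith("wan2.7") and not has_video:
        return 15  # wan2.7 without a reference video
    return 10      # wan2.6, or wan2.7 with a reference video


def check_duration(model: str, urls: list[str], duration: int) -> bool:
    """True if the duration falls inside the documented range."""
    return 2 <= duration <= max_duration(model, urls)


refs = split_reference_urls("https://example.com/a.png; https://example.com/b.mp4")
print(max_duration("wan2.7-r2v", refs))  # 10
```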
11. Video Query
Query video generation task status.
- Parameters:
- : Video generation task ID (required)
- : Download video when available (default: false)
12. Image Translation Query
Query image translation task status.
- Parameters:
- : Translation task ID (required)
Notes
- Video generation is asynchronous; use Video Query to check status and retrieve results
- Image translation is asynchronous; use Image Translation Query to check status
- Duration limits vary by model (see model documentation in tool descriptions)
- Reference images should be under 10MB in size
- Watermark is enabled by default for content authenticity
- Prompt intelligent rewriting is enabled by default for the Wan tools and disabled by default for Z-Image Text to Image
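Because video generation and image translation are asynchronous, a caller typically submits a task and then polls the matching query tool until the task finishes. The sketch below shows a generic polling loop. The `query` callable stands in for whatever invokes Video Query or Image Translation Query, the `status` key is hypothetical, and the terminal status names are assumed from DashScope's usual task states rather than taken from this document.

```python
import time
from typing import Callable

# Assumed terminal states; verify against the DashScope task documentation.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "CANCELED", "UNKNOWN"}


def wait_for_task(
    task_id: str,
    query: Callable[[str], dict],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll query(task_id) until a terminal status appears or the timeout passes."""
    deadline = clock() + timeout
    while True:
        result = query(task_id)
        if result.get("status") in TERMINAL_STATUSES:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout} seconds")
        sleep(interval)


# Usage with a fake query function that finishes on the third call.
_responses = iter([{"status": "PENDING"}, {"status": "RUNNING"}, {"status": "SUCCEEDED"}])
final = wait_for_task("task-123", lambda _id: next(_responses), sleep=lambda _s: None)
print(final["status"])  # SUCCEEDED
```

Injecting `sleep` and `clock` keeps the loop easy to test without real waiting.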
Developer Information
- Author:
- Email: [email protected]
- License: Apache License 2.0
- Source Code:
- Support: Through Dify platform and GitHub Issues
License Notice
This project is licensed under the Apache License 2.0. See the LICENSE file for the full license text.
Ready to create stunning images and videos with AI?