app icon
TongYi AIGC
0.0.3

TongYi AIGC: Supports full-featured image and video generation using the latest Alibaba Tongyi models (HappyHorse, wan2.7, qwen-image2.0, Z-image, wan2.6,wan2.5, wan2.2, and wanx2.1).

sawyer-shi/tongyi_aigc5965 installs

TongYi AIGC

A powerful Dify plugin providing comprehensive AI-powered image and video generation capabilities using Alibaba Cloud Tongyi's latest Wanxiang, Qwen, and Z-Image models. Supports text-to-image, text-to-video, image-to-image, image-to-video, image translation, and more with professional-grade quality and flexible configuration options.

Version Information

  • Current Version: v0.0.3
  • Release Date: 2026-04-26
  • Compatibility: Dify Plugin Framework
  • Python Version: 3.12

Version History

  • v0.0.3 (2026-04-26):
    • Added HappyHorse series models for video generation
    • Added HappyHorse Text to Video
    • Added HappyHorse Image to Video - First Frame
    • Added HappyHorse Reference Video
    • Added HappyHorse Video Editing
  • v0.0.2 (2026-04-12):
    • Added new Video Continuation tool (wan_video_continue) - Continue video from existing clips using wan2.7-i2v
    • Added wan2.7-t2v model support for Text to Video
    • Added wan2.7-i2v model support for Image to Video with multi-modal inputs (first-frame, first+last frame, video continuation)
    • Added wan2.7-r2v model support for Reference Video with enhanced features
    • Added wan2.7-image-pro/wan2.7-image models for Image to Image with sequential grouped output mode
    • Added qwen-image2.0 series models: qwen-image-2.0-pro, qwen-image-2.0, qwen-image-edit-max, qwen-image-edit-plus
    • Enhanced resolution control with new parameters for wan2.7 models
    • Added tags for plugin marketplace (productivity, image, videos)
  • v0.0.1 (2026-02-16): Initial release with image and video generation capabilities

Quick Start

  1. Install the plugin in your Dify environment
  2. Configure your Alibaba Cloud API credentials (API Key from DashScope)
  3. Start generating images and videos with AI

Key Features

  • Multiple Generation Modes: Text-to-image, text-to-video, image-to-image, image-to-video, image translation
  • Latest AI Models: Supports wan2.7, qwen-image2.0, Z-image, wan2.6, wan2.5, wan2.2 for images; wan2.7-t2v, wan2.6-t2v, wan2.5-t2v-preview, wan2.2-t2v-plus for videos
  • Flexible Image Sizes: Multiple aspect ratios from 1:1 to 21:9 with various resolutions
  • Video Generation: Create videos with customizable duration (2-15 seconds) and synchronized audio
  • Image Translation: Translate text in images with AI-powered OCR and translation (14+ languages)
  • First-Last Frame Video: Create videos from first and last frame images
  • Reference Video: Generate videos based on reference video style
  • Video Continuation: Continue existing clips with wan2.7-i2v
  • Batch Generation: Generate multiple images in a single request (up to 6 images)
  • Watermark Control: Optional watermark for content authenticity

Core Features

Image Generation

Wan Text to Image (wan_text_2_image)

Generate images from text descriptions using Wanxiang models.

  • Supported Models: wan2.7-image-pro, wan2.7-image, wan2.6-t2i
  • Features:
    • Multiple aspect ratios (1:1, 3:2, 2:3, 16:9, 9:16)
    • High resolution output (up to 4K for wan2.7-image-pro)
    • Optional watermark
    • Sequential grouped image output mode (wan2.7)
    • Thinking mode for enhanced reasoning (wan2.7)
    • Prompt intelligent rewriting
    • Batch generation (1-4 images; up to 12 in wan2.7 sequential mode)
    • Negative prompt support

Wan Image to Image (wan_image_2_image)

Generate images from text and reference images using Wanxiang models.

  • Supported Models: wan2.7-image-pro, wan2.7-image, wan2.6-image (compatibility)
  • Features:
    • Reference image guided generation (1-9 images)
    • Multiple aspect ratios (1:1, 2:3, 3:2, 3:4, 4:3, 9:16, 16:9, 21:9)
    • 1K/2K presets for wan2.7 models
    • Optional watermark
    • Sequential grouped image output mode (wan2.7)
    • Interleaved text and image output mode (wan2.6 compatibility)
    • Prompt intelligent rewriting
    • Batch generation (1-4 images; up to 12 in wan2.7 sequential mode)

Qwen Text to Image (qwen_text_2_image)

Generate images using Qwen image models.

  • Supported Models: qwen-image-2.0-pro, qwen-image-2.0, qwen-image-2.0-pro-2026-03-03, qwen-image-2.0-2026-03-03, qwen-image-max, qwen-image-plus, qwen-image
  • Features:
    • High quality image generation
    • Multiple aspect ratios (16:9, 4:3, 1:1, 3:4, 9:16)
    • Optional watermark
    • Prompt intelligent rewriting
    • Negative prompt support
    • Random seed for reproducibility

Qwen Image to Image (qwen_image_2_image)

Generate images from text and reference images using Qwen models.

  • Supported Models: qwen-image-2.0-pro, qwen-image-2.0, qwen-image-edit-max, qwen-image-edit-max-2026-01-16, qwen-image-edit-plus, qwen-image-edit, qwen-image-edit-plus-2025-10-30
  • Features:
    • Reference image guided generation (1-3 images)
    • Multiple aspect ratios (1:1, 2:3, 3:2, 3:4, 4:3, 9:16, 16:9, 21:9)
    • Optional watermark
    • Prompt intelligent rewriting
    • Batch generation (1-6 images)
    • High resolution output (up to 2048*872)

Z-Image Text to Image (z_image_text_2_image)

Generate images using Z-Image Turbo model.

  • Supported Models: z-image-turbo
  • Features:
    • Fast image generation
    • Extensive aspect ratio support (1:1, 2:3, 3:2, 3:4, 4:3, 7:9, 9:7, 9:16, 9:21, 16:9, 21:9)
    • Multiple resolution options (1024 to 2048 pixels)
    • Prompt intelligent rewriting
    • Random seed for reproducibility

Image Translation (qwen_image_translate)

Translate text in images with AI-powered OCR and translation.

  • Supported Models: qwen-mt-image
  • Features:
    • Automatic text detection
    • Multi-language translation (14 languages)
    • Supported languages: Chinese, English, Japanese, Korean, French, German, Spanish, Russian, Italian, Portuguese, Arabic, Thai, Vietnamese, Indonesian
    • Domain hint for improved translation
    • Sensitive words filtering
    • Terminology support

Video Generation

Text to Video (wan_text_2_video)

Generate videos from text descriptions using Wanxiang models.

  • Supported Models: wan2.7-t2v, wan2.6-t2v, wan2.5-t2v-preview, wan2.2-t2v-plus, wanx2.1-t2v-turbo, wanx2.1-t2v-plus
  • Features:
    • Duration: 2-15 seconds (model dependent)
    • Resolution control: (wan2.7) or (wan2.6 and earlier)
    • Synchronized audio generation
    • Prompt intelligent rewriting
    • Single/Multi shot support (shot_type for wan2.6, prompt-driven for wan2.7)
    • Custom audio URL support
    • Negative prompt support

Image to Video (wan_first_image_2_video)

Generate video from a single image with text description.

  • Supported Models: wan2.7-i2v, wan2.6-i2v, wan2.5-i2v-preview, wan2.2-i2v-flash, wan2.2-i2v-plus, wanx2.1-i2v-turbo, wanx2.1-i2v-plus
  • Features:
    • Multi-modal inputs on wan2.7-i2v: first-frame, first+last frame, or video continuation
    • Single image input as first frame (legacy models)
    • Duration: 2-15 seconds (model dependent)
    • Resolution: 480P, 720P, 1080P
    • Synchronized audio generation
    • Video effect templates
    • Prompt intelligent rewriting
    • Single/Multi shot support

First-Last Frame Video (wan_first_end_image_2_video)

Generate video from first and last frame images.

  • Supported Models: wan2.2-kf2v-flash
  • Features:
    • First and last frame input
    • Smooth transition generation
    • Resolution: 480P, 720P, 1080P
    • Video effect templates
    • Prompt intelligent rewriting

Reference Video (wan_reference_video)

Generate videos based on reference video style.

  • Supported Models: wan2.7-r2v, wan2.6-r2v, wan2.6-r2v-flash
  • Features:
    • Multi-modal references (images/videos) input (up to 5)
    • Optional first-frame and reference voice support (wan2.7-r2v)
    • Duration: 2-10 seconds (or up to 15s for wan2.7 without reference video)
    • Resolution: 720P or 1080P (multiple aspect ratios)
    • Synchronized audio generation (wan2.6-r2v-flash only)
    • Single/Multi shot support
    • Prompt intelligent rewriting

Video Continuation (wan_video_continue)

Continue a video from an existing first clip.

  • Supported Models: wan2.7-i2v
  • Features:
    • Supports and
    • Duration 2-15 seconds and 720P/1080P resolution
    • Prompt intelligent rewriting and watermark control

Video Query (wan_video_query)

Query the status and results of video generation tasks.

  • Features:
    • Real-time task status
    • Video download URL retrieval
    • Optional automatic video download

Image Translation Query (qwen_image_translate_query)

Query image translation task status and results.

  • Features:
    • Real-time task status
    • Translated image retrieval

Technical Advantages

  • Latest AI Models: Access to Alibaba Cloud's newest Wanxiang, Qwen, and Z-Image models
  • High Quality Output: Professional-grade image and video generation
  • Flexible Configuration: Extensive parameter options for fine-tuning
  • Async Processing: Efficient video generation with task-based workflow
  • Multi-Format Support: Support for various image and video formats
  • Audio Generation: Automatic synchronized audio for videos
  • Batch Processing: Generate multiple images efficiently
  • Image Translation: AI-powered text translation in images (14+ languages)

Requirements

  • Python 3.12
  • Dify Platform access
  • Alibaba Cloud API credentials (API Key from DashScope)
  • Required Python packages (installed via requirements.txt):
    • dify_plugin>=0.2.0
    • requests>=2.31.0,<3.0.0
    • pillow>=10.0.0,<11.0.0

Installation & Configuration

  1. Install the required dependencies:

  2. Configure your Alibaba Cloud API credentials in the plugin settings:

    • API Key: Your Alibaba Cloud DashScope API key
  3. Install the plugin in your Dify environment

Usage

Image Generation Tools

1. Wan Text to Image

Generate images from text descriptions.

  • Parameters:
    • : Model version (default: wan2.7-image-pro)
    • : Text description of the image (required, wan2.7 <=5000 chars, wan2.6 <=2100 chars)
    • : Describe what you don't want (<=500 chars, wan2.6 compatibility)
    • : Image size (default: 2K, supports 1K/2K/4K or specific resolutions like 1024*1024)
    • : Number of images to generate (1-4, up to 12 with wan2.7 sequential mode)
    • : Enable grouped image generation for wan2.7 (default: false)
    • : Enable thinking mode for enhanced reasoning on wan2.7 (default: true)
    • : Enable prompt intelligent rewriting (default: true)
    • : Enable/disable watermark (default: true)
    • : Random seed for reproducibility

2. Wan Image to Image

Generate images from text and reference images.

  • Parameters:
    • : Model version (default: wan2.7-image-pro)
    • : Text description (required, <=5000 chars)
    • : Reference image files (1-9 images, required)
    • : Image size (default: 2K)
    • : Number of images to generate (1-4, up to 12 when on wan2.7)
    • : Enable grouped image generation for wan2.7 (default: false)
    • : Enable interleaved output (default: false)
    • : Enable prompt intelligent rewriting (default: true)
    • : Enable/disable watermark (default: true)

3. Qwen Text to Image

Generate images using Qwen models.

  • Parameters:
    • : Text description (required)
    • : Model version (default: qwen-image-2.0-pro)
    • : Image size (default: 1664*928)
    • : Describe what you don't want
    • : Enable prompt intelligent rewriting
    • : Enable/disable watermark (default: true)
    • : Random seed for reproducibility

4. Qwen Image to Image

Generate images from text and reference images using Qwen models.

  • Parameters:
    • : Text description (required)
    • : Reference image files (1-3 images, required)
    • : Model version (default: qwen-image-2.0-pro)
    • : Image size (default: 1024*1024)
    • : Number of images to generate (1-6, default: 3)
    • : Enable prompt intelligent rewriting
    • : Enable/disable watermark (default: true)

5. Z-Image Text to Image

Generate images using Z-Image Turbo.

  • Parameters:
    • : Text description (required, <=800 chars)
    • : Model version (default: z-image-turbo)
    • : Image size (default: 1024*1536)
    • : Enable prompt intelligent rewriting (default: false)
    • : Random seed for reproducibility

6. Image Translation

Translate text in images.

  • Parameters:
    • : Image URL for translation (required)
    • : Target language (required)
    • : Source language (default: auto)
    • : Domain hint for improved translation
    • : Sensitive words (JSON array)
    • : Terminologies (JSON array)
    • : Model version (default: qwen-mt-image)

Video Generation Tools

7. Text to Video

Generate videos from text descriptions.

  • Parameters:
    • : Model version (default: wan2.6-t2v)
    • : Text description (required)
    • : Describe what you don't want
    • : Video resolution (default: 1920*1080)
    • : Duration in seconds (model dependent, default: 5)
    • : Custom audio file URL
    • : Auto generate audio (default: true)
    • : Single or multi shot (default: single)
    • : Enable prompt intelligent rewriting (default: true)
    • : Enable/disable watermark (default: true)
    • : Random seed for reproducibility

8. Image to Video

Generate video from an image or continue from a video clip.

  • Parameters:
    • : Model version (default: wan2.6-i2v, supports wan2.7-i2v)
    • : Text description
    • / : First frame image input
    • / : Optional last frame image (wan2.7-i2v)
    • : Optional first clip video URL for continuation (wan2.7-i2v)
    • : Video resolution (default: 1080P)
    • : Duration in seconds (model dependent, default: 5)
    • : Video effect template
    • : Custom audio file URL (wan2.7-i2v only in first-frame modes)
    • : Auto generate audio (default: true)
    • : Single or multi shot (default: single)
    • : Enable prompt intelligent rewriting (default: true)
    • : Enable/disable watermark (default: true)

9. First-Last Frame Video

Generate video from first and last frame images.

  • Parameters:
    • : Model version (default: wan2.2-kf2v-flash)
    • : Text description
    • : First frame image (required)
    • : Last frame image
    • : Video resolution (default: 720P)
    • : Video effect template
    • : Enable prompt intelligent rewriting (default: true)
    • : Enable/disable watermark (default: true)

10. Reference Video

Generate videos based on reference video style.

  • Parameters:
    • : Model version (default: wan2.7-r2v)
    • : Text description (required, wan2.7 max 5000 / wan2.6 max 1500)
    • : Reference URLs (videos or images, semicolon-separated, max 5)
    • : Optional first frame image URL (wan2.7-r2v)
    • : Optional reference voice URL (wan2.7-r2v)
    • : Legacy resolution in (wan2.6; also mapped for wan2.7 compatibility)
    • / : Native wan2.7 resolution tier and aspect ratio
    • : Duration in seconds (wan2.6: 2-10; wan2.7: 2-10 or up to 15 without reference video)
    • : Single or multi shot (wan2.6)
    • : Auto generate audio (default: true, wan2.6-r2v-flash only)
    • : Enable prompt intelligent rewriting (default: true)
    • : Enable/disable watermark (default: true)

11. Video Query

Query video generation task status.

  • Parameters:
    • : Video generation task ID (required)
    • : Download video when available (default: false)

12. Image Translation Query

Query image translation task status.

  • Parameters:
    • : Translation task ID (required)

Notes

  • Video generation is asynchronous; use Video Query to check status and retrieve results
  • Image translation is asynchronous; use Image Translation Query to check status
  • Duration limits vary by model (see model documentation in tool descriptions)
  • Reference images should be under 10MB in size
  • Watermark is enabled by default for content authenticity
  • Prompt intelligent rewriting is enabled by default for better results

Developer Information

  • Author:
  • Email: [email protected]
  • License: Apache License 2.0
  • Source Code:
  • Support: Through Dify platform and GitHub Issues

License Notice

This project is licensed under Apache License 2.0. See LICENSE [blocked] file for full license text.


Ready to create stunning images and videos with AI?

CATEGORY
Tool
TAGS
IMAGEPRODUCTIVITY
VERSION
0.0.3
sawyer-shi·05/08/2026 08:37 AM
REQUIREMENTS
LLM invocation
Tool invocation
App invocation
Endpoint registration
Maximum memory
256MB
Maximum storage
1MB