Overview
GMI Cloud is a cloud-based GPU infrastructure platform that provides high-performance AI model inference services. With an OpenAI-compatible API, GMI Cloud delivers fast and reliable access to popular large language models including DeepSeek, Llama, Qwen, and Zhipu models.
Key Features
- OpenAI-Compatible API: Seamless integration with standard OpenAI client libraries and tools.
- Multiple Model Families: Access to DeepSeek, Meta Llama, Qwen, OpenAI OSS, and Zhipu (ZAI) models.
- High Performance: Optimized GPU infrastructure for fast inference and low latency.
- Streaming Support: Real-time streaming responses for chat completions.
- Tool Calling: Built-in support for function calling and tool use.
- Custom Model Support: Deploy and use your own fine-tuned models.
- Flexible Endpoints: Support for custom API endpoints for enterprise deployments.
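Because the API is OpenAI-compatible, a standard chat-completion request body works unchanged. The sketch below builds such a payload with only the standard library; the model id is a placeholder, not a real GMI Cloud model name, so substitute one from your console.

```python
import json

# Build a standard OpenAI-style chat-completion payload.
# "example-model" is a placeholder id -- replace it with any
# model listed in your GMI Cloud console.
payload = {
    "model": "example-model",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "stream": True,  # request real-time streaming chunks
}

body = json.dumps(payload)
print(body)
```

The same body can be sent with any OpenAI client library by pointing its base URL at your GMI Cloud endpoint.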
Configure
After installing the plugin, complete the following steps to configure GMI Cloud:
- Get your API Key: Sign in to the GMI Cloud console and create an API key.
- Configure in Dify: Open Settings → Model Provider, find GMI Cloud, and enter your API key in the field.
- Custom Endpoint (Optional): If your organization uses a custom endpoint, enter it here. Otherwise, the plugin uses the default GMI Cloud endpoint.
- Save: Click "Save" to activate the plugin. Dify validates your credentials with a test API call.
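The validation call amounts to an authenticated request against your configured endpoint. Here is a minimal sketch of that request, assuming the OpenAI convention of a Bearer token and a `/models` listing path; the base URL shown is illustrative, not the real default.

```python
import urllib.request

API_KEY = "your-api-key"  # the key created in the GMI Cloud console
# Illustrative base URL -- use your custom endpoint if your
# organization has one configured.
BASE_URL = "https://api.example-endpoint.com/v1"

req = urllib.request.Request(
    f"{BASE_URL}/models",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
# urllib.request.urlopen(req) would perform the actual call;
# a 200 response with a model list means the key is valid.
print(req.get_header("Authorization"))
```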
Built-in Models
The plugin ships with the following preset models you can use immediately:
- DeepSeek
- OpenAI OSS
- Meta Llama
- Qwen
- Zhipu (ZAI)
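When streaming is enabled, responses arrive as server-sent events: each `data:` line carries one JSON chunk, and a `[DONE]` sentinel closes the stream. The chunk shape below is assumed from the common OpenAI streaming format, with hard-coded sample lines standing in for a live response.

```python
import json

# Sample SSE lines as an OpenAI-compatible server would emit them
# (chunk shape assumed from the OpenAI streaming convention).
raw_stream = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]

parts = []
for line in raw_stream:
    data = line[len("data: "):]
    if data == "[DONE]":  # sentinel marking end of stream
        break
    chunk = json.loads(data)
    delta = chunk["choices"][0]["delta"]
    parts.append(delta.get("content", ""))

text = "".join(parts)
print(text)  # Hello!
```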