Lemonade lets you run local models with NPU and GPU acceleration.

Lemonade is a local inference framework for Windows and Linux, designed for seamless deployment of large language models (LLMs) with NPU and GPU acceleration. It supports models such as Qwen, Llama, and DeepSeek, optimized for different hardware configurations.
Lemonade enables local execution of LLMs, providing enhanced data privacy and security by keeping your data on your own machine while leveraging hardware acceleration for improved performance.
Dify integrates with Lemonade Server to provide LLM, text embedding, and reranking capabilities for models deployed locally.
Visit the Lemonade website to download the Lemonade Server client for your system.
To start the Lemonade server, run:
Note: if you installed Lemonade from source, use the source-installation command instead.
Once started, Lemonade will be accessible at its default local address.
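As a sketch, the steps above might look like the following. The CLI name `lemonade-server` (and its source-install variant) and the default address `http://localhost:8000` are assumptions here; verify both against the Lemonade Server documentation for your install.

```shell
# Start the Lemonade server (assumed CLI name from the client install)
lemonade-server serve

# If you installed from source, the CLI is typically invoked as:
#   lemonade-server-dev serve

# Verify the server is up by listing available models over its
# OpenAI-compatible API (assumes the default address and port):
curl http://localhost:8000/api/v1/models
```

If the `curl` call returns a JSON list of models, the server is ready for Dify to connect to.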
Go to the Dify marketplace, search for "Lemonade", and click to install the official plugin.

If this is your first time running Dify, you can find out how to get started here.
Go to Dify's model provider settings.

Then, fill in the following configuration:

Basic Configuration:
Sample Configuration:
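For illustration, a filled-in configuration might look like the sketch below. The model name and base URL are assumptions, not fixed values: use a model that is actually installed on your Lemonade server, and confirm the endpoint address in the Lemonade Server documentation.

```
Model Name:  Qwen2.5-7B-Instruct              (example; any model installed in Lemonade)
Base URL:    http://localhost:8000/api/v1     (assumed default local endpoint)
API Key:     (typically not required for a local Lemonade server)
```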
You can now use Lemonade with your favorite Dify workflow!

You can manage which models are installed on Lemonade using the Model Management GUI.
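Alongside the GUI, models can also be managed from the command line. The subcommand names below (`list`, `pull`) are assumptions; check `lemonade-server --help` for the exact commands available in your version.

```shell
# List the models currently installed on the Lemonade server
lemonade-server list

# Download an additional model by its Lemonade model name
# (hypothetical model name used for illustration)
lemonade-server pull Qwen2.5-0.5B-Instruct-CPU
```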
If you are using an AMD Ryzen AI 300 series processor, you can use NPU and Hybrid (NPU+iGPU) acceleration, which is available for many of the supported models.
For a complete list of supported models, visit Lemonade Server Models.
Lemonade offers a number of advanced options, including server-level context-size configuration and Llama.cpp ROCm support. For additional details, see the Lemonade Server Documentation and the Lemonade Server GitHub Repository.
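As one example of these server-level options, the context size can typically be set when starting the server. The flag name below is an assumption for illustration; confirm it with `lemonade-server serve --help`.

```shell
# Start the server with a larger context window
# (assumed flag name; verify against your install's --help output)
lemonade-server serve --ctx-size 8192
```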