Lemonade
0.0.2

Lemonade lets you run local models with NPU and GPU acceleration.

langgenius/lemonade · 453 installs

Overview

Lemonade is a client inference framework (Windows, Linux) designed for seamless deployment of large language models (LLMs) with NPU and GPU acceleration. It supports models like Qwen, Llama, DeepSeek, and more, optimized for different hardware configurations.

Lemonade enables local execution of LLMs, providing enhanced data privacy and security by keeping your data on your own machine while leveraging hardware acceleration for improved performance.

Dify integrates with Lemonade Server to provide LLM, text embedding, and reranking capabilities for models deployed locally.

Configure

1. Install and Run Lemonade Server

Visit Lemonade's website to download the Lemonade Server client for your system.

To start Lemonade Server, run `lemonade-server serve` from a terminal.

Note: if you installed Lemonade from source, use `lemonade-server-dev serve` instead.

Once started, Lemonade Server will be accessible at `http://localhost:8000` by default.
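Before wiring the server into Dify, you can sanity-check that it is reachable. A minimal sketch in Python, assuming the default port (8000) and the OpenAI-compatible `/api/v1/models` route; adjust both for your setup:

```python
import json
import urllib.error
import urllib.request


def lemonade_is_up(base_url: str = "http://localhost:8000") -> bool:
    """Return True if a Lemonade Server answers at base_url.

    The /api/v1/models route and default port are assumptions based on
    Lemonade's OpenAI-compatible API; change them if yours differ.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/api/v1/models", timeout=2) as resp:
            json.load(resp)  # should parse as an OpenAI-style model list
            return True
    except (urllib.error.URLError, ValueError, OSError):
        return False


print(lemonade_is_up())
```

If this prints `False`, check that the server is running and that nothing is blocking the port.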

2. Install Lemonade Plugin in Dify

Go to the Dify marketplace, search for "Lemonade", and click to install the official plugin.

If this is your first time running Dify, you can find out how to get started here.

3. Integrate Lemonade Server in Dify

In Dify, go to Settings > Model Provider and select the Lemonade provider.

Then, fill in the following configuration:

Basic Configuration:

  • Model Type: Choose LLM, Text Embedding, or Rerank, based on your use case.
  • Model Name: Your selected model. You can see available models here.
  • API Endpoint URL: The base URL where Lemonade Server is reachable.
    • In most cases this is `http://localhost:8000`.
    • If Dify is deployed using Docker, use an address the container can reach, e.g., `http://host.docker.internal:8000` or your machine's local network IP.
  • Authorization Name: Leave this field blank. Lemonade uses built-in authentication and does not require an API key.
  • Model Context Size: The maximum context size of the model (default: 4096).
  • Agent Thought Support: Select "Support" if your model supports reasoning chains.
  • Vision Support: Select "Support" if your model supports image understanding.

Sample Configuration:
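For example, an LLM entry might look like the following (the model name is illustrative; use one that is actually installed on your server):

```
Model Type:            LLM
Model Name:            Qwen2.5-0.5B-Instruct-CPU   (illustrative)
API Endpoint URL:      http://localhost:8000
Authorization Name:    (blank)
Model Context Size:    4096
Agent Thought Support: Not Support
Vision Support:        Not Support
```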

4. Done!

You can now use Lemonade with your favorite Dify workflow!
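Outside of Dify, you can also call the server directly through its OpenAI-compatible API. A sketch, where the `/api/v1/chat/completions` path is assumed from that API and the model name is a placeholder for one installed on your server:

```python
import json
import urllib.request


def build_chat_payload(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


def chat(base_url: str, model: str, prompt: str) -> str:
    """Send a chat completion request to Lemonade Server and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/api/v1/chat/completions",
        data=json.dumps(build_chat_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]


# Example (requires a running server with the model installed):
# print(chat("http://localhost:8000", "Qwen2.5-0.5B-Instruct-CPU", "Hello!"))
```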

Beyond the Basics

Additional Models

You can manage which models are installed on Lemonade using the Model Management GUI.

  • Open your web browser and navigate to the Lemonade Server address (`http://localhost:8000` by default)
  • Click on the "Model Management" tab
  • Browse available models and install them with one click

If you are using an AMD Ryzen AI 300 series processor, you can also use NPU and Hybrid (NPU+iGPU) acceleration with supported models.

For a complete list of supported models, visit Lemonade Server Models.
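You can also list the models currently installed on your server programmatically. A sketch, again assuming the OpenAI-compatible `/api/v1/models` route and default port:

```python
import json
import urllib.request


def model_ids(payload: dict) -> list[str]:
    """Extract model IDs from an OpenAI-style model-list response."""
    return [m["id"] for m in payload.get("data", [])]


def list_models(base_url: str = "http://localhost:8000") -> list[str]:
    """Fetch the list of model IDs from a running Lemonade Server."""
    with urllib.request.urlopen(f"{base_url}/api/v1/models", timeout=5) as resp:
        return model_ids(json.load(resp))


# Example (requires a running server):
# print(list_models())
```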

Lemonade Advanced Options

Lemonade offers a number of advanced options, including server-level context-size configuration and llama.cpp ROCm support. For additional details, please check the Lemonade Server Documentation and the Lemonade Server GitHub Repository.

CATEGORY: Model
VERSION: 0.0.2
PUBLISHED: langgenius · 09/18/2025 12:55 AM
REQUIREMENTS: LLM invocation, Tool invocation
MAXIMUM MEMORY: 1MB