Lemonade lets you run local models with NPU and GPU acceleration.

Lemonade is a local inference framework for Windows and Linux, designed for seamless deployment of large language models (LLMs) with NPU and GPU acceleration. It supports models such as Qwen, Llama, and DeepSeek, optimized for different hardware configurations.
Lemonade enables local execution of LLMs, providing enhanced data privacy and security by keeping your data on your own machine while leveraging hardware acceleration for improved performance.
Dify integrates with Lemonade Server to provide LLM, text embedding, and reranking capabilities for models deployed locally.
Visit the Lemonade website to download the Lemonade Server client for your system.
To start the Lemonade server, run:
Note: if you installed Lemonade from source, use the source-installation command instead.
Once started, Lemonade will be accessible at its default local address.
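As a sketch, the steps above might look like the following. The CLI name `lemonade-server` (and its source-install variant) and the default address `http://localhost:8000` are assumptions here; verify both against the Lemonade Server documentation for your install.

```shell
# Start the Lemonade server (assumed CLI name from the client install)
lemonade-server serve

# If you installed from source, the CLI is typically invoked as:
#   lemonade-server-dev serve

# Verify the server is up by listing available models over its
# OpenAI-compatible API (assumes the default address and port):
curl http://localhost:8000/api/v1/models
```

If the `curl` call returns a JSON list of models, the server is ready for Dify to connect to.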
Go to the Dify marketplace, search for "Lemonade", and click to install the official plugin.

If this is your first time running Dify, you can find out how to get started here.
Go to Dify's model provider settings.

Then, fill in the following configuration:

Basic Configuration:
Sample Configuration:
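For illustration, a filled-in configuration might look like the sketch below. The model name and base URL are assumptions, not fixed values: use a model that is actually installed on your Lemonade server, and confirm the endpoint address in the Lemonade Server documentation.

```
Model Name:  Qwen2.5-7B-Instruct              (example; any model installed in Lemonade)
Base URL:    http://localhost:8000/api/v1     (assumed default local endpoint)
API Key:     (typically not required for a local Lemonade server)
```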
You can now use Lemonade with your favorite Dify workflow!

You can manage which models are installed on Lemonade using the Model Management GUI.
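Alongside the GUI, models can also be managed from the command line. The subcommand names below (`list`, `pull`) are assumptions; check `lemonade-server --help` for the exact commands available in your version.

```shell
# List the models currently installed on the Lemonade server
lemonade-server list

# Download an additional model by its Lemonade model name
# (hypothetical model name used for illustration)
lemonade-server pull Qwen2.5-0.5B-Instruct-CPU
```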
If you are using an AMD Ryzen AI 300 series processor, you can use NPU and Hybrid (NPU+iGPU) acceleration, which is available for many of the supported models.
For a complete list of supported models, visit Lemonade Server Models.
Lemonade offers a number of advanced options, including server-level context-size configuration and Llama.cpp ROCm support. For additional details, see the Lemonade Server Documentation and the Lemonade Server GitHub Repository.
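As one example of these server-level options, the context size can typically be set when starting the server. The flag name below is an assumption for illustration; confirm it with `lemonade-server serve --help`.

```shell
# Start the server with a larger context window
# (assumed flag name; verify against your install's --help output)
lemonade-server serve --ctx-size 8192
```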