Ollama
Ollama is a cross-platform inference framework client (macOS, Windows, Linux) designed for seamless deployment of large language models (LLMs) such as Llama 2, Mistral, LLaVA, and more. With its one-click setup, Ollama runs LLMs locally on your own machine, providing enhanced data privacy and security by keeping your data where you control it.
Dify supports integrating the LLM and Text Embedding capabilities of models deployed with Ollama.
Visit the Ollama download page to download the client for your system.
After a successful launch, Ollama starts an API service on local port 11434, which can be accessed at http://localhost:11434.
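To confirm the service is up before configuring Dify, you can probe the root endpoint. Below is a minimal sketch; the helper name, timeout, and use of the standard library `urllib` are my own choices, and only the default port 11434 comes from the text above.

```python
import urllib.request

# Base URL of the local Ollama API (11434 is Ollama's default port).
OLLAMA_URL = "http://localhost:11434"

def check_ollama(url: str = OLLAMA_URL) -> bool:
    """Return True if an Ollama server answers on its root endpoint."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused / timeout: no server listening at this URL.
        return False

if __name__ == "__main__":
    print("Ollama reachable:", check_ollama())
```

If this prints `False`, make sure the Ollama client is running before continuing.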
For other models, see Ollama Models for more details.
Go to the Dify Marketplace, search for Ollama, and download the plugin.

In the Ollama model provider settings, fill in:

After verifying that there are no errors, click "Save" to use the model in your application.
Integrating an Embedding model works the same way as an LLM; just change the model type to Text Embedding.
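Outside of Dify, you can also call an Ollama embedding model directly over the same local API. The sketch below targets Ollama's `/api/embeddings` endpoint; the model name `nomic-embed-text` is only an example, and the helper names are my own.

```python
import json
import urllib.request

def embedding_request(text: str, model: str = "nomic-embed-text",
                      base_url: str = "http://localhost:11434"):
    """Build the URL and JSON body for Ollama's /api/embeddings endpoint."""
    return f"{base_url}/api/embeddings", {"model": model, "prompt": text}

def embed(text: str, **kwargs) -> list:
    """POST the request and return the embedding vector (needs a running server)."""
    url, body = embedding_request(text, **kwargs)
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["embedding"]
```

With the server running and the model pulled, `embed("hello world")` returns a list of floats you can store in a vector database.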
For more details, see Dify's official documentation.
Hint: Ollama does not officially support rerank models. To use reranking, deploy a rerank model locally with a tool such as vLLM, llama.cpp, TEI, or Xinference, and fill in the complete URL ending with "rerank". For deployment, see the llama.cpp deployment tutorial for Qwen3-Reranker.
In the rerank model configuration, fill in:

After verifying that there are no errors, click "Add" to use the model in your application.