Ollama
Ollama is a cross-platform inference framework client (macOS, Windows, Linux) designed for seamless deployment of large language models (LLMs) such as Llama 2, Mistral, LLaVA, and more. With its one-click setup, Ollama runs LLMs locally on your own machine, providing enhanced data privacy and security by keeping your data where you control it.
Dify supports integrating the LLM and Text Embedding capabilities of models deployed with Ollama.
Visit the Ollama download page to download the client for your system.
After a successful launch, Ollama starts an API service on local port 11434, which can be accessed at http://localhost:11434.
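To confirm the service is up before configuring Dify, you can probe the root endpoint. Below is a minimal sketch; the helper name, timeout, and use of the standard library `urllib` are my own choices, and only the default port 11434 comes from the text above.

```python
import urllib.request

# Base URL of the local Ollama API (11434 is Ollama's default port).
OLLAMA_URL = "http://localhost:11434"

def check_ollama(url: str = OLLAMA_URL) -> bool:
    """Return True if an Ollama server answers on its root endpoint."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused / timeout: no server listening at this URL.
        return False

if __name__ == "__main__":
    print("Ollama reachable:", check_ollama())
```

If this prints `False`, make sure the Ollama client is running before continuing.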
For other models, see Ollama Models for more details.
Go to the Dify Marketplace, search for Ollama, and download the plugin.

In the Ollama model provider settings, fill in:

After verifying that there are no errors, click "Save" to use the model in your application.
Integrating an Embedding model works the same way as an LLM; just change the model type to Text Embedding.
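Outside of Dify, you can also call an Ollama embedding model directly over the same local API. The sketch below targets Ollama's `/api/embeddings` endpoint; the model name `nomic-embed-text` is only an example, and the helper names are my own.

```python
import json
import urllib.request

def embedding_request(text: str, model: str = "nomic-embed-text",
                      base_url: str = "http://localhost:11434"):
    """Build the URL and JSON body for Ollama's /api/embeddings endpoint."""
    return f"{base_url}/api/embeddings", {"model": model, "prompt": text}

def embed(text: str, **kwargs) -> list:
    """POST the request and return the embedding vector (needs a running server)."""
    url, body = embedding_request(text, **kwargs)
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["embedding"]
```

With the server running and the model pulled, `embed("hello world")` returns a list of floats you can store in a vector database.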
For more details, see Dify's official documentation.
Hint: Ollama does not officially support rerank models. To use reranking, deploy a rerank model locally with a tool such as vLLM, llama.cpp, TEI, or Xinference, and fill in the complete URL ending with "rerank". For deployment, see the llama.cpp deployment tutorial for Qwen3-Reranker.
In the rerank model configuration, fill in:

After verifying that there are no errors, click "Add" to use the model in your application.