MinerU is a tool that converts files into machine-readable formats (e.g., Markdown, JSON), allowing for easy extraction into any format. MinerU was born during the pre-training process of InternLM. We focus on solving symbol conversion issues in scientific literature and hope to contribute to technological development in the era of large models. https://github.com/opendatalab/MinerU
MinerU is a document parser that can parse complex document data for any downstream LLM use case (RAG, agents).
Support the official API of MinerU

Local Deploy corresponds to MinerU release 2.5
Supported input file types now include PDF, DOC, DOCX, PPT, PPTX, PNG, JPG, and JPEG.
Removed the "Replace Markdown Image Path" tool. Image paths in the Markdown are now automatically replaced with previewable URLs (the validity period of each URL is determined by FILES_ACCESS_TIMEOUT in dify.env). To use this feature, update Dify's core code.

Supports more export formats (HTML, DOC, LaTeX). The download links for the additional formats will be stored in the of the output variables.

You can download the YAML file and import it into Dify. This demo includes a basic capability demonstration of the plugin.
Version 0.0.2 adds support for the official API of MinerU.
Log into your Dify platform.
Go to "Tools" -> "Plugin Market", search for the "MinerU" plugin, and add it.
Configure the MinerU plugin parameters:
Save your configuration.
| parameter | type | required | example | description |
|---|---|---|---|---|
| enable formula recognition | bool | false | true | Whether to enable formula recognition. Defaults to true. |
| enable table recognition | bool | false | true | Whether to enable table recognition. Defaults to true. |
| document language | string | false | ch | Document language. Defaults to ch. Set to auto to have the model detect the language automatically. For other accepted values, see the PaddleOCR documentation. |
| enable ocr recognition | bool | false | true | Whether to enable OCR. Defaults to false. |
| extra export formats | [string] | false | ["docx","html"] | Markdown and JSON are always exported. This parameter adds one or more of docx, html, and latex; no other formats are supported. |
| model version | string | false | vlm | MinerU model version. Options: pipeline or vlm. |
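
The constraints in the table can be checked before a request is sent. The following is a minimal sketch, not the plugin's own code: the `ParseOptions` class and its field names are hypothetical, and only the defaults and allowed values are taken from the table above. The table gives no default for the model version, so it is left unset here.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Allowed values come from the parameter table; all names here are hypothetical.
ALLOWED_EXTRA_FORMATS = {"docx", "html", "latex"}
ALLOWED_MODEL_VERSIONS = {"pipeline", "vlm"}


@dataclass
class ParseOptions:
    enable_formula: bool = True          # formula recognition, default true
    enable_table: bool = True            # table recognition, default true
    language: str = "ch"                 # "auto" lets the model detect the language
    enable_ocr: bool = False             # OCR, default false
    extra_formats: List[str] = field(default_factory=list)
    model_version: Optional[str] = None  # no default is documented, so leave unset

    def validate(self) -> None:
        """Raise ValueError if a value falls outside the documented options."""
        unknown = set(self.extra_formats) - ALLOWED_EXTRA_FORMATS
        if unknown:
            raise ValueError(f"unsupported export formats: {sorted(unknown)}")
        if self.model_version is not None and self.model_version not in ALLOWED_MODEL_VERSIONS:
            raise ValueError(f"unknown model version: {self.model_version!r}")


# Example: request docx and html in addition to the default Markdown and JSON.
options = ParseOptions(extra_formats=["docx", "html"], model_version="vlm")
options.validate()
```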

The plugin provides five types of output for each processed file:
- text: The parsed Markdown text
- files: The files in the extra export formats (html, docx, latex)
- json: The parsed content list
- full_zip_url: Official API only. The URL of a zip archive containing the complete parsed result
- images: The images extracted from the PDF

Version 0.3.1 of the plugin corresponds to MinerU release 2.1.1.
Get your local IP address:
For Dify to correctly access the MinerU API, you need to use your LAN IP address (Do NOT use or ). Get your IP address based on your operating system:
Note your IP address, for example:
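
If you prefer a cross-platform check over operating-system commands, the LAN address can also be looked up from Python. This is a minimal sketch using only the standard library; the function name `get_lan_ip` is hypothetical, and the address `192.0.2.1` is a reserved documentation address used only to let the OS pick a route.

```python
import socket
from typing import Optional


def get_lan_ip() -> Optional[str]:
    """Return the IPv4 address of the interface used for outbound traffic.

    Returns None when the machine has no usable route (e.g. it is offline).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            # Connecting a UDP socket sends no packets. It only asks the OS
            # to choose a route, which reveals the local address in use.
            sock.connect(("192.0.2.1", 80))
            return sock.getsockname()[0]
        except OSError:
            return None


if __name__ == "__main__":
    print(get_lan_ip())
```

On machines with several network interfaces this reports the one on the default route, which may not be the interface Dify can reach, so compare the result with your system's network settings.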
Deploy the MinerU Web API project:
Follow the instructions in MinerU/projects/web_api/README.md at magic_pdf-1.2.2-released in the opendatalab/MinerU repository on GitHub.
Log into your Dify platform.
Go to "Tools" -> "Plugin Market", search for the "MinerU" plugin, and add it.
Configure the MinerU plugin parameters:
Note: Ensure that the Dify service can access this base URL.
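
One way to verify reachability is to attempt a TCP connection to the base URL's host and port from the machine (or container) where Dify runs. The sketch below is an illustration under that assumption; `base_url_reachable` is a hypothetical helper, and the URL in the example is a placeholder for your own address and port.

```python
import socket
from urllib.parse import urlsplit


def base_url_reachable(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(base_url)
    if parts.hostname is None:
        raise ValueError(f"not a valid URL: {base_url!r}")
    # Fall back to the scheme's default port when the URL omits one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unroutable: the service is not reachable.
        return False


if __name__ == "__main__":
    # Placeholder address; replace with your own base URL.
    print(base_url_reachable("http://192.168.1.100:8888"))
```

A successful connection only shows that something is listening on that port. It does not confirm that the MinerU Web API itself is healthy.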

Save your configuration.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file | file | Yes | - | File to be parsed |
| parse_method | select | Yes | auto | Parsing method, can be auto, ocr, or txt. |
**Note: Other parameters are invalid for the local deployment version.**
Same as the Official API output variables (see above).
To ensure the MinerU plugin can properly handle file uploads, you need to configure the setting in Dify:
Find your Dify deployment directory and edit the file.
Modify the configuration based on your deployment method:
Confirm that the Dify API service's port is exposed externally (check port mapping in the file).
After saving the file, restart the Dify service for the configuration to take effect:
Tips:
If you use your local IP for the , your IP address may change when your network environment changes (e.g., connecting to a different WiFi). When this happens, you'll need to:
- Get your new local IP address
- Update the MinerU plugin's Base URL configuration in Dify
- If necessary (if Dify's is configured with an IP address rather than ), update Dify's file and restart the Dify service
When using the Dify MinerU plugin, especially when processing file uploads, if you don't configure this step, you may encounter errors like . This usually occurs because Dify's service cannot correctly access its own file service.

Please follow the instructions above to configure the settings accordingly, and this issue will be resolved.
https://github.com/langgenius/dify/issues/16327
This plugin is powered by [MinerU](https://github.com/opendatalab/MinerU).