xParse is AI infrastructure for document processing, built for LLM-based RAG and agentic workflows.
Author: intsig-textin
Version: 0.0.1
Type: tool
The xParse Document Parsing Tool extracts structured content from various file formats (PDF, Word, Excel, PPT, images, etc.) and converts it into structured elements that can be queried and analyzed.
When configuring the plugin in Dify, you need to provide the following credentials:
Get your credentials: log in to Textin and go to Workspace → Account Settings → Developer Information to view your x-ti-app-id and x-ti-secret-code.
| Parameter | Type | Required | Description |
|---|---|---|---|
| x-ti-app-id | secret-input | Yes | Textin application ID. Log in to Textin and go to "Workspace → Account Settings → Developer Information" to view x-ti-app-id. See API Documentation for details. |
| x-ti-secret-code | secret-input | Yes | Textin secret code. Log in to Textin and go to "Workspace → Account Settings → Developer Information" to view x-ti-secret-code. See API Documentation for details. |
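
As a rough illustration of how these two credentials are typically used, the sketch below assembles them into HTTP request headers of the same names. It assumes the Textin API expects `x-ti-app-id` and `x-ti-secret-code` as headers; the function name `build_textin_headers` is hypothetical and is not part of the plugin.

```python
def build_textin_headers(app_id: str, secret_code: str) -> dict[str, str]:
    """Return request headers carrying the two Textin credentials.

    Assumption: the credentials are sent as HTTP headers named
    x-ti-app-id and x-ti-secret-code. Check the API documentation
    before relying on this.
    """
    if not app_id or not secret_code:
        # Fail early instead of sending an unauthenticated request.
        raise ValueError("both app_id and secret_code are required")
    return {
        "x-ti-app-id": app_id,
        "x-ti-secret-code": secret_code,
    }
```

In Dify itself the values are entered in the plugin's credential form, so a helper like this is only relevant when calling the API directly.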
The xParse Parse tool provides parameters to customize the processing of documents.
The only required parameter is the file you wish to process.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| | file | Yes | - | The file to be parsed (supports PDF, Word, Excel, PPT, images, etc.) |
| | select | No | | The document parsing provider/engine to use. Several options are available, one of which is marked as recommended. |
| | string | No | - | Password for encrypted PDF files |
| | string | No | - | Page ranges to parse: a single page (e.g., page 15), a range (e.g., pages 20-25), or a list of pages (e.g., pages 1, 3, 5, 6, 7) |
| | select | No | | Whether to apply crop and dewarp preprocessing (No or Yes) |
| | select | No | | Whether to remove watermarks during preprocessing (No or Yes) |
| | boolean | No | | Whether to return page images (for PDF and other formats that need to be converted to images) |
| | boolean | No | | Whether to return sub-images within pages |
The following parameters apply only when a particular option is selected:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| | select | No | | PDF parsing mode: either extract text directly from the PDF, or treat the PDF as images (scan mode). Images always use scan mode. |
| | select | No | | Controls the underline recognition range (scan mode only): either no recognition, or recognize only underlines without text |
| | select | No | | Whether to enable chart recognition (No or Yes). Recognized charts are output as tables. |
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| | string | No | - | S3 storage configuration in JSON format, used for storing page images |
The tool returns structured data with the following fields:
| Field | Type | Description |
|---|---|---|
| | string | The full parsed content in Markdown format, including images, sections, etc. |
| | array of object | List of structured content blocks (sections, paragraphs, images, tables, etc.) |
| | array of object | List of image objects extracted from the content (if returning page images or sub-images is enabled) |
Each element object contains:
| Field | Type | Description |
|---|---|---|
| | string | Unique identifier for the element (SHA-256 hash of text + coordinates + page number + filename) |
| | string | The type of element |
| | string | The text content of the element |
| | object | Metadata for the element (see below for details) |
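
The element identifier is described above as a SHA-256 hash over the text, coordinates, page number, and filename. The sketch below shows one way such a deterministic ID could be derived. The serialization (a compact JSON array) and the function name `element_id_sketch` are assumptions, so it will not necessarily reproduce the IDs that xParse returns.

```python
import hashlib
import json
from typing import Sequence


def element_id_sketch(
    text: str,
    coordinates: Sequence[float],
    page_number: int,
    filename: str,
) -> str:
    """Derive a stable hex ID from the four inputs named in the table.

    Assumption: the inputs are serialized as a compact JSON array before
    hashing. The real tool may combine them differently.
    """
    payload = json.dumps(
        [text, list(coordinates), page_number, filename],
        ensure_ascii=False,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Because the ID depends only on content and position, parsing the same file twice should yield the same IDs, which makes them usable as keys for deduplication or incremental indexing.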
The metadata field provides detailed information about the element's origin, layout, and context.
Common fields include:
| Field | Type | Description |
|---|---|---|
| | string | Name of the source file |
| | string | MIME type or file type |
| | string | Timestamp of last file modification |
| | integer | Page number in the source file (if applicable) |
| | integer | Width of the page in pixels |
| | integer | Height of the page in pixels |
| | array | 8-element array representing quadrilateral coordinates (normalized, range [0, 1]) |
| | string | ID of the parent element |
| | integer | Depth in the document hierarchy |
| | string | Base64-encoded image data (if the corresponding option is enabled) |
| | string | MIME type for images |
| | string | URL for the page image (if the corresponding option is enabled) |
| | string | URL for the original page image (if preprocessing is enabled) |
| | string | Preview URL for images (after uploading to Dify) |
| | string | Unique file ID for images in Dify |
| | string | HTML representation for tables or rich text elements |
| | object | Data source information, including record locator, URLs, version, and dates |
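
Since the quadrilateral coordinates are normalized to [0, 1] and the page width and height are given in pixels, they can be mapped back to pixel positions, for example to draw bounding boxes on a page image. The sketch below assumes the 8 values are interleaved as x1, y1, x2, y2, x3, y3, x4, y4; that ordering and the function name `to_pixel_quad` are assumptions, not confirmed by the source.

```python
from typing import Sequence


def to_pixel_quad(
    coordinates: Sequence[float],
    page_width: int,
    page_height: int,
) -> list[tuple[int, int]]:
    """Convert 8 normalized values into four (x, y) pixel points.

    Assumption: values alternate x, y for each corner of the quadrilateral.
    """
    if len(coordinates) != 8:
        raise ValueError("expected exactly 8 coordinate values")
    points = []
    for i in range(0, 8, 2):
        x = round(coordinates[i] * page_width)       # scale x by page width
        y = round(coordinates[i + 1] * page_height)  # scale y by page height
        points.append((x, y))
    return points


# A quadrilateral covering the full page of a 1000 x 2000 pixel image.
corners = to_pixel_quad([0, 0, 1, 0, 1, 1, 0, 1], 1000, 2000)
# corners == [(0, 0), (1000, 0), (1000, 2000), (0, 2000)]
```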
| Field | Type | Description |
|---|---|---|
| | string | Unique image ID (Dify file ID) |
| | string | Image file name |
| | string | MIME type of the image |
| | string | URL for image preview |
| | integer | Image file size in bytes |
| | string | Always |