xParse for RAG and Agents
Author: intsig-textin
Version: 1.2.1
Type: tool
Description
Parse complex documents into Markdown, structured elements, tables, and images for RAG pipelines and agent workflows.
xParse is a structured document parsing tool built for RAG pipelines and agent workflows. It parses PDFs, Word, Excel, PowerPoint, images, and other files into model-ready outputs, including Markdown text, structured elements, tables, and images.
Unlike simple document-to-text conversion tools, xParse is designed for workflows that need richer document structure and layout-aware understanding. It helps turn complex files into content that can be used for knowledge ingestion, retrieval, agent reasoning, information extraction, and downstream automation.
Use xParse when your workflow needs more than plain text output, for example document sections, titles, tables, image blocks, page-level metadata, or structured content elements that can be passed to later nodes in Dify. xParse returns Markdown in the text field, structured blocks in elements, and image resources in images, which makes it better suited to multi-step workflows than a Markdown-only parser.
Supports both a Free API and a Paid API. The Free API works immediately after installation, without any credentials.
Best For
- RAG document preprocessing
- Knowledge base ingestion
- Agent document reading and reasoning
- Structured information extraction
- Table and layout-aware parsing
- Multi-step workflow automation
- Image-aware document understanding
Quick Start
1. Free API (Default, No Credentials Required)
Install the plugin in Dify and start using it; no credentials are needed. Leave both credential fields empty during provider configuration.
The free API supports PDF and images (JPG/PNG/BMP/TIFF/WebP), with a daily limit of 1,000 pages.
2. Paid API (Optional)
For higher usage or more formats (Word, Excel, PPT, HTML, OFD, and 20+ other formats), get credentials from the Textin Console and enter them in the provider configuration.
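The split between the two tiers can be checked before a file is sent for parsing. The following is a minimal sketch, assuming the limits stated above (PDF and JPG/PNG/BMP/TIFF/WebP, 1,000 pages per day). The function name, the `.jpeg` and `.tif` aliases, and the page-counting arguments are hypothetical and not part of the plugin.

```python
from pathlib import Path

# Extensions the free API is described as accepting: PDF plus JPG/PNG/BMP/TIFF/WebP.
# The ".jpeg" and ".tif" aliases are an assumption, not stated in this document.
FREE_API_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".bmp", ".tiff", ".tif", ".webp"}
FREE_API_DAILY_PAGE_LIMIT = 1000


def fits_free_api(filename: str, page_count: int, pages_used_today: int = 0) -> bool:
    """Return True if the file type and page budget fit the free API limits."""
    extension = Path(filename).suffix.lower()
    if extension not in FREE_API_EXTENSIONS:
        return False  # e.g. Word, Excel, PPT, HTML, OFD need the paid API
    return pages_used_today + page_count <= FREE_API_DAILY_PAGE_LIMIT
```

A caller would track `pages_used_today` itself; nothing in this document says the plugin reports remaining quota.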
Provider Credentials
To get credentials for the paid API, log in to Textin and go to Workspace → Account Settings → Developer Information to view them.
Parse Input Parameters
The xParse Parse tool provides parameters to customize document processing and control the level of detail in returned data.
The only required parameter is the file you wish to process.
Main Parameters
Capabilities Parameters
Control what additional information is included in the response:
API Limits
Notes
- For more details on capabilities and parameters, refer to the Parse Config Documentation.
- Enable only the capabilities you need to optimize performance and response size.
- Default values are optimized for common use cases.
API Response Structure
Top-Level Output Variables
The tool returns structured data with the following output variables:
Field Details
text
- Type: string
- Description:
The entire document content formatted in Markdown. This comes directly from the API response and includes formatting for headings, paragraphs, tables, images, etc.
elements
- Type: array of objects
- Description:
List of structured elements extracted from the document. Each element represents a semantic unit (title, paragraph, table, image, etc.) with metadata.
Each element object contains:
Element metadata
The metadata field provides contextual information:
pages
- Type: array of objects
- Description:
List of page metadata (only returned if the corresponding capability is enabled). Each page object contains:
title_tree
- Type: array of objects
- Description:
Hierarchical document outline (only returned if the corresponding capability is enabled). Each node contains:
images
- Type: array of objects
- Description:
List of images uploaded to Dify's file system (only returned if the corresponding capability is enabled and images with base64 data are present). Each image object contains:
Example Response
JSON Structure
Typical Workflow Use Cases
- Knowledge ingestion for RAG — Upload a PDF or Office file → parse into Markdown and structured elements → chunk and index into your knowledge base.
- Agent document understanding — Let your agent read contracts, reports, manuals, and forms through structured outputs instead of raw files.
- Structured information extraction — Parse documents first, then pass clean text blocks, tables, and metadata into downstream extraction, summarization, or decision nodes.
- Layout-aware processing — Use titles, page coordinates, tables, and image blocks to support more accurate retrieval, routing, and document automation.
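The knowledge-ingestion case above can be sketched as a title-based chunker over the elements output. This is a minimal sketch under stated assumptions: the `type` and `text` keys and the `"title"` type value are hypothetical, since the element schema is not listed in this document, so adjust them to the actual fields returned.

```python
from typing import Any


def chunk_by_title(elements: list[dict[str, Any]]) -> list[dict[str, str]]:
    """Group element text under the most recent title element.

    Assumes each element has hypothetical "type" and "text" keys.
    """
    chunks: list[dict[str, str]] = []
    current_title = ""
    current_lines: list[str] = []

    def flush() -> None:
        # Emit a chunk only when body text was collected; bare titles are skipped.
        if current_lines:
            chunks.append({"title": current_title, "content": "\n".join(current_lines)})

    for element in elements:
        text = str(element.get("text", "")).strip()
        if not text:
            continue
        if str(element.get("type", "")).lower() == "title":
            flush()  # close the previous section before starting a new one
            current_title = text
            current_lines = []
        else:
            current_lines.append(text)
    flush()
    return chunks
```

Each returned chunk keeps its section title alongside the body text, so the title can be indexed as retrieval metadata rather than being lost in a flat Markdown split.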
Usage
- Install this plugin in Dify
- Configure the provider: leave credentials empty for the free API, or fill them in for the paid API
- Use the Parse tool in Workflow or Agent applications
- Upload a file and configure parsing parameters
- Use the returned text, elements, and images in downstream nodes
API Reference
Notes
- The text field is suitable for direct display or LLM input.
- The elements field is useful for structured processing, chunking, highlighting, and further analysis.
- The images field provides image resources for preview or multimodal workflows.
- The pages and title_tree fields offer document structure insights.
- When the image capability is enabled, images with base64 data are automatically uploaded to Dify's file system, and the images array contains the uploaded file information.
- Coordinates are normalized to the [0, 1] range relative to page dimensions. To convert to pixels, multiply x values by the page width and y values by the page height.
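The coordinate conversion in the last note can be sketched as follows. This assumes coordinates arrive as a flat list of alternating x, y values; that layout and the function name are hypothetical, so check the actual element and page fields before relying on it.

```python
def to_pixels(normalized: list[float], page_width: float, page_height: float) -> list[int]:
    """Scale normalized [x0, y0, x1, y1, ...] values to pixel coordinates.

    Assumes a flat list of alternating x, y values in the [0, 1] range.
    """
    if len(normalized) % 2 != 0:
        raise ValueError("expected an even number of values (x, y pairs)")
    pixels = []
    for index, value in enumerate(normalized):
        # Even positions are x values (scale by width), odd positions are y values.
        scale = page_width if index % 2 == 0 else page_height
        pixels.append(round(value * scale))
    return pixels
```

For example, `to_pixels([0.1, 0.2, 0.5, 0.75], 1000, 2000)` scales the x values by 1000 and the y values by 2000.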
Tags: RAG, Agent, Document Parsing, Structured Extraction, Knowledge Ingestion, PDF Parsing, Markdown, Tables, Layout Parsing