xParse
1.2.1

Parse complex documents into Markdown, structured elements, tables, and images for RAG pipelines and agent workflows.

intsig-textin/xparse · 967 installs

xParse for RAG and Agents

Author: intsig-textin
Version: 1.2.1
Type: tool


Description

xParse is a structured document parsing tool built for RAG pipelines and agent workflows. It parses PDFs, Word, Excel, PowerPoint, images, and other files into model-ready outputs, including Markdown text, structured elements, tables, and images.

Unlike simple document-to-text conversion tools, xParse is designed for workflows that need richer document structure and layout-aware understanding. It helps turn complex files into content that can be used for knowledge ingestion, retrieval, agent reasoning, information extraction, and downstream automation.

Use xParse when your workflow needs more than plain text output, for example document sections, titles, tables, image blocks, page-level metadata, or structured content elements that can be passed to later nodes in Dify. xParse returns Markdown in the text field, structured blocks in elements, and image resources in images, which makes it better suited to multi-step workflows than a Markdown-only parser.

xParse supports both a free API and a paid API. The free API works immediately after installation, with no credentials required.


Best For

  • RAG document preprocessing
  • Knowledge base ingestion
  • Agent document reading and reasoning
  • Structured information extraction
  • Table and layout-aware parsing
  • Multi-step workflow automation
  • Image-aware document understanding

Quick Start

1. Free API (Default, No Credentials Required)

Install the plugin in Dify and start using it; no credentials are needed. Leave the application ID and secret code fields empty during provider configuration.

The free API supports PDF and images (JPG/PNG/BMP/TIFF/WebP), with a daily limit of 1,000 pages.

2. Paid API (Optional)

For higher usage or more formats (Word/Excel/PPT/HTML/OFD and 20+ other formats), get credentials from the Textin Console and fill in the application ID and secret code in the provider configuration.


Provider Credentials

| Parameter | Type | Required | Description |
| Application ID | secret-input | No | Textin application ID. Only required for paid API. Leave empty to use the free API. |
| Secret code | secret-input | No | Textin secret code. Only required for paid API. Leave empty to use the free API. |

Get credentials for paid API: Log in to Textin and go to Workspace → Account Settings → Developer Information to view your application ID and secret code.


Parse Input Parameters

The xParse Parse tool provides parameters to customize document processing and control the level of detail in returned data.

The only required parameter is the file you wish to process.


Main Parameters

| Parameter | Type | Required | Default | Description |
|  | file | Yes | - | The file to be parsed (supports PDF, Word, Excel, PPT, images, etc.) |
|  | string | No | - | Password for encrypted PDF files |
|  | string | No | - | Page ranges to parse, given either as a single range (such as pages 1-2) or as multiple ranges |

Capabilities Parameters

Control what additional information is included in the response:

| Parameter | Type | Required | Default | Description |
|  | boolean | No |  | Whether to return element hierarchy and relationships (parent_id, children_ids, ref_element_id) for building a document structure graph |
|  | boolean | No |  | Whether to return fine-grained inline objects (formulas, handwriting, checkboxes, images within text) |
|  | boolean | No |  | Whether to return character-level details (coordinates, confidence, candidate characters) |
|  | boolean | No |  | Whether to return image data (image_url, mime_type, base64). When enabled, base64 images are automatically uploaded to Dify |
|  | boolean | No |  | Whether to return detailed table structure in JSON format (rows, cols, cells with coordinates and content) |
|  | boolean | No |  | Whether to return the page metadata list (page dimensions, page_image_url, element_ids per page) |
|  | boolean | No |  | Whether to return the hierarchical title tree (table of contents) |
|  | select | No |  | Format of tables in the Markdown output. Two options: a simple format, and a format that supports complex tables with merged cells |

API Limits

| Limit | Free API | Paid API |
| Supported formats | PDF, images (JPG/PNG/BMP/TIFF/WebP) | 20+ formats (PDF, images, Word, Excel, PPT, HTML, OFD, etc.) |
| Daily usage | 1,000 pages | Per plan |
| File size | 10MB | 500MB |
| PDF pages | 1,000 pages |
| XLS/XLSX/CSV | ≤ 2,000 rows × 100 cols per sheet |
| TXT | ≤ 100KB |
| Image dimensions | 20–20,000 px | 20–20,000 px |
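
The free-tier limits above can be checked before a file is sent for parsing. The following is a minimal sketch, not part of the plugin: the function name fits_free_api is hypothetical, the check is based only on the file extension and size, the .jpeg and .tif spellings are assumed aliases, and 10MB is assumed to mean 10 × 1024 × 1024 bytes. It does not check the daily page quota or image dimensions.

```python
from pathlib import Path

# Formats listed for the free API; ".jpeg" and ".tif" are assumed aliases.
FREE_API_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff", ".webp"}

# 10MB file size limit, assuming binary megabytes.
FREE_API_MAX_BYTES = 10 * 1024 * 1024


def fits_free_api(filename: str, size_bytes: int) -> bool:
    """Return True if the file's extension and size fall within the free API limits."""
    extension = Path(filename).suffix.lower()
    return extension in FREE_API_EXTENSIONS and size_bytes <= FREE_API_MAX_BYTES
```

A workflow could use a check like this to decide whether paid credentials are needed before uploading a file.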

Notes

  • For more details on capabilities and parameters, refer to the Parse Config Documentation.
  • Enable only the capabilities you need to optimize performance and response size.
  • Default values are optimized for common use cases.

API Response Structure

Top-Level Output Variables

The tool returns structured data with the following output variables:

| Variable | Type | Description |
| text | string | The full document content in Markdown format (taken from the corresponding field of the API response) |
| elements | array of object | List of structured elements extracted from the document |
| pages | array of object | List of page metadata (only returned if the page metadata capability is enabled) |
| title_tree | array of object | Hierarchical title tree / table of contents (only returned if the title tree capability is enabled) |
| images | array of object | List of images uploaded to Dify (only returned if the image data capability is enabled and images are present) |

Field Details

text

  • Type: string
  • Description:
    The entire document content formatted in Markdown. This comes directly from the API response and includes proper formatting for headings, paragraphs, tables, images, etc.

elements

  • Type: array of objects
  • Description:
    List of structured elements extracted from the document. Each element represents a semantic unit (title, paragraph, table, image, etc.) with metadata.

Each element object contains:

| Field | Type | Description |
|  | string | Unique identifier for the element |
|  | string | Element type, for example Title, Table, or Image |
|  | string | Optional sub-type for further classification (for example, the kind of Image) |
|  | string | Text content of the element |
|  | integer | Page number where the element appears (starting from 1) |
|  | array | 8-element array of normalized quadrilateral coordinates [x1,y1,x2,y2,x3,y3,x4,y4], each in the range [0,1] |
|  | object | Element metadata (see below) |
|  | array | Inline objects within the element (only if the inline objects capability is enabled) |
|  | object | Table structure details (only for Table elements, if the table structure capability is enabled) |
|  | array | Character-level details (only if the character details capability is enabled) |
|  | object | Image data (only for Image elements, if the image data capability is enabled) |

Element metadata

The element metadata field provides contextual information:

| Field | Type | Description |
| parent_id | string | Parent element ID (if the hierarchy capability is enabled) |
| children_ids | array | Child element IDs (if the hierarchy capability is enabled) |
|  | integer | Nesting depth for elements of the same type (e.g., 0 for H1, 1 for H2) |
| ref_element_id | string | Referenced element ID, e.g., linking an image to its caption (if the hierarchy capability is enabled) |
|  | boolean | Whether this element continues from a previous page |
|  | string | ID of the element that this one continues from (if it continues from a previous page) |
|  | boolean | Whether the element contains inline objects |
|  | array | Types of inline objects present |
|  | integer | Image width in pixels (for Image elements) |
|  | integer | Image height in pixels (for Image elements) |
|  | object | Data source information including protocol, path, and URLs |
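
The parent and child references above are enough to rebuild the document structure graph. The following is a minimal sketch under stated assumptions: parent_id is the field named in this document, while the dictionary keys "metadata" and "element_id" and the function name build_children_index are hypothetical stand-ins, so adjust them to the keys the tool actually returns.

```python
from collections import defaultdict


def build_children_index(elements):
    """Map each parent element ID to the IDs of its children, in document order."""
    children = defaultdict(list)
    for element in elements:
        # "metadata" and "element_id" are assumed key names, not confirmed by the source.
        parent_id = element.get("metadata", {}).get("parent_id")
        if parent_id is not None:
            children[parent_id].append(element["element_id"])
    return dict(children)
```

Elements without a parent_id are treated as roots and do not appear as children of any entry.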

pages

  • Type: array of objects
  • Description:
    List of page metadata (only returned if the page metadata capability is enabled). Each page object contains:

| Field | Type | Description |
|  | integer | Page number (starting from 1) |
|  | number | Page width in pixels |
|  | number | Page height in pixels |
| page_image_url | string | URL of the rendered page image |
| element_ids | array | List of element IDs on this page in reading order |
|  | integer | DPI used for rendering |
|  | number | Page rotation angle, measured clockwise (0 is normal reading orientation) |
|  | string | Processing status of the page |

title_tree

  • Type: array of objects
  • Description:
    Hierarchical document outline (only returned if the title tree capability is enabled). Each node contains:

| Field | Type | Description |
|  | string | Element ID of the corresponding Title element |
|  | string | Title text |
|  | integer | Title level (1 is highest, i.e., H1) |
|  | integer | Page number where the title appears |
|  | array | Nested child title nodes |
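
Because each title node nests its child nodes, a short recursive walk is enough to turn the tree into a flat, indented outline. The following is a minimal sketch; the key names "text" and "children" and the function name flatten_title_tree are hypothetical, chosen only for illustration.

```python
def flatten_title_tree(nodes, depth=0):
    """Return the title tree as a list of lines, indented two spaces per nesting level."""
    lines = []
    for node in nodes:
        # "text" and "children" are assumed key names, not confirmed by the source.
        lines.append("  " * depth + node["text"])
        lines.extend(flatten_title_tree(node.get("children", []), depth + 1))
    return lines
```

Joining the returned lines with newlines gives a plain-text table of contents that can be passed to an LLM node as context.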

images

  • Type: array of objects
  • Description:
    List of images uploaded to Dify's file system (only returned if the image data capability is enabled and images with base64 data are present). Each image object contains:

| Field | Type | Description |
|  | string | Dify file ID |
|  | string | Image file name |
|  | string | MIME type of the image |
|  | string | URL for image preview |
|  | integer | Image file size in bytes |
|  | string | Always |

Example Response

JSON Structure


Typical Workflow Use Cases

  1. Knowledge ingestion for RAG — Upload a PDF or Office file → parse into Markdown and structured elements → chunk and index into your knowledge base.
  2. Agent document understanding — Let your agent read contracts, reports, manuals, and forms through structured outputs instead of raw files.
  3. Structured information extraction — Parse documents first, then pass clean text blocks, tables, and metadata into downstream extraction, summarization, or decision nodes.
  4. Layout-aware processing — Use titles, page coordinates, tables, and image blocks to support more accurate retrieval, routing, and document automation.
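
The knowledge ingestion case above can be sketched as grouping elements under the most recent title before indexing. This is a minimal sketch, not the plugin's own chunking logic: the key names "type" and "text", the type value "Title", and the function name chunk_by_title are assumptions made for illustration.

```python
def chunk_by_title(elements):
    """Group element text into chunks, starting a new chunk at every Title element."""
    chunks = []
    current = None
    for element in elements:
        # "type", "text", and the value "Title" are assumed, not confirmed by the source.
        if element.get("type") == "Title":
            current = {"title": element.get("text", ""), "body": []}
            chunks.append(current)
            continue
        if current is None:
            # Content that appears before the first title gets an untitled chunk.
            current = {"title": "", "body": []}
            chunks.append(current)
        if element.get("text"):
            current["body"].append(element["text"])
    return [{"title": c["title"], "text": "\n".join(c["body"])} for c in chunks]
```

Title-based chunks keep each section's heading attached to its content, which tends to help retrieval more than fixed-size splitting of the Markdown text.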

Usage

  1. Install this plugin in Dify
  2. Configure Provider — leave credentials empty for free API, or fill in for paid API
  3. Use the Parse tool in Workflow or Agent applications
  4. Upload a file and configure parsing parameters
  5. Use the returned text, elements, and images in downstream nodes

API Reference


Notes

  • The text field is suitable for direct display or LLM input.
  • The elements field is useful for structured processing, chunking, highlighting, and further analysis.
  • The images field provides image resources for preview or multimodal workflows.
  • The pages and title_tree fields offer document structure insights.
  • When the image data capability is enabled, images with base64 data are automatically uploaded to Dify's file system, and the images array contains the uploaded file information.
  • Coordinates are normalized to [0, 1] range relative to page dimensions. To convert to pixels, multiply by page width/height.
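
The coordinate conversion described in the last note can be sketched as follows. The function name quad_to_pixels is hypothetical; the input is the 8-element array [x1,y1,x2,y2,x3,y3,x4,y4] described earlier, and the page width and height are the pixel dimensions from the page metadata.

```python
def quad_to_pixels(coordinates, page_width, page_height):
    """Convert a normalized 8-value quadrilateral into four (x, y) pixel points."""
    if len(coordinates) != 8:
        raise ValueError("expected 8 values: x1, y1, x2, y2, x3, y3, x4, y4")
    points = []
    for i in range(0, 8, 2):
        # x values scale by page width, y values by page height.
        points.append((coordinates[i] * page_width, coordinates[i + 1] * page_height))
    return points
```

For example, on a 1000 × 2000 pixel page, the normalized corner (0.5, 0.25) maps to the pixel position (500, 500).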

Tags: RAG, Agent, Document Parsing, Structured Extraction, Knowledge Ingestion, PDF Parsing, Markdown, Tables, Layout Parsing

Category: Tool
Version: 1.2.1
intsig-textin · 04/30/2026 02:01 AM
Requirements: Tool invocation, App invocation, Endpoint registration
Maximum memory: 256MB
Maximum storage: 1MB