SoMark is a DocAI product that converts diverse documents, such as PDFs and images, into structured Markdown or JSON. It is designed to work across all document scenarios.
It breaks the traditional OCR trade-off between accuracy, speed, and cost, delivering precise document parsing in milliseconds with minimal hardware resources.
The resulting structured data is AI-native: ready to power LLM training, enhance RAG systems, and enable intelligent agents.
SoMark pioneers the proprietary OXR algorithm, extending traditional OCR (Optical Character Recognition) into OXR (Optical Everything Recognition).
From basic layout segmentation and reading-order recovery to complex elements such as tables, formulas, images, and even chemical notations, every component can be accurately extracted and reconstructed. The output is a complete, highly structured representation of the document.

Built on the OXR algorithm, SoMark balances accuracy, speed, and cost:
Accurate: Ultra-fine granularity, with coordinate-traceable parsing for 21 document element types
Fast: Exceptional performance, parsing 100 pages in as little as 5 seconds
Economical: Robust efficiency, with private deployment starting from a single RTX 3090
SoMark delivers strong general-purpose recognition capability. A single API call handles document parsing across all formats and scenarios.
Finance: research reports, financial statements, prospectuses
Research: academic papers, programming books, patent documents
Education: exam papers, workbooks, textbooks, scanned books
Manufacturing: forms, industrial manuals, engineering drawings
Legal: regulations, contracts, industry standards
Others: white papers, PPTs, handwritten notes, vertical text, magazines, newspapers
Image Understanding: Understands image content and generates accurate descriptions for pictures within documents.
Embedded Image Restoration: Recovers images embedded within text paragraphs and table cells, so that the original information is presented precisely.
Watermark Resistance with Seal Recognition: Removes watermark interference, identifies seals and stamps, and extracts clean content.
Heading Hierarchy Recognition: Recognizes and extracts the hierarchy of headings in a document.
Cross-Page Table Patching: Merges tables that span multiple pages, preserving the table structure of the original document.
Cross-Page Text Patching: Merges text that spans multiple pages, preserving the original document structure.
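To make the heading-hierarchy feature concrete, here is a minimal sketch of how a downstream step might turn the Markdown output into an outline. It assumes the headings are written in ATX style (`#`, `##`, and so on), which this document does not confirm; `heading_outline` is a hypothetical helper, not part of SoMark.

```python
import re

# Assumption: headings in the Markdown output use ATX syntax ("#", "##", ...).
HEADING_RE = re.compile(r"^\s{0,3}(#{1,6})\s+(.+?)\s*$")


def heading_outline(markdown: str) -> list[tuple[int, str]]:
    """Return (level, title) pairs for each ATX heading, skipping fenced code."""
    outline = []
    in_fence = False
    for line in markdown.splitlines():
        # Toggle on fence lines so "#" comments inside code are not counted.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = HEADING_RE.match(line)
        if match:
            outline.append((len(match.group(1)), match.group(2)))
    return outline
```

The returned levels can be used to rebuild a table of contents or to split the document into sections before chunking.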
Log in to your Dify platform.
Go to "Tools" -> "Plugin Market", search for the "SoMark" plugin, and add it.
Configure the SoMark plugin parameters:
Base URL: A default value is provided. You usually don't need to change it.
API Key: Enter your SoMark API Key.
Save your configuration.
In your Dify workflow, click "+" to add a new node, select "Tools", then find and add the SoMark > Extract Document node.

In the Extract Document node panel, configure the File input.
For more parameters (such as output format and feature toggles), see Input Parameters / 输入参数 below.
Note:
The API Key and Base URL are automatically injected from the plugin configuration, so you do not need to enter them manually in the node.
In self-hosted deployments, tool nodes run inside the plugin runtime (plugin-daemon). Ensure that this runtime can reach the configured Base URL (network egress / proxy / DNS).
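When troubleshooting a self-hosted deployment, a quick connectivity probe run from inside the plugin-daemon environment can help. The sketch below is one possible check, not an official tool: it resolves the host and opens a direct TCP connection, so it covers DNS and network egress but does not exercise an HTTP proxy. The URL in the usage line is a placeholder, not the real SoMark endpoint.

```python
import socket
from urllib.parse import urlsplit


def endpoint_of(base_url: str) -> tuple[str, int]:
    """Extract (host, port) from a URL, defaulting to 443 for https and 80 for http."""
    parts = urlsplit(base_url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"not an http(s) URL: {base_url!r}")
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def can_reach(base_url: str, timeout: float = 5.0) -> bool:
    """Resolve the host and open a TCP connection; True only if both succeed."""
    host, port = endpoint_of(base_url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refused connections, and timeouts
        return False


if __name__ == "__main__":
    # Placeholder URL: substitute the Base URL from your plugin configuration.
    print(can_reach("https://somark.example"))
```

A `False` result points to DNS or egress rules in the runtime's network rather than to the plugin configuration itself.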

After the node executes, its output variables become available to all downstream nodes (e.g., LLM, Text Splitter, Code node). Click in any downstream node's input field and select from the SoMark node's output variables.
The node exposes the following output variables:
Markdown output: The parsed document content in Markdown format, preserving the original layout structure including headings, tables, lists, formulas, and images.
JSON output: The parsed document content as a JSON string, containing structured data for document elements such as text blocks, tables, formulas, images, coordinates, and page information. Suitable for advanced downstream processing in a Code node after JSON parsing.
Two Dify built-in variables: Present on the node but not populated by this plugin.
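As an illustration of processing the JSON output in a Code node, the sketch below parses the JSON string and tallies document elements by kind. The JSON schema is not described here, so the `"type"` key is a hypothetical field name, and the parameter name `somark_json` is simply whatever input variable you map in the node; adjust both to the actual output. The `main` function returning a dict follows the usual shape of a Dify Python Code node.

```python
import json
from collections import Counter


def count_types(node, counts: Counter) -> None:
    """Walk nested dicts/lists and tally every string value stored under "type".

    Assumption: elements carry a "type" key. This field name is hypothetical.
    """
    if isinstance(node, dict):
        kind = node.get("type")
        if isinstance(kind, str):
            counts[kind] += 1
        for value in node.values():
            count_types(value, counts)
    elif isinstance(node, list):
        for item in node:
            count_types(item, counts)


def main(somark_json: str) -> dict:
    """Entry point: parse the JSON string, then summarize element kinds."""
    data = json.loads(somark_json)
    counts = Counter()
    count_types(data, counts)
    return {"element_counts": dict(counts), "total": sum(counts.values())}
```

Because the walk is recursive, it does not depend on where elements sit in the structure, which makes it a reasonable first probe before writing schema-specific logic.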
This plugin is built on, and interacts with, the SoMark API.