DB Schema RAG
0.1.6

Automatically builds a knowledge base from your database schema for use in RAG workflows, and provides nodes for natural-language-to-SQL conversion based on SchemaRAG.

joto/schemarag, 8455 installs

SchemaRAG Database Schema RAG Plugin


Author: joto
Version: 0.1.6
Type: tool
Repository: https://github.com/JOTO-AI/SchemaRAG-dify-plugin

Chinese documentation (中文文档)


Overview

SchemaRAG is a database schema RAG plugin designed specifically for the Dify platform. It can automatically analyze database structures, build knowledge bases, and implement natural language to SQL queries. This plugin provides a complete database schema analysis and intelligent query solution, ready to use out of the box.

Example workflow download


✨ Core Features

  • Multi-Database Support: MySQL, PostgreSQL, MSSQL, Oracle, DM (达梦), automatic syntax adaptation
  • Schema Auto-Analysis: One-click data dictionary generation, structure visualization
  • Knowledge Base Upload: Automatic upload to Dify, supports incremental updates
  • Natural Language to SQL: Ready to use out of the box, supports complex queries
  • AI Data Analysis: Analyze query data, supports custom rules
  • Data Visualization: Provides visualization tools, LLM recommends charts and fields
  • Security Mechanism: SELECT-only access, field whitelist support, principle of least privilege
  • Flexible Support: Compatible with mainstream large language models

📋 Configuration Parameters

| Parameter Name | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| Dataset API Key | secret | Yes | Dify knowledge base API key | dataset-xxx |
| Database Type | select | Yes | Database type (MySQL/PostgreSQL/MSSQL/Oracle/DM) | MySQL |
| Database Host | string | Yes | Database host/IP | 127.0.0.1 |
| Database Port | number | Yes | Database port | 3306/5432 |
| Database User | string | Yes | Database username | root |
| Database Password | secret | Yes | Database password | ****** |
| Database Name | string | Yes | Database name | mydb |
| Dify Base URL | string | No | Dify API base URL | |

Supported Database Types

| Database Type | Default Port | Driver | Connection String Format |
| --- | --- | --- | --- |
| MySQL | 3306 | pymysql | |
| PostgreSQL | 5432 | psycopg2-binary | |
| Microsoft SQL Server | 1433 | pymssql | |
| Oracle | 1521 | oracledb | |
| DM Database (达梦) | 5236 | dm+pymysql | |
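
The defaults in the table above can be captured as a small lookup. This is only an illustrative sketch: `DEFAULTS` and `default_port` are hypothetical names and not part of the plugin, and the keys follow the Database Type options listed in the configuration parameters.

```python
# Default ports and drivers copied from the table above.
# DEFAULTS and default_port are illustrative names, not plugin APIs.
DEFAULTS = {
    "MySQL": (3306, "pymysql"),
    "PostgreSQL": (5432, "psycopg2-binary"),
    "MSSQL": (1433, "pymssql"),
    "Oracle": (1521, "oracledb"),
    "DM": (5236, "dm+pymysql"),
}


def default_port(db_type: str) -> int:
    """Return the default port for a supported database type."""
    try:
        return DEFAULTS[db_type][0]
    except KeyError:
        raise ValueError(f"unsupported database type: {db_type!r}") from None
```

A helper like this is handy for pre-filling the Database Port field when only the database type is known.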

🚀 Quick Start

Method 1: Command Line

Method 2: Dify Plugin Integration

  1. Fill in the above parameters in the Dify platform plugin configuration interface

  2. Once the configuration is complete and correct, click Save; the plugin automatically builds the schema knowledge base for the configured database in Dify

  3. Add tools in the workflow and configure the knowledge base ID that was just created (the knowledge base ID is in the URL of the knowledge base page)

  4. Add the SQL execution tool and pass it the generated SQL to run it directly; output is available as Markdown or JSON
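
Step 3 notes that the knowledge base ID appears in the URL of the knowledge base page. The sketch below pulls it out with a regular expression. It assumes the URL contains a `/datasets/<id>` path segment and that the ID is a UUID; both the URL shape and the `extract_dataset_id` helper are assumptions for illustration, so check them against your own Dify instance.

```python
import re
from typing import Optional

# Assumption: the knowledge base page URL looks like ".../datasets/<uuid>/...".
_DATASET_ID = re.compile(
    r"/datasets/([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})"
)


def extract_dataset_id(url: str) -> Optional[str]:
    """Return the knowledge base ID found in the URL, or None if absent."""
    match = _DATASET_ID.search(url)
    return match.group(1) if match else None
```

The returned value is what the tools below expect in their `dataset_id` parameter.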

Method 3: Code Invocation


🛠️ Tool Components

1. text2sql Tool

Natural Language to SQL Query Tool - Convert natural language questions to SQL queries using database schema knowledge base

Core Features

  • Intelligent Query Conversion: Automatically convert natural language questions to accurate SQL query statements
  • Multi-Database Support: Supports MySQL, PostgreSQL, MSSQL, Oracle, and DM SQL dialects
  • Knowledge Base Retrieval: Intelligent retrieval and matching based on database schema knowledge base
  • Ready to Use: Can be used directly after configuring the knowledge base, no additional setup required
  • Custom Prompt Rules: Add custom prompt text and configure custom rules

Parameter Configuration

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| dataset_id | string | Yes | Dify knowledge base ID containing database schema |
| llm | model-selector | Yes | Large language model for SQL generation |
| content | string | Yes | Natural language question to convert to SQL |
| dialect | select | Yes | SQL dialect (MySQL/PostgreSQL/MSSQL/Oracle/DM) |
| top_k | number | No | Number of results to retrieve from knowledge base (default 5) |
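
To make the roles of `content`, `dialect`, and `top_k` concrete, here is a minimal sketch of how retrieved schema chunks might be combined into a prompt. It is not the plugin's actual prompt: `build_text2sql_prompt` is a hypothetical helper, and the sketch assumes the chunks arrive already ranked by relevance.

```python
from typing import List


def build_text2sql_prompt(
    content: str, dialect: str, schema_chunks: List[str], top_k: int = 5
) -> str:
    """Assemble an LLM prompt from the question and the top_k schema chunks.

    Hypothetical helper for illustration; the wording is not the plugin's.
    """
    selected = schema_chunks[:top_k]  # chunks are assumed to be pre-ranked
    schema_text = "\n\n".join(selected)
    return (
        f"You are a {dialect} expert. Using only the schema below, "
        "write one SELECT statement that answers the question.\n\n"
        f"Schema:\n{schema_text}\n\n"
        f"Question: {content}\nSQL:"
    )
```

A larger `top_k` gives the model more schema context at the cost of a longer prompt.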

2. sql_executer Tool

SQL Query Execution Tool - Safely execute SQL queries and return formatted results

Core Features

  • Safe Execution: Only supports SELECT queries to ensure data security
  • Output Control: Provides interface to control maximum query rows to prevent excessive data queries
  • Multi-Format Output: Supports JSON and Markdown output formats
  • Direct Connection: Direct database connection for query execution, real-time results
  • Error Handling: Comprehensive error handling mechanism with detailed error information
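
The SELECT-only rule can be approximated with a conservative text check. The sketch below is not the plugin's implementation; `is_select_only` is a hypothetical helper that rejects multi-statement input and common write keywords. It may reject some harmless queries (for example, a string literal containing the word "delete"), and a read-only database account remains the more reliable safeguard.

```python
import re

# Keywords that indicate a write or DDL operation.
_FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|create|grant|revoke|merge)\b",
    re.IGNORECASE,
)


def is_select_only(sql: str) -> bool:
    """Return True only for a single statement that starts with SELECT."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        return False  # empty input, or more than one statement
    if not re.match(r"select\b", statement, re.IGNORECASE):
        return False
    return _FORBIDDEN.search(statement) is None
```

Word boundaries keep column names such as `created_at` or `update_time` from triggering a false rejection.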

Parameter Configuration

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| sql | string | Yes | SQL query statement to execute |
| output_format | select | Yes | Output format (JSON/Markdown) |
| max_line | int | No | Maximum number of query rows (default 1000) |
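
The following sketch shows one way `output_format` and `max_line` could shape a result set. `format_rows` is a hypothetical helper written for illustration, and the exact JSON and Markdown layouts the plugin emits may differ.

```python
import json
from typing import List, Sequence


def format_rows(
    columns: List[str],
    rows: List[Sequence],
    output_format: str = "JSON",
    max_line: int = 1000,
) -> str:
    """Render at most max_line rows as a JSON array or a Markdown table."""
    limited = rows[:max_line]
    if output_format == "JSON":
        records = [dict(zip(columns, row)) for row in limited]
        return json.dumps(records, default=str)  # str() for dates, decimals
    if output_format == "Markdown":
        header = "| " + " | ".join(columns) + " |"
        divider = "| " + " | ".join("---" for _ in columns) + " |"
        body = ["| " + " | ".join(str(v) for v in row) + " |" for row in limited]
        return "\n".join([header, divider, *body])
    raise ValueError(f"unsupported output format: {output_format!r}")
```

Capping the rows before serialization keeps large result sets from overwhelming downstream LLM nodes.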

3. sql_executer_cust Tool

Custom SQL Query Execution Tool - Connect to a custom database, safely execute SQL queries, and return formatted results

Core Features

  • Custom Database Connection: Supports multiple databases without plugin configuration
  • Safe Execution: Only supports SELECT queries to ensure data security
  • Output Control: Provides interface to control maximum query rows to prevent excessive data queries
  • Multi-Format Output: Supports JSON and Markdown output formats
  • Direct Connection: Direct database connection for query execution, real-time results
  • Error Handling: Comprehensive error handling mechanism with detailed error information

Parameter Configuration

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| database_url | string | Yes | Database connection URL |
| sql | string | Yes | SQL query statement to execute |
| output_format | select | Yes | Output format (JSON/Markdown) |
| max_line | int | No | Maximum number of query rows (default 1000) |

Database connection URL examples:

  • mysql: mysql://user:password@host:port/dbname
  • postgresql: postgresql://user:password@host:port/dbname
  • DM: dameng://user:password@host:port/dbname
  • mssql: mssql://user:password@host:port/dbname
  • oracle: oracle://user:password@host:port/dbname
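
All of the URL forms above share the same `scheme://user:password@host:port/dbname` shape, so they can be split with Python's standard `urllib.parse`. This is a sketch of how such a URL breaks down, not the plugin's own parser; `parse_database_url` is a hypothetical helper.

```python
from urllib.parse import unquote, urlsplit


def parse_database_url(database_url: str) -> dict:
    """Split scheme://user:password@host:port/dbname into its parts."""
    parts = urlsplit(database_url)
    if not parts.scheme or not parts.hostname:
        raise ValueError("expected scheme://user:password@host:port/dbname")
    return {
        "scheme": parts.scheme,
        # Credentials may be percent-encoded (e.g. "@" written as "%40").
        "user": unquote(parts.username or ""),
        "password": unquote(parts.password or ""),
        "host": parts.hostname,
        "port": parts.port,  # None when the URL omits the port
        "database": parts.path.lstrip("/"),
    }
```

If a password contains characters such as `@` or `/`, percent-encode it in the URL so the host part is not misread.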

4. text2data Tool (recommended)

Natural Language to Data Query Tool - Integrates text2sql and sql_executer functionality for one-stop conversion from questions to data

Core Features

  • End-to-End Query: Convert natural language questions directly to query results without intermediate steps
  • Multi-Database Support: Supports MySQL, PostgreSQL, MSSQL, Oracle, and DM databases
  • Smart Output: Supports JSON, Markdown, and Summary output formats
  • SQL Auto-Repair: Experimental feature that automatically analyzes and fixes SQL errors when execution fails (requires enablement)
  • Safe Execution: Built-in SQL security policies to prevent dangerous operations
  • Optimized Experience: Uses tags to fold intermediate processes, with clear result display

Parameter Configuration

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| dataset_id | string | Yes | Dify knowledge base ID containing database schema, supports multiple IDs separated by commas |
| llm | model-selector | Yes | Large language model for SQL generation and analysis |
| content | string | Yes | Natural language question to convert to SQL |
| dialect | select | Yes | SQL dialect (MySQL/PostgreSQL/MSSQL/Oracle/DM) |
| output_format | select | Yes | Output format (JSON/Markdown/Summary) |
| top_k | number | No | Number of results to retrieve from knowledge base (default 5) |
| max_rows | number | No | Maximum number of rows to return (default 500, prevents excessive data) |
| example_dataset_id | string | No | Example knowledge base ID, can provide SQL examples to improve generation quality |
| enable_refiner | boolean | No | Enable SQL auto-repair feature (experimental, default false) |
| max_refine_iterations | number | No | Maximum SQL repair attempts (1-5, default 3) |
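
Two of these parameters carry small conventions: `dataset_id` accepts several comma-separated IDs, and `max_refine_iterations` is limited to the range 1-5. The helpers below sketch how a caller might normalize both before invoking the tool. The function names are hypothetical, and clamping out-of-range values (rather than rejecting them) is an assumption.

```python
from typing import List


def split_dataset_ids(dataset_id: str) -> List[str]:
    """Split a comma-separated dataset_id value, dropping blanks."""
    return [part.strip() for part in dataset_id.split(",") if part.strip()]


def clamp_refine_iterations(value: int = 3) -> int:
    """Keep max_refine_iterations inside the documented 1-5 range."""
    return max(1, min(5, int(value)))
```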

SQL Auto-Repair Feature (Experimental)

When enable_refiner is enabled, if the generated SQL execution fails, the system will:

  1. Auto-Analyze Errors: Capture database error messages and specific causes
  2. Intelligent Repair: Use LLM to analyze errors and generate repaired SQL
  3. Iterative Optimization: Support up to N repair attempts (configurable)
  4. Transparent Process: Display repair process within tags
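
The steps above amount to a retry loop around SQL execution. The sketch below shows that control flow under stated assumptions: `run_with_repair` is a hypothetical helper, `execute` stands in for running the query, and `repair` stands in for the LLM call that rewrites the SQL from the error message. It is not the plugin's code.

```python
from typing import Any, Callable, Tuple


def run_with_repair(
    sql: str,
    execute: Callable[[str], Any],
    repair: Callable[[str, str], str],
    max_refine_iterations: int = 3,
) -> Tuple[Any, str, int]:
    """Run sql; on failure, ask repair() for a fix and retry.

    Returns (result, final_sql, repair_attempts_used). Re-raises the last
    error once max_refine_iterations repairs have been tried.
    """
    attempts = 0
    while True:
        try:
            return execute(sql), sql, attempts
        except Exception as exc:
            if attempts >= max_refine_iterations:
                raise
            sql = repair(sql, str(exc))  # e.g. an LLM call in practice
            attempts += 1
```

Each pass through the `except` branch corresponds to one repair attempt, which is why the limit bounds the extra latency and token usage.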

Repair Scenario Examples:

  • ✅ Column name spelling errors
  • ✅ Table name does not exist or is incorrect
  • ✅ JOIN condition errors
  • ✅ Data type mismatches
  • ✅ Syntax errors (dialect-specific syntax)

Usage Recommendations:

  • 🧪 Experimental feature; enabling it increases token consumption
  • 📝 Better results in complex Schema scenarios
  • ⚡ Adds 2-10 seconds to response time
  • 💰 Each repair consumes approximately 2000-3000 tokens

5. data_summary Tool

Data Summary Analysis Tool - Intelligent data content analysis and summarization using large language models

Analysis Capabilities

  • Custom Rules: Supports user-defined analysis rules and guidelines
  • Smart Data Format Recognition: Automatically identifies JSON and other data formats for optimized processing
  • Performance Optimized: Cached common configurations to reduce response time

Configuration Options

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| data_content | string | Yes | Data content to be analyzed |
| llm | model-selector | Yes | Large language model for analysis |
| query | string | Yes | Analysis query or focus area |
| custom_rules | string | No | Custom analysis rules |
| user_prompt | string | No | Custom prompt |

6. llm_chart_generator Tool

LLM Intelligent Chart Generation Module - Uses a large language model to recommend chart types and fields, and AntV to render the charts, providing a maintainable end-to-end charting solution

Features

  • Intelligent Analysis: Automatically analyzes user questions and data, intelligently selects the most suitable chart type
  • Multi-Chart Support: Supports mainstream charts such as bar charts, line charts, pie charts, scatter plots, histograms
  • High Maintainability: Modular design with clear interfaces, easy to extend and maintain
  • Unified Standards: Chart configuration uses standardized JSON format for easy integration and parsing
  • Fallback Solutions: Automatically falls back to table display when chart generation fails
  • Configuration Validation: Comprehensive configuration validation and error handling mechanisms to ensure stability

Configuration Options

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| user_question | string | Yes | User question describing the chart type and requirements (e.g., sales trends, market share) |
| data | string | Yes | Data for visualization, supports JSON, CSV, or structured data |
| llm | model-selector | Yes | Large language model for analysis and chart generation |
| sql_query | string | Yes | SQL query statement used to recommend charts and fields |
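
The configuration validation and table fallback described above can be sketched as follows. The JSON shape used here (`chart_type`, `x_field`, `y_field`) is a hypothetical simplification, not the plugin's actual schema, and `parse_chart_config` is an illustrative helper.

```python
import json
from typing import List

# Chart types named in the feature list above.
SUPPORTED_CHARTS = {"bar", "line", "pie", "scatter", "histogram"}


def parse_chart_config(llm_output: str, available_fields: List[str]) -> dict:
    """Validate an LLM-proposed chart config; fall back to a table if invalid.

    The config keys are assumptions made for this sketch.
    """
    fallback = {"chart_type": "table"}
    try:
        config = json.loads(llm_output)
    except json.JSONDecodeError:
        return fallback
    if not isinstance(config, dict):
        return fallback
    chart_type = config.get("chart_type")
    if not isinstance(chart_type, str) or chart_type not in SUPPORTED_CHARTS:
        return fallback
    for key in ("x_field", "y_field"):
        if config.get(key) not in available_fields:
            return fallback  # the LLM referenced a column that is not in the data
    return config
```

Validating field names against the actual result columns catches the most common failure mode, where the model invents a column.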

❓ FAQ

Q: Which databases are supported?
A: Currently supports MySQL, PostgreSQL, MSSQL, Oracle, and DM (达梦).

Q: Is the data secure?
A: The plugin only reads database structure information to build the Dify knowledge base. Sensitive information is not uploaded.

Q: How to configure the database?
A: Configure the database and knowledge base settings on the Dify plugin page. Once saved, the plugin automatically builds the schema knowledge base in Dify.

Q: How to use the text2sql tool?
A: After configuring the database and generating the schema knowledge base, copy the dataset_id from the knowledge base URL into the tool's dataset_id parameter so the tool knows which knowledge base to query, then fill in the remaining parameters.

Q: What data formats does the data_summary tool support?
A: Supports multiple data formats including text and JSON. The tool automatically recognizes and optimizes processing. Supports data content up to 50,000 characters.

Q: How to use custom rules?
A: You can specify specific analysis requirements, focus points, or constraints in the custom_rules parameter, supporting up to 2,000 characters.
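
The two length limits mentioned above (50,000 characters for data content, 2,000 for custom rules) can be checked before calling the tool. This is a minimal sketch; `check_summary_inputs` is a hypothetical helper, and how the plugin itself reacts to oversized input is not stated here.

```python
from typing import List

MAX_DATA_CHARS = 50_000   # limit stated for data_content
MAX_RULES_CHARS = 2_000   # limit stated for custom_rules


def check_summary_inputs(data_content: str, custom_rules: str = "") -> List[str]:
    """Return a list of problems; an empty list means the inputs fit."""
    errors = []
    if len(data_content) > MAX_DATA_CHARS:
        errors.append(f"data_content exceeds {MAX_DATA_CHARS} characters")
    if len(custom_rules) > MAX_RULES_CHARS:
        errors.append(f"custom_rules exceeds {MAX_RULES_CHARS} characters")
    return errors
```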


📸 Example Screenshots


📞 Contact


📄 License

Apache-2.0 license

Category: Tool
Version: 0.1.6 (joto · 02/26/2026 03:23 PM)
Requirements: LLM invocation, Tool invocation
Maximum memory: 256MB
Maximum storage: 1MB