data_analysis

Version: 2.0.4

Type: tool

Overview

This plugin enables codeless data analysis through natural language interaction. It supports Text2SQL, Text2Data, and Text2Code analysis. Upload Excel or CSV files to automatically run data queries, data interpretation, data cleaning, and data visualization (ChatBI).
The plugin now supports multi-sheet queries and cross-sheet analysis: it automatically recognizes and parses structured data across multiple worksheets. It parses time, metrics, and analytical dimensions from conversational queries, generates SQL queries to retrieve the data, and produces interactive BI charts, structured analysis reports, data cleaning operations, and data quality assessments. It is optimized for standardized vertical datasets and powered by an enterprise-grade analytics engine for reliable results.

This plugin is supported by ChartGen AI

Configuration

1. Apply for an API Key

You can create and manage your API Key in ChartGen AI - API. To begin, register for a ChartGen AI account.

Once on the homepage, click the bottom left corner to access the API management dashboard.

[Image blocked: apply_apikey_1.png]

Here, you can create new APIs and set the credit consumption limit for each API. A single account can create up to 10 APIs.

[Image blocked: apply_apikey_2.png]

After successful creation, you can copy the API Key to Dify for verification. You can also view the credit consumption of each API and manage your APIs.

[Image blocked: apply_apikey_3.png]

2. Get data analysis tools from the Marketplace

The tools are available in the plugin Marketplace. Install the plugin from there.

3. Service Authorization

Select [Plugins] - [data analysis] on the Dify navigation page
Click the "To Authorize" button
Paste your unique API Key to complete verification

[Image blocked: set_apikey.png]

4. Credit Rules

Calling a single tool consumes 20 credits.
A ChartGen AI Free account receives 200 free credits per month, and each batch of credits is valid for three months.
When credits run out, you can purchase more or upgrade your account on the ChartGen AI Billing page. Each batch of purchased credits is valid for three months. Expiration dates and billing details are available on the Billing page of the website.
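
As a quick sanity check on the rules above, the number of tool calls a credit balance covers is the balance divided by the per-call cost. The sketch below is plain arithmetic only; the function name `calls_available` is made up for illustration and is not part of the plugin or the ChartGen AI API.

```python
CREDITS_PER_CALL = 20       # from the rules above: one tool call costs 20 credits
FREE_MONTHLY_CREDITS = 200  # monthly grant for a ChartGen AI Free account


def calls_available(credits: int, cost_per_call: int = CREDITS_PER_CALL) -> int:
    """Return how many whole tool calls a credit balance can pay for."""
    if credits < 0 or cost_per_call <= 0:
        raise ValueError("credits must be >= 0 and cost_per_call must be > 0")
    return credits // cost_per_call


print(calls_available(FREE_MONTHLY_CREDITS))  # 10
```

Under these rules, one month of free credits covers 10 tool calls.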

[Image blocked: apply_apikey_4.png]

Workflow Cases

This section describes the parameters of each tool and gives usage examples.

1. data_connector

Connects to mainstream databases such as MySQL, PostgreSQL, StarRocks, and Doris so that users can query database data in natural language. Once data is retrieved, it can be passed to the other tools for analysis, interpretation, and visualization.

Query results can be downloaded as an .xlsx file for easier local viewing and further processing.

💡 To include files in the output, add the 'files' output type to the last component of the flow to get the download link.

Note: For an optimal browsing experience, results are limited to 100 rows by default. When working with large datasets, users can retrieve the full dataset by running the SQL query that the tool generates.

Input Parameter	Description	Example
query	Query statement	"Search GMV data in 2024.06.30"
database type	Select the corresponding type of database	As shown in the following figure
database typename	Name of the database/schema to connect to	As shown in the following figure
database user	Username for database connection	As shown in the following figure
database password	Password for database connection	As shown in the following figure
database ip	IP address of the database server	As shown in the following figure
database port	Port number for database connection	As shown in the following figure
database name	Name of the database to connect to	As shown in the following figure

Example input: For the database with url="mysql+pymysql://aaaadmin:[email protected]:11110/dify?charset=utf8", fill in the parameters as shown in the following figures.

[Image blocked: data_connector_1.png]
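
If the connection details are only available as a SQLAlchemy-style URL, the individual fields can be split out with Python's standard library before filling in the form. This is a minimal sketch, not part of the plugin: the function `connector_params`, the example URL, and its credentials are hypothetical, and the mapping from URL parts to field names is one plausible reading of the parameter table above.

```python
from urllib.parse import urlsplit


def connector_params(url: str) -> dict:
    """Split a database URL into the fields listed in the parameter table."""
    parts = urlsplit(url)
    return {
        # "mysql+pymysql" -> "mysql": drop the driver suffix (assumed mapping)
        "database type": parts.scheme.split("+")[0],
        "database user": parts.username,
        "database password": parts.password,
        "database ip": parts.hostname,
        "database port": parts.port,
        "database name": parts.path.lstrip("/"),
    }


# Hypothetical URL for illustration only; 203.0.113.10 is a documentation address.
params = connector_params(
    "mysql+pymysql://demo_user:demo_pass@203.0.113.10:3306/sales?charset=utf8"
)
for field, value in params.items():
    print(f"{field}: {value}")
```

Splitting the URL this way also makes it harder to swap the IP address and the port by accident.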

Output Parameter	Description	Example
query results	Output of data_connector (including SQL statements and returned query results in Markdown format)	As shown in the following figure

[Image blocked: data_connector_2.png]

[Image blocked: data_connector_3.png]

Common Precautions:

When the database contains too many tables, specify the table name in the query (the table name must exactly match the name in the database).
Take care not to confuse the IP address and the port when entering parameters.
Internal network databases, local databases, and clustered databases are not currently supported; only databases that DBeaver can connect to can be used.

2. data_analysis

Parameter	Description	Example
query	Query statement	"What were the best-selling products in each month?"
input_data	Table data in Markdown format (e.g. markdown text output by the Doc Extractor for tables)	As shown in the sales table example
file	Data file (xlsx, xls, csv)	example.xlsx

Note: Only one of input_data or file is needed. If both are provided, file takes precedence. Both row-metric-column and column-metric-row data files are supported.
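
When no file is available, input_data expects the table as Markdown text. The sketch below shows one way to build such text from in-memory rows. It assumes the common pipe-table layout (header row, separator row, data rows); the helper `to_markdown_table` and the sample sales rows are hypothetical and not taken from the plugin.

```python
def to_markdown_table(rows: list[dict]) -> str:
    """Render a list of row dicts as a pipe-delimited Markdown table."""
    if not rows:
        raise ValueError("at least one row is required")
    headers = list(rows[0].keys())
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        # Columns missing from a row are rendered as empty cells.
        lines.append("| " + " | ".join(str(row.get(h, "")) for h in headers) + " |")
    return "\n".join(lines)


# Made-up sample data for illustration.
sales = [
    {"month": "2024-01", "product": "A", "sales": 120},
    {"month": "2024-01", "product": "B", "sales": 95},
]
input_data = to_markdown_table(sales)
print(input_data)
```

The resulting string can be passed as input_data. If a file is supplied as well, the file takes precedence, as noted above.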

[Image blocked: data_analysis_1.png]
[Image blocked: data_analysis_4.png]

Query results can be downloaded as a .docx file for easier local viewing and further processing.

💡 To include files in the output, add the 'files' output type to the last component of the flow to get the download link.

3. data_interpretation

Parameter	Description	Example
query	Query statement	"Please provide a simple data interpretation."
input_data	Table data in Markdown format (e.g. markdown text output by the Doc Extractor for tables)	As shown in the sales table example
file	Data file (xlsx, xls, csv)	example.xlsx

Note: Only one of input_data or file is needed. If both are provided, file takes precedence.

[Image blocked: data_interpretation_1.png]
[Image blocked: data_interpretation_4.png]

Query results can be downloaded as a .docx file for easier local viewing and further processing.

💡 To include files in the output, add the 'files' output type to the last component of the flow to get the download link.

4. data_visualization

Parameter	Description	Example
query	Query statement	"Display the total sales of each product in a pie chart."
input_data	Table data in Markdown format (e.g. markdown text output by the Doc Extractor for tables)	As shown in the sales table example
file	Data file (xlsx, xls, csv)	example.xlsx

Note: Only one of input_data or file is needed. If both are provided, file takes precedence. Both row-metric-column and column-metric-row data files are supported.

[Image blocked: data_visualization_1.png]
[Image blocked: data_visualization_4.png]

Query results can be downloaded as an .html file for easier local viewing and further processing.

💡 To include files in the output, add the 'files' output type to the last component of the flow to get the download link.

5. time_identify

Parses the time range required for analysis from the problem description.

Parameter	Description	Example
query	Query statement	"Show me the sales data from the last 7 days"

Output Parameters	Description
beginTime	Start time of the time range
endTime	End time of the time range
times	Discrete time points (e.g. Jan 1, 2025 and Jan 20, 2025)
statTime	Time granularity, including: "year", "quarter", "month", "week", "day". For example, if the user asks about "July of this year", the granularity would be "month".

Note: Every time range excludes today and future dates. When the user asks about the last 7 days, the returned end time does not include today; the range is counted back 7 days starting from yesterday.
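
The "last 7 days" rule in the note can be sketched as simple date arithmetic. This only illustrates the rule as described, not the plugin's implementation: the function `last_n_days` is hypothetical, and the actual format of beginTime and endTime returned by the tool is not specified here, so plain `date` objects are used.

```python
from datetime import date, timedelta


def last_n_days(n: int, today: date) -> tuple[date, date]:
    """Return (begin, end) for the last n days, excluding today."""
    if n < 1:
        raise ValueError("n must be at least 1")
    end = today - timedelta(days=1)      # yesterday is the last included day
    begin = end - timedelta(days=n - 1)  # n days in total, counted back from yesterday
    return begin, end


begin, end = last_n_days(7, today=date(2025, 1, 20))
print(begin, end)  # 2025-01-13 2025-01-19
```

For a query made on Jan 20, 2025, the range runs from Jan 13 through Jan 19, and the matching statTime granularity would be "day".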

[Image blocked: time_identify_1.png]

6. merge_to_multisheet

Merge multiple files into a single file with multiple worksheets.

Parameter	Description	Example
files	Data files (xlsx, xls, csv)	example.xlsx

Output Parameter	Description	Example
file	Data file (xlsx, xls, csv)	example.xlsx

Note: The uploaded files must meet the size and quantity requirements of the Dify platform.

[Image blocked: merge_to_multisheet_1.png] [Image blocked: merge_to_multisheet_2.png]

7. data_cleaning

This tool performs data cleaning through natural language interaction. Users can describe data cleaning requirements in natural language, and the system will automatically process the data based on the description. This tool supports Markdown format data input and file upload (xlsx/xls/csv).

Parameter	Description	Example
query	Query statement	"Please fill in all missing values as 0"
input_data	Table data in Markdown format (e.g. markdown text output by the Doc Extractor for tables)	As shown in the sales table example
file	Data file (xlsx, xls, csv)	example.xlsx

Note: Only one of input_data or file is needed. If both are provided, file takes precedence. This tool can handle various data cleaning tasks, such as deleting duplicate items, handling missing values, and converting data types. If no requirements are specified, the data is processed with commonly used data cleaning methods.
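
To make the cleaning tasks named above concrete, the sketch below applies three of them locally to a small table: filling missing values with 0, converting a column to numbers, and deleting duplicate rows. It is an illustration of what those operations mean, not the plugin's implementation; the function `clean_rows` and the sample rows are hypothetical.

```python
def clean_rows(rows: list[dict], numeric_columns: list[str]) -> list[dict]:
    """Fill missing numeric values with 0, convert them to float, drop duplicate rows."""
    cleaned, seen = [], set()
    for row in rows:
        fixed = dict(row)
        for col in numeric_columns:
            value = fixed.get(col)
            # Missing values (None or empty string) become 0; everything else is converted.
            fixed[col] = 0.0 if value in (None, "") else float(value)
        key = tuple(sorted(fixed.items()))  # hashable fingerprint of the cleaned row
        if key not in seen:
            seen.add(key)
            cleaned.append(fixed)
    return cleaned


# Made-up sample data: a numeric string, a duplicate, and a missing value.
raw = [
    {"product": "A", "sales": "120"},
    {"product": "A", "sales": 120},
    {"product": "B", "sales": None},
]
print(clean_rows(raw, numeric_columns=["sales"]))
```

Here the first two rows collapse into one after type conversion, and the missing sales value for product B becomes 0.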

8. data_quality_report

This tool generates data quality reports through natural language interaction. Users can describe their data quality requirements in natural language, and the system will automatically analyze and generate comprehensive data quality reports. This tool supports Markdown format data input and file upload (xlsx/xls/csv).

Parameter	Description	Example
query	Query statement	"Please generate a data quality report"
input_data	Table data in Markdown format (e.g. markdown text output by the Doc Extractor for tables)	As shown in the sales table example
file	Data file (xlsx, xls, csv)	example.xlsx

Note: Only one of input_data or file is needed. If both are provided, file takes precedence. This tool analyzes various aspects of data quality, such as missing values, duplicates, data types, statistical summaries, and data distribution. The generated report can help users understand the overall quality of their data.
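
Two of the quality aspects listed above, missing values and duplicates, can be illustrated with a few lines of standard-library Python. This is a rough local sketch of what such checks count, not the report the tool generates; the function `quality_summary`, its output keys, and the sample rows are assumptions made for illustration.

```python
from collections import Counter


def quality_summary(rows: list[dict]) -> dict:
    """Count rows, missing values per column, and duplicate rows."""
    columns = sorted({col for row in rows for col in row})
    missing = {
        col: sum(1 for row in rows if row.get(col) in (None, ""))
        for col in columns
    }
    # Rows with identical column/value pairs are counted as duplicates.
    fingerprints = Counter(tuple(sorted(row.items())) for row in rows)
    duplicates = sum(count - 1 for count in fingerprints.values())
    return {"rows": len(rows), "missing": missing, "duplicate_rows": duplicates}


# Made-up sample data for illustration.
table = [
    {"product": "A", "sales": 120},
    {"product": "A", "sales": 120},
    {"product": "B", "sales": None},
]
print(quality_summary(table))
```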

Consult

Discord