Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(document-readers): add mysql implements #379

Merged
merged 4 commits into from
Jan 18, 2025
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
163 changes: 163 additions & 0 deletions community/document-readers/mysql-document-reader/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,163 @@
# MySQL Document Reader

MySQL Document Reader 是一个基于Spring AI的文档读取器实现,用于从MySQL数据库中读取数据并将其转换为文档格式。

MySQL Document Reader is a Spring AI-based document reader implementation that reads data from MySQL database and converts it into document format.

## 特性 | Features

- 使用纯JDBC实现,无需额外的连接池或ORM框架
- 支持自定义内容列和元数据列
- 完善的错误处理和资源自动关闭
- 支持自定义SQL查询
- 遵循Spring AI的Document接口规范

- Pure JDBC implementation, no additional connection pool or ORM framework required
- Support for custom content and metadata columns
- Comprehensive error handling and automatic resource cleanup
- Support for custom SQL queries
- Compliant with Spring AI Document interface specification

## 依赖要求 | Dependencies

```xml
<dependencies>
<!-- Spring AI Core -->
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-core</artifactId>
</dependency>

<!-- MySQL JDBC Driver -->
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里需要引入版本吗?我记得 spring ai alibaba 引入 spring boot parent bom,里面是有 mysql jar 包的,可以复用?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

替换成了mysql-connector-j

<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>8.0.33</version>
</dependency>
</dependencies>
```

## 使用方法 | Usage

### 1. 创建MySQL资源配置 | Create MySQL Resource Configuration

```java
MySQLResource resource = new MySQLResource(
"localhost", // MySQL主机地址 | MySQL host address
3306, // MySQL端口号 | MySQL port number
"your_db", // 数据库名称 | Database name
"username", // 用户名 | Username
"password", // 密码 | Password
"SELECT * FROM your_table", // SQL查询语句 | SQL query
Arrays.asList("title", "content"), // 文档内容字段 | Document content fields
Arrays.asList("id", "created_at") // 元数据字段 | Metadata fields
);
```

### 2. 创建文档读取器 | Create Document Reader

```java
MySQLDocumentReader reader = new MySQLDocumentReader(resource);
```

### 3. 获取文档 | Get Documents

```java
List<Document> documents = reader.get();
```

### 4. 处理文档 | Process Documents

```java
for (Document doc : documents) {
// 获取文档内容 | Get document content
String content = doc.getContent();

// 获取元数据 | Get metadata
Map<String, Object> metadata = doc.getMetadata();

// 进行后续处理 | Process further
}
```

## 配置说明 | Configuration

### MySQLResource 参数 | Parameters

| 参数 Parameter | 说明 Description | 默认值 Default |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

来个默认值?就是 127.0.0.1:3306 root root?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved

|------|------|--------|
| host | MySQL服务器地址 MySQL server address | 无 None |
| port | MySQL服务器端口 MySQL server port | 无 None |
| database | 数据库名称 Database name | 无 None |
| username | 用户名 Username | 无 None |
| password | 密码 Password | 无 None |
| query | SQL查询语句 SQL query | 无 None |
| contentColumns | 文档内容字段列表 Document content field list | null (使用所有字段 use all fields) |
| metadataColumns | 元数据字段列表 Metadata field list | null (不使用元数据 no metadata) |

## 注意事项 | Notes

1. 确保MySQL JDBC驱动已正确配置
Ensure MySQL JDBC driver is properly configured

2. 建议使用合适的WHERE条件和LIMIT子句限制查询结果集大小
Recommend using appropriate WHERE conditions and LIMIT clauses to restrict result set size

3. 对于大型结果集,注意内存使用情况
For large result sets, be mindful of memory usage

4. 敏感信息(如密码)建议使用配置文件或环境变量管理
Sensitive information (like passwords) should be managed using configuration files or environment variables

5. 建议在生产环境中使用连接池管理数据库连接
Recommend using connection pools for database connections in production environment

## 示例 | Examples

### 基本查询示例 | Basic Query Example

```java
MySQLResource resource = new MySQLResource(
"localhost",
3306,
"test_db",
"test_user",
"test_password",
"SELECT id, title, content FROM articles WHERE status = 'published' LIMIT 100",
Arrays.asList("title", "content"), // 内容字段 | Content fields
Arrays.asList("id") // 元数据字段 | Metadata fields
);

MySQLDocumentReader reader = new MySQLDocumentReader(resource);
List<Document> documents = reader.get();
```

### 自定义查询示例 | Custom Query Example

```java
MySQLResource resource = new MySQLResource(
"localhost",
3306,
"blog_db",
"blog_user",
"blog_password",
"""
SELECT
p.id,
p.title,
p.content,
u.username as author,
p.created_at
FROM posts p
JOIN users u ON p.author_id = u.id
WHERE p.status = 'published'
AND p.created_at >= DATE_SUB(NOW(), INTERVAL 7 DAY)
""",
Arrays.asList("title", "content", "author"), // 内容字段 | Content fields
Arrays.asList("id", "created_at") // 元数据字段 | Metadata fields
);
```

## 许可证 | License

[Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0)
65 changes: 65 additions & 0 deletions community/document-readers/mysql-document-reader/pom.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,65 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
~ Copyright 2024-2025 the original author or authors.
~
~ Licensed under the Apache License, Version 2.0 (the "License");
~ you may not use this file except in compliance with the License.
~ You may obtain a copy of the License at
~
~ https://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
-->

<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>com.alibaba.cloud.ai</groupId>
<artifactId>spring-ai-alibaba</artifactId>
<version>${revision}</version>
<relativePath>../../../pom.xml</relativePath>
</parent>

<artifactId>mysql-document-reader</artifactId>
<name>Spring AI Alibaba MySQL Document Reader</name>
<description>Spring AI Alibaba MySQL Document Reader Implementation</description>

<dependencies>
<!-- Spring AI Core -->
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-core</artifactId>
</dependency>

<!-- MySQL JDBC Driver -->
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>8.0.33</version>
</dependency>

<!-- Test Dependencies -->
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.mockito</groupId>
<artifactId>mockito-core</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.assertj</groupId>
<artifactId>assertj-core</artifactId>
<scope>test</scope>
</dependency>
</dependencies>

</project>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

以空行结尾

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved

Original file line number Diff line number Diff line change
@@ -0,0 +1,159 @@
/*
* Copyright 2024-2025 the original author or authors.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* https://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.alibaba.cloud.ai.reader.mysql;

import org.springframework.ai.document.Document;
import org.springframework.ai.document.DocumentReader;

import java.sql.*;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

不要用全量引入,用增量引入

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved

import java.util.*;

/**
* MySQL document reader implementation Uses JDBC to connect and fetch data from MySQL
*
* @author brianxiadong
**/
public class MySQLDocumentReader implements DocumentReader {

private final MySQLResource mysqlResource;

public MySQLDocumentReader(MySQLResource mysqlResource) {
this.mysqlResource = mysqlResource;
}

@Override
public List<Document> get() {
List<Document> documents = new ArrayList<>();
try {
// Register MySQL JDBC driver
Class.forName("com.mysql.cj.jdbc.Driver");

// Create database connection
try (Connection connection = createConnection()) {
documents = executeQueryAndProcessResults(connection);
}
}
catch (ClassNotFoundException e) {
throw new RuntimeException("MySQL JDBC driver not found", e);
}
catch (SQLException e) {
throw new RuntimeException("Error executing MySQL query: " + e.getMessage(), e);
}
return documents;
}

/**
* Create database connection
*/
private Connection createConnection() throws SQLException {
return DriverManager.getConnection(mysqlResource.getJdbcUrl(), mysqlResource.getUsername(),
mysqlResource.getPassword());
}

/**
* Execute query and process results
*/
private List<Document> executeQueryAndProcessResults(Connection connection) throws SQLException {
List<Document> documents = new ArrayList<>();
try (Statement statement = connection.createStatement();
ResultSet resultSet = statement.executeQuery(mysqlResource.getQuery())) {

List<String> columnNames = getColumnNames(resultSet.getMetaData());
while (resultSet.next()) {
Map<String, Object> rowData = extractRowData(resultSet, columnNames);
String content = buildContent(rowData);
Map<String, Object> metadata = buildMetadata(rowData);
documents.add(new Document(content, metadata));
}
}
return documents;
}

/**
* Get list of column names
*/
private List<String> getColumnNames(ResultSetMetaData metaData) throws SQLException {
List<String> columnNames = new ArrayList<>();
int columnCount = metaData.getColumnCount();
for (int i = 1; i <= columnCount; i++) {
columnNames.add(metaData.getColumnName(i));
}
return columnNames;
}

/**
* Extract row data
*/
private Map<String, Object> extractRowData(ResultSet resultSet, List<String> columnNames) throws SQLException {
Map<String, Object> rowData = new HashMap<>();
for (int i = 0; i < columnNames.size(); i++) {
String columnName = columnNames.get(i);
Object value = resultSet.getObject(i + 1);
rowData.put(columnName, value);
}
return rowData;
}

/**
* Build document content
*/
private String buildContent(Map<String, Object> rowData) {
StringBuilder contentBuilder = new StringBuilder();
List<String> contentColumns = mysqlResource.getContentColumns();

if (contentColumns == null || contentColumns.isEmpty()) {
// If no content columns specified, use all columns
for (Map.Entry<String, Object> entry : rowData.entrySet()) {
appendColumnContent(contentBuilder, entry.getKey(), entry.getValue());
}
}
else {
// Only use specified content columns
for (String column : contentColumns) {
if (rowData.containsKey(column)) {
appendColumnContent(contentBuilder, column, rowData.get(column));
}
}
}
return contentBuilder.toString().trim();
}

/**
* Append column content
*/
private void appendColumnContent(StringBuilder builder, String column, Object value) {
builder.append(column).append(": ").append(value).append("\n");
}

/**
* Build metadata
*/
private Map<String, Object> buildMetadata(Map<String, Object> rowData) {
Map<String, Object> metadata = new HashMap<>();
metadata.put(MySQLResource.SOURCE, mysqlResource.getJdbcUrl());

List<String> metadataColumns = mysqlResource.getMetadataColumns();
if (metadataColumns != null) {
for (String column : metadataColumns) {
if (rowData.containsKey(column)) {
metadata.put(column, rowData.get(column));
}
}
}
return metadata;
}

}
Loading
Loading