feat(document-readers): add mbox implements ( License cannot be added to an mbox file.) #376

Merged: 3 commits merged into alibaba:main on Jan 18, 2025

brianxiadong (Contributor)

Describe what this PR does / why we need it

This PR adds a document reader for emails stored in the mbox format, so that Spring AI can load email data from mbox files. This is useful when email content needs to be analyzed or processed in batches.

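mbox is a family of formats rather than one standard. A common variant, mboxrd, stores messages back to back, starts each one with a `From ` separator line, and escapes body lines matching `>*From ` by adding one more `>`. The following is a minimal sketch, assuming mboxrd and UTF-8 content, of how a reader might split a mailbox into raw messages. It reads the file with `java.nio.file.Files` rather than Apache Commons IO and rethrows `IOException` as `UncheckedIOException`, in the spirit of the runtime-exception convention this PR mentions. `MboxSplitter`, `read`, and `split` are hypothetical names, not the PR's `MboxDocumentReader` API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not the PR's MboxDocumentReader): splits an mboxrd file into raw messages.
final class MboxSplitter {

    private MboxSplitter() {
    }

    // Reads the file as UTF-8; any IOException is rethrown unchecked (runtime-exception convention).
    static List<String> read(Path mboxFile) {
        try {
            return split(Files.readString(mboxFile, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException("Failed to read mbox file: " + mboxFile, e);
        }
    }

    // Assumes mboxrd: every message starts with a "From " separator line, and body lines
    // matching ">+From " were escaped with one extra '>' that is removed here.
    static List<String> split(String mbox) {
        List<String> messages = new ArrayList<>();
        StringBuilder current = null; // null until the first separator line is seen
        for (String line : mbox.split("\r?\n")) {
            if (line.startsWith("From ")) {
                if (current != null) {
                    messages.add(current.toString());
                }
                current = new StringBuilder(); // the separator line is not part of the message
                continue;
            }
            if (current == null) {
                continue; // ignore stray text before the first separator
            }
            if (line.matches(">+From .*")) {
                line = line.substring(1); // undo one level of mboxrd escaping
            }
            current.append(line).append('\n');
        }
        if (current != null) {
            messages.add(current.toString());
        }
        return messages;
    }
}
```

Each returned string is one raw RFC 5322 message (headers, a blank line, then the body) that a reader could turn into a document. Loading the whole file keeps the sketch short; a production reader for large mailboxes would more likely stream line by line.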
Does this pull request fix an issue?

NONE

Describe how you did it

  • Implemented MboxDocumentReader using Apache Commons IO for file reading

  • Used JSoup for HTML content parsing

  • Added support for different email formats (plain text, HTML, multipart)

  • Implemented error handling that reports failures as runtime exceptions

  • Added test cases backed by sample mbox files

  • Created bilingual (English and Chinese) documentation

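Turning a raw message into a document involves separating the headers from the body and keeping a few headers as metadata. The following is a minimal sketch, assuming the metadata keys `subject`, `from`, `to`, and `date`; the PR description does not say which keys its reader actually uses. It unfolds RFC 5322 continuation lines, but it does not decode MIME encoded words such as `=?UTF-8?B?...?=`, and a repeated header keeps only its last value. `MailHeaders` is a hypothetical helper. In Spring AI, a map like this would typically be passed to a `Document` together with the extracted body text.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: parses one raw RFC 5322 message into headers + body and builds metadata.
final class MailHeaders {

    private MailHeaders() {
    }

    // Header names are lower-cased so lookups are case-insensitive.
    record Parsed(Map<String, String> headers, String body) {
    }

    static Parsed parse(String rawMessage) {
        String normalized = rawMessage.replace("\r\n", "\n");
        int split = normalized.indexOf("\n\n"); // first blank line ends the header block
        String headerBlock = split >= 0 ? normalized.substring(0, split) : normalized;
        String body = split >= 0 ? normalized.substring(split + 2) : "";

        Map<String, String> headers = new LinkedHashMap<>();
        String lastName = null;
        for (String line : headerBlock.split("\n")) {
            if ((line.startsWith(" ") || line.startsWith("\t")) && lastName != null) {
                // RFC 5322 folding: a continuation line extends the previous header's value
                headers.merge(lastName, " " + line.trim(), String::concat);
                continue;
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // not a header line; skip it rather than fail
            }
            lastName = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            headers.put(lastName, line.substring(colon + 1).trim());
        }
        return new Parsed(headers, body);
    }

    // Assumed metadata keys; only headers that are present are copied.
    static Map<String, Object> metadata(Parsed parsed) {
        Map<String, Object> metadata = new LinkedHashMap<>();
        for (String key : new String[] {"subject", "from", "to", "date"}) {
            String value = parsed.headers().get(key);
            if (value != null) {
                metadata.put(key, value);
            }
        }
        return metadata;
    }
}
```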
Describe how to verify it

  1. Run the test cases:

    mvn test -Dtest=MboxDocumentReaderTest
  2. Check the test coverage report

  3. Verify the sample code in README.md works as expected

Special notes for reviewers

  • The implementation follows Spring AI's document reader pattern

  • Error handling uses runtime exceptions as per project convention

  • HTML parsing is done using JSoup to ensure clean text extraction

  • The reader ignores email attachments by design

  • Documentation is provided in both English and Chinese

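The reader handles plain-text, HTML, and multipart mail while ignoring attachments. The following is a sketch of one way to do that, under stated assumptions. For a `multipart/*` body, it splits on the boundary taken from `Content-Type`, drops parts marked `Content-Disposition: attachment`, prefers a `text/plain` part, and falls back to `text/html`. This PR uses JSoup for the HTML step, and JSoup's usual idiom for that is `Jsoup.parse(html).text()`. To keep the sketch dependency-free, it swaps in a crude regex tag stripper, which is not a real HTML parser. Nested multiparts, transfer encodings (base64, quoted-printable), and charsets are not handled, and `BodyText` is a hypothetical name.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: extracts readable text from a message body, skipping attachments.
final class BodyText {

    private static final Pattern BOUNDARY =
            Pattern.compile("boundary=\"?([^\";]+)\"?", Pattern.CASE_INSENSITIVE);

    private BodyText() {
    }

    static String extract(String contentType, String body) {
        String type = contentType == null ? "text/plain" : contentType.toLowerCase(Locale.ROOT);
        if (type.startsWith("multipart/")) {
            Matcher m = BOUNDARY.matcher(contentType);
            if (!m.find()) {
                return body.strip(); // malformed multipart: fall back to the raw body
            }
            String plain = null;
            String html = null;
            for (String part : body.split(Pattern.quote("--" + m.group(1)))) {
                if (part.startsWith("--")) {
                    break; // closing delimiter "--boundary--"; the rest is epilogue
                }
                String p = part.replace("\r\n", "\n");
                if (p.startsWith("\n")) {
                    p = p.substring(1); // newline that ended the boundary line
                }
                int sep = p.indexOf("\n\n");
                if (sep < 0) {
                    continue; // preamble or a part without headers
                }
                String headers = p.substring(0, sep).toLowerCase(Locale.ROOT);
                String content = p.substring(sep + 2);
                if (headers.contains("content-disposition: attachment")) {
                    continue; // attachments are ignored by design
                }
                if (plain == null && headers.contains("content-type: text/plain")) {
                    plain = content;
                } else if (html == null && headers.contains("content-type: text/html")) {
                    html = content;
                }
            }
            if (plain != null) {
                return plain.strip();
            }
            return html != null ? stripHtml(html) : "";
        }
        return type.startsWith("text/html") ? stripHtml(body) : body.strip();
    }

    // Crude stand-in for JSoup text extraction: drops script/style blocks and tags,
    // decodes a few common entities (&amp; last), and collapses whitespace.
    static String stripHtml(String html) {
        return html.replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " ")
                .replaceAll("<[^>]+>", " ")
                .replace("&nbsp;", " ")
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&amp;", "&")
                .replaceAll("\\s+", " ")
                .strip();
    }
}
```

Preferring `text/plain` over `text/html` avoids HTML parsing whenever a `multipart/alternative` message already carries a plain version. The PR description does not say which preference its reader uses.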
close #292

brianxiadong changed the title from "feat(document-readers): add mbox implements" to "feat(document-readers): add mbox implements ( License cannot be added to an mbox file.)" on Jan 17, 2025
chickenlj merged commit c8691c4 into alibaba:main on Jan 18, 2025
Development: merging this pull request closes the linked issue "add document mail platform mbox".