feat(document-readers): add mbox implementation (a license cannot be added to an mbox file) #376
Describe what this PR does / why we need it
This PR adds a document reader implementation for mbox-format email archives, so that Spring AI can process email data from mbox files. This is useful when email content needs to be analyzed or processed in batch.
Does this pull request fix one issue?
NONE
Describe how you did it
Implemented MboxDocumentReader using Apache Commons IO for file reading
Used JSoup for HTML content parsing
Added support for different email formats (plain text, HTML, multipart)
Implemented robust error handling with runtime exceptions
Added comprehensive test cases with sample mbox files
Created bilingual documentation
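For reviewers unfamiliar with the format, the core idea of reading an mbox file can be sketched as follows. This is a minimal illustration, not the code in this PR: it uses only the JDK rather than Apache Commons IO or JSoup, and the class and method names (`MboxSplitSketch`, `splitMessages`, `body`) are hypothetical. It assumes the common mbox convention that each message starts with a separator line beginning with `From ` and that headers end at the first blank line. It does not handle multipart bodies, HTML, or `>From ` quoting.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not the PR's MboxDocumentReader. */
final class MboxSplitSketch {

    private MboxSplitSketch() {
    }

    /** Splits mbox content into raw messages, one per "From " separator line. */
    static List<String> splitMessages(String mboxContent) {
        List<String> messages = new ArrayList<>();
        StringBuilder current = null; // stays null until the first separator is seen
        try (BufferedReader reader = new BufferedReader(new StringReader(mboxContent))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith("From ")) {
                    // A separator closes the previous message and opens a new one.
                    if (current != null) {
                        messages.add(current.toString());
                    }
                    current = new StringBuilder();
                } else if (current != null) {
                    current.append(line).append('\n');
                }
            }
        } catch (IOException e) {
            // Surface I/O failures as an unchecked exception, in the spirit of
            // the runtime-exception error handling described above.
            throw new UncheckedIOException(e);
        }
        if (current != null) {
            messages.add(current.toString());
        }
        return messages;
    }

    /** Returns the text after the first blank line, i.e. the message body. */
    static String body(String rawMessage) {
        int headerEnd = rawMessage.indexOf("\n\n");
        return headerEnd < 0 ? "" : rawMessage.substring(headerEnd + 2).trim();
    }
}
```

A caller would pass the file content to `splitMessages` and then extract each body with `body`. A real reader would additionally decode the `Content-Type` of each part and convert HTML parts to text, which is where the PR uses JSoup.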
Describe how to verify it
Run the test cases:
mvn test -Dtest=MboxDocumentReaderTest
Check the test coverage report
Verify the sample code in README.md works as expected
Special notes for reviewers
The implementation follows Spring AI's document reader pattern
Error handling uses runtime exceptions as per project convention
HTML parsing is done using JSoup to ensure clean text extraction
The reader ignores email attachments by design
Documentation is provided in both English and Chinese
close #292