Official implementation of the paper DCFA-YOLO: A Dual-Channel Cross-Feature-Fusion Attention YOLO Network for Cherry Tomato Bunch Detection.
A dual-channel cross-feature-fusion attention YOLO network for robust multi-modal object detection, supporting RGB-Depth dual-modal inputs with enhanced feature fusion capabilities.
- Supports Adam and SGD optimizers
- Supports heatmap visualization
- Dual-channel cross-modal feature fusion
- Multi-scale feature extraction
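For intuition, the sketch below shows a generic channel-attention fusion of RGB and depth feature maps in PyTorch. It is only an illustration of the general technique; the actual cross-feature-fusion attention module is the one defined in the paper and in this repository's model code, and differs from this minimal example.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Generic channel-attention fusion of RGB and depth feature maps.

    Illustrative sketch only; not the DCFA module from the paper/repository.
    """
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(2 * channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, 2 * channels),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat, depth_feat):
        b, c, _, _ = rgb_feat.shape
        # Build a joint channel descriptor from both modalities
        desc = torch.cat([self.pool(rgb_feat), self.pool(depth_feat)], dim=1).flatten(1)
        weights = self.fc(desc).view(b, 2 * c, 1, 1)
        w_rgb, w_depth = weights[:, :c], weights[:, c:]
        # Re-weight each modality and fuse by summation
        return rgb_feat * w_rgb + depth_feat * w_depth

if __name__ == "__main__":
    fusion = CrossModalFusion(channels=64)
    rgb = torch.randn(1, 64, 80, 80)
    depth = torch.randn(1, 64, 80, 80)
    print(fusion(rgb, depth).shape)  # torch.Size([1, 64, 80, 80])
```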
- Python 3.7+
- PyTorch 1.7.1+ (recommended for AMP mixed-precision training)
- CUDA 10.2+
- OpenCV
- NumPy
pip install -r requirements.txt
- Prepare VOC-format dataset
- Place RGB images in VOCdevkit/VOC2007/JPEGImages_rgb
- Place Depth images in VOCdevkit/VOC2007/JPEGImages_nir
- Place annotation files in VOCdevkit/VOC2007/Annotations
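Before generating the training lists, it can help to verify that every RGB image has a matching depth image and annotation. The helper below is a hypothetical sketch, not a script from this repository:

```python
import os

# Hypothetical sanity check over the expected folder layout
VOC_ROOT = "VOCdevkit/VOC2007"
rgb_dir = os.path.join(VOC_ROOT, "JPEGImages_rgb")
nir_dir = os.path.join(VOC_ROOT, "JPEGImages_nir")
ann_dir = os.path.join(VOC_ROOT, "Annotations")

# Depth images may use a different extension (e.g. .png), so compare file stems
depth_stems = {os.path.splitext(f)[0] for f in os.listdir(nir_dir)}

for name in sorted(os.listdir(rgb_dir)):
    stem = os.path.splitext(name)[0]
    has_depth = stem in depth_stems
    has_ann = os.path.isfile(os.path.join(ann_dir, stem + ".xml"))
    if not (has_depth and has_ann):
        print(f"Missing pair for {name}: depth={has_depth}, annotation={has_ann}")
```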
Modify parameters in voc_annotation_mul.py and run:
python voc_annotation_mul.py
python train_mul.py
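Training options such as the optimizer (Adam or SGD, as noted above), batch size, and number of epochs are configured inside train_mul.py. The variable names in this sketch are assumptions and may differ from the actual script:

```python
# Hypothetical variable names; check the top of train_mul.py for the real ones.
classes_path   = "model_data/classes.txt"  # class names used for training
optimizer_type = "adam"                    # "adam" or "sgd" (both supported)
batch_size     = 8
epochs         = 100
fp16           = True                      # AMP mixed precision needs PyTorch >= 1.7.1
```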
- Modify the model path and class file path in yolo_mul.py (see the sketch below)
- Run the prediction script:
python predict_mul.py
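The kind of edit needed in yolo_mul.py looks roughly like the following; the attribute names are assumptions and should be checked against the actual file:

```python
# Sketch of the settings to point at your own files; the attribute names
# here are assumptions and may not match yolo_mul.py exactly.
_defaults = {
    "model_path":   "logs/best_epoch_weights.pth",  # trained weight file
    "classes_path": "model_data/classes.txt",       # class names file
}
```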
- Modify the model path and class file path in yolo_mul.py
- Run the evaluation script:
python get_map_mul.py
This project supports RGB and Depth dual-modal input with the following requirements:
- RGB images: Standard 3-channel color images, stored in JPEGImages_rgb
- Depth images: Single-channel grayscale images, stored in JPEGImages_nir
- Image dimensions must match
- File names must strictly correspond (e.g., 001.jpg and 001.png)
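These pairing rules can be checked for a single sample with a short OpenCV snippet (a sketch, not repository code):

```python
import cv2

# Sketch only: load one RGB/depth pair and verify that the spatial
# dimensions match, as required above.
rgb   = cv2.imread("VOCdevkit/VOC2007/JPEGImages_rgb/001.jpg", cv2.IMREAD_COLOR)      # 3-channel
depth = cv2.imread("VOCdevkit/VOC2007/JPEGImages_nir/001.png", cv2.IMREAD_GRAYSCALE)  # 1-channel

assert rgb is not None and depth is not None, "image file not found"
assert rgb.shape[:2] == depth.shape[:2], "RGB and depth dimensions must match"
print(rgb.shape, depth.shape)  # e.g. (480, 640, 3) (480, 640)
```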
All core scripts have been modified for multi-modal input, including:
- voc_annotation_mul.py: Multi-modal data preprocessing
- train_mul.py: Multi-modal training script
- predict_mul.py: Multi-modal inference script
- yolo_mul.py: Core implementation of the multi-modal model
- get_map_mul.py: Multi-modal evaluation script
This project is licensed under the MIT License. See LICENSE file for details.
For any questions, please contact us via:
- Email: [email protected]
- GitHub Issues: open a new issue on this repository