Official implementation of the paper DCFA-YOLO: A Dual-Channel Cross-Feature-Fusion Attention YOLO Network for Cherry Tomato Bunch Detection.
A dual-channel cross-feature-fusion attention YOLO network for robust multi-modal object detection, supporting RGB-Depth dual-modal inputs with enhanced feature fusion capabilities.
- Supports Adam and SGD optimizers
- Supports heatmap visualization
- Dual-channel cross-modal feature fusion
- Multi-scale feature extraction
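For intuition, the sketch below shows a generic channel-attention fusion of RGB and depth feature maps in PyTorch. It is only an illustration of the general technique; the actual cross-feature-fusion attention module is the one defined in the paper and in this repository's model code, and differs from this minimal example.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Generic channel-attention fusion of RGB and depth feature maps.

    Illustrative sketch only; not the DCFA module from the paper/repository.
    """
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(2 * channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, 2 * channels),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat, depth_feat):
        b, c, _, _ = rgb_feat.shape
        # Build a joint channel descriptor from both modalities
        desc = torch.cat([self.pool(rgb_feat), self.pool(depth_feat)], dim=1).flatten(1)
        weights = self.fc(desc).view(b, 2 * c, 1, 1)
        w_rgb, w_depth = weights[:, :c], weights[:, c:]
        # Re-weight each modality and fuse by summation
        return rgb_feat * w_rgb + depth_feat * w_depth

if __name__ == "__main__":
    fusion = CrossModalFusion(channels=64)
    rgb = torch.randn(1, 64, 80, 80)
    depth = torch.randn(1, 64, 80, 80)
    print(fusion(rgb, depth).shape)  # torch.Size([1, 64, 80, 80])
```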
- Python 3.7+
- PyTorch 1.7.1+ (recommended for AMP mixed-precision training)
- CUDA 10.2+
- OpenCV
- NumPy
pip install -r requirements.txt
- Prepare VOC-format dataset
- Place RGB images in VOCdevkit/VOC2007/JPEGImages_rgb
- Place Depth images in VOCdevkit/VOC2007/JPEGImages_nir
- Place annotation files in VOCdevkit/VOC2007/Annotations
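Before generating the training lists, it can help to verify that every RGB image has a matching depth image and annotation. The helper below is a hypothetical sketch, not a script from this repository:

```python
import os

# Hypothetical sanity check over the expected folder layout
VOC_ROOT = "VOCdevkit/VOC2007"
rgb_dir = os.path.join(VOC_ROOT, "JPEGImages_rgb")
nir_dir = os.path.join(VOC_ROOT, "JPEGImages_nir")
ann_dir = os.path.join(VOC_ROOT, "Annotations")

# Depth images may use a different extension (e.g. .png), so compare file stems
depth_stems = {os.path.splitext(f)[0] for f in os.listdir(nir_dir)}

for name in sorted(os.listdir(rgb_dir)):
    stem = os.path.splitext(name)[0]
    has_depth = stem in depth_stems
    has_ann = os.path.isfile(os.path.join(ann_dir, stem + ".xml"))
    if not (has_depth and has_ann):
        print(f"Missing pair for {name}: depth={has_depth}, annotation={has_ann}")
```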
Modify parameters in voc_annotation_mul.py and run:
python voc_annotation_mul.py
python train_mul.py
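Training options such as the optimizer (Adam or SGD, as noted above), batch size, and number of epochs are configured inside train_mul.py. The variable names in this sketch are assumptions and may differ from the actual script:

```python
# Hypothetical variable names; check the top of train_mul.py for the real ones.
classes_path   = "model_data/classes.txt"  # class names used for training
optimizer_type = "adam"                    # "adam" or "sgd" (both supported)
batch_size     = 8
epochs         = 100
fp16           = True                      # AMP mixed precision needs PyTorch >= 1.7.1
```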
- Modify the model path and class file path in yolo_mul.py (see the sketch below)
- Run the prediction script:
python predict_mul.py
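The kind of edit needed in yolo_mul.py looks roughly like the following; the attribute names are assumptions and should be checked against the actual file:

```python
# Sketch of the settings to point at your own files; the attribute names
# here are assumptions and may not match yolo_mul.py exactly.
_defaults = {
    "model_path":   "logs/best_epoch_weights.pth",  # trained weight file
    "classes_path": "model_data/classes.txt",       # class names file
}
```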
- Modify the model path and class file path in yolo_mul.py
- Run the evaluation script:
python get_map_mul.py
This project supports RGB and Depth dual-modal input with the following requirements:
- RGB images: Standard 3-channel color images, stored in JPEGImages_rgb
- Depth images: Single-channel grayscale images, stored in JPEGImages_nir
- Image dimensions must match
- File names must strictly correspond (e.g., 001.jpg and 001.png)
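These pairing rules can be checked for a single sample with a short OpenCV snippet (a sketch, not repository code):

```python
import cv2

# Sketch only: load one RGB/depth pair and verify that the spatial
# dimensions match, as required above.
rgb   = cv2.imread("VOCdevkit/VOC2007/JPEGImages_rgb/001.jpg", cv2.IMREAD_COLOR)      # 3-channel
depth = cv2.imread("VOCdevkit/VOC2007/JPEGImages_nir/001.png", cv2.IMREAD_GRAYSCALE)  # 1-channel

assert rgb is not None and depth is not None, "image file not found"
assert rgb.shape[:2] == depth.shape[:2], "RGB and depth dimensions must match"
print(rgb.shape, depth.shape)  # e.g. (480, 640, 3) (480, 640)
```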
All core scripts have been modified for multi-modal input, including:
- voc_annotation_mul.py: Multi-modal data preprocessing
- train_mul.py: Multi-modal training script
- predict_mul.py: Multi-modal inference script
- yolo_mul.py: Core implementation of the multi-modal model
- get_map_mul.py: Multi-modal evaluation script
This project is licensed under the MIT License. See LICENSE file for details.
For any questions, please contact us via:
- Email: [email protected]
- GitHub Issues: open a new issue on this repository