# Adaptive Multimodal Prompt for Human-Object Interaction with Local Feature Enhanced Transformer (AMP-HOI)
AMP-HOI is an end-to-end Transformer- and CNN-based human-object interaction (HOI) detector. [Paper]
- Motivation: (1) The loss of crucial features from the original modality during contrastive learning. (2) The limited ability of Transformer-based network architectures to extract local features from samples. (3) There is still room for improvement in the application of prompt learning to HOI detection.
- Components: (1) We propose an Adaptive Multimodal Prompt module that facilitates interaction between multimodal cues and provides specific, applicable prompts for each modality. (2) We introduce a novel multimodal feature extraction module, the Local Feature Enhanced Transformer (LFET), which effectively extracts multimodal features from both global and local perspectives.
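The training flags below (`--enable_resnet50`, `--lamb 0.6`) suggest that local CNN features and global Transformer features are blended with a weighting coefficient. The following is only a minimal sketch of such a local/global fusion; all module and variable names are hypothetical and not taken from the repo:

```python
import torch
import torch.nn as nn
import torchvision

class LocalGlobalFusion(nn.Module):
    """Hypothetical sketch: blend global Transformer features with
    local CNN features; `lamb` mirrors the repo's --lamb flag."""

    def __init__(self, dim=512, lamb=0.6):
        super().__init__()
        self.lamb = lamb
        # Local branch: a ResNet-50 trunk (cf. --enable_resnet50),
        # truncated before the average pool and classifier head.
        resnet = torchvision.models.resnet50(weights=None)
        self.local_branch = nn.Sequential(*list(resnet.children())[:-2])
        self.local_proj = nn.Conv2d(2048, dim, kernel_size=1)

    def forward(self, images, global_feats):
        # images: (B, 3, H, W); global_feats: (B, dim) pooled
        # features from the Transformer encoder.
        local = self.local_proj(self.local_branch(images))  # (B, dim, h, w)
        local = local.flatten(2).mean(dim=-1)               # (B, dim)
        # Weighted blend of global and local cues.
        return self.lamb * global_feats + (1 - self.lamb) * local
```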
## Installation

Our code is built upon CLIP. This repo requires PyTorch and torchvision, along with a few small additional dependencies.
```bash
conda install pytorch torchvision cudatoolkit=11.3 -c pytorch
pip install ftfy regex tqdm numpy Pillow matplotlib
```
## Dataset

The experiments are mainly conducted on the HICO-DET dataset. We follow this repo to prepare the HICO-DET dataset.

The HICO-DET dataset can be downloaded here. After downloading, unpack the tarball (`hico_20160224_det.tar.gz`) into the `data` directory. We use the annotation files provided by the PPDM authors, which we re-organize with additional meta info, e.g., image width and height. The annotation files can be downloaded from here. The downloaded files have to be placed as follows; otherwise, replace the default paths with your custom locations in `datasets/hico.py`.
```
|─ data
│  └─ hico_20160224_det
|     |─ images
|     |  |─ test2015
|     |  |─ train2015
|     |─ annotations
|     |  |─ trainval_hico_ann.json
|     |  |─ test_hico_ann.json
:     :
```
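Optionally, a quick check (assuming the default layout above) that the images and annotation files are in place before training:

```python
import json
from pathlib import Path

root = Path("data/hico_20160224_det")
for sub in ("images/train2015", "images/test2015"):
    assert (root / sub).is_dir(), f"missing directory: {root / sub}"
for name in ("trainval_hico_ann.json", "test_hico_ann.json"):
    ann = root / "annotations" / name
    assert ann.is_file(), f"missing annotation file: {ann}"
    # The re-organized files carry extra meta info (image width/height).
    with ann.open() as f:
        print(name, "->", len(json.load(f)), "entries")
```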
## Training

Run this command to train the model on the HICO-DET dataset:
```bash
python -m torch.distributed.launch --nproc_per_node=2 --use_env main.py \
    --batch_size 8 \
    --output_dir [path to save checkpoint] \
    --epochs 30 \
    --lr 1e-4 --min-lr 1e-7 \
    --hoi_token_length 10 \
    --enable_dec \
    --enable_resnet50 \
    --enable_gru \
    --enable_text_lambda \
    --enable_visual_lambda1 \
    --enable_visual_lambda2 \
    --lamb 0.6 \
    --enable_unified_prompt \
    --dataset_file hico
```
## Evaluation

Run this command to evaluate the model on the HICO-DET dataset:
```bash
python main.py --eval \
    --batch_size 1 \
    --output_dir [path to save results] \
    --hoi_token_length 10 \
    --enable_dec \
    --pretrained [path to the pretrained model] \
    --eval_size 256 [or 224 448 ...] \
    --test_score_thresh 1e-4 \
    --enable_resnet50 \
    --enable_gru \
    --enable_text_lambda \
    --enable_visual_lambda1 \
    --enable_visual_lambda2 \
    --lamb 0.6 \
    --enable_unified_prompt \
    --dataset_file hico
```
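For reference, `--test_score_thresh 1e-4` presumably discards low-confidence HOI predictions before the mAP computation. A schematic of that filtering step (the function below is an illustration, not the repo's actual code):

```python
import torch

def filter_predictions(scores, boxes, labels, score_thresh=1e-4):
    """Keep only HOI predictions whose confidence exceeds the threshold.

    scores: (N,) per-prediction confidences; boxes/labels are aligned.
    """
    keep = scores > score_thresh
    return scores[keep], boxes[keep], labels[keep]

# With a threshold as low as 1e-4, nearly all predictions survive;
# raising it trades recall for fewer low-confidence false positives.
scores = torch.tensor([0.90, 5e-5, 0.02])
boxes = torch.rand(3, 4)
labels = torch.tensor([3, 17, 42])
print(filter_predictions(scores, boxes, labels)[0])  # tensor([0.9000, 0.0200])
```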
## Results

| Model | Dataset | HOI Tokens | AP seen | AP unseen | Full | Checkpoint |
|---|---|---|---|---|---|---|
| AMP-HOI | HICO-DET | 10 | 25.91 | 19.23 | 24.44 | params |
## Citation

Please consider citing our paper if it helps your research.