Code for GFlowNet-DPO (Direct Preference Optimization) EMNLP 2024 Main

ggoggam/gdpo

GDPO: GFlowNet Direct Preference Optimization

This repository contains the code for GDPO: Learning to Directly Align Language Models with Diversity Using GFlowNets, published at EMNLP 2024 (Main).

💻 Setup

We highly recommend using uv as the package and dependency manager for Python.

⚙️ Configurations

The repository uses tyro for configuration and script arguments. Refer to config/__init__.py for the available arguments and their details.
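As a rough illustration of how tyro turns a config definition into command-line arguments, here is a minimal sketch. The `TrainConfig` dataclass, its fields, and their defaults are hypothetical and are not taken from config/__init__.py; only `tyro.cli` is the library's actual entry point.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class TrainConfig:
    """Hypothetical training options; the real ones live in config/__init__.py."""

    model_name: str = "gpt2"       # assumed field, for illustration only
    learning_rate: float = 1e-5    # assumed field, for illustration only
    batch_size: int = 8            # assumed field, for illustration only


def parse_config(argv: Optional[Sequence[str]] = None) -> TrainConfig:
    """Build a TrainConfig from command-line style arguments using tyro."""
    # Imported here so the dataclass stays usable even where tyro is not installed.
    import tyro

    # tyro derives one flag per dataclass field, e.g. --learning-rate 3e-5.
    return tyro.cli(TrainConfig, args=argv)
```

With a definition like this, calling `parse_config(["--batch-size", "16"])` would return a `TrainConfig` whose `batch_size` is 16 and whose other fields keep their defaults, and passing `--help` lists every field with its type and default.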

1. Training

Modify the appropriate accelerate config in the config/accelerate directory to match your machine configuration. Then, from the /src directory, launch training with one of the following commands, substituting your machine type for {MACHINE_TYPE}.

uv run accelerate launch --config-file config/accelerate/{MACHINE_TYPE}.yaml train.py [OPTIONS]
# or equivalently
uv run -m accelerate.commands.launch --config-file config/accelerate/{MACHINE_TYPE}.yaml train.py [OPTIONS]

For now, we only provide offline training, which was the focus of the paper.
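If you launch training from a script rather than a shell, the command above can be assembled programmatically. The following is a minimal sketch, not part of the repository: the helper name `build_launch_command` and the machine type `multi_gpu` are assumptions, so substitute the name of a YAML file that actually exists in config/accelerate.

```python
import shlex
from typing import Sequence


def build_launch_command(machine_type: str, extra_args: Sequence[str] = ()) -> list[str]:
    """Return the accelerate launch command for a given machine type as an argument list."""
    # Mirrors the README pattern: config/accelerate/{MACHINE_TYPE}.yaml
    config_path = f"config/accelerate/{machine_type}.yaml"
    return [
        "uv", "run", "accelerate", "launch",
        "--config-file", config_path,
        "train.py", *extra_args,
    ]


# "multi_gpu" is a hypothetical machine type used only for illustration.
cmd = build_launch_command("multi_gpu")
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, cwd=..., check=True)`, with `cwd` set to the directory the README says to run from.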

2. Generating

Once training is done, generate responses for a task by running:

uv run generate.py [OPTIONS]

Responses are generated with vllm, which provides memory-efficient batched inference. As a result, this script does not need to be run through the accelerate launch command.

3. Evaluating

Evaluate the generated responses by running:

uv run evaluate.py [OPTIONS]

📖 Reference

@inproceedings{kwon-etal-2024-gdpo,
    title = "{GDPO}: Learning to Directly Align Language Models with Diversity Using {GF}low{N}ets",
    author = "Kwon, Oh Joon  and
      Matsunaga, Daiki E.  and
      Kim, Kee-Eung",
    editor = "Al-Onaizan, Yaser  and
      Bansal, Mohit  and
      Chen, Yun-Nung",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-main.951",
    pages = "17120--17139",
}
