This repository contains the code for GDPO: Learning to Directly Align Language Models with Diversity Using GFlowNets, published in the main conference proceedings of EMNLP 2024.
We highly recommend using uv as the main package and dependency manager for Python.
For configuration and script arguments, the repository uses tyro. Refer to config/__init__.py for details on the available arguments.
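For orientation, the sketch below shows how a tyro-driven script typically exposes its arguments; the dataclass fields here are purely illustrative placeholders, and the actual options are defined in config/__init__.py.

```python
# Illustrative sketch of a tyro entry point; field names are hypothetical,
# the real configuration lives in config/__init__.py.
from dataclasses import dataclass

import tyro


@dataclass
class TrainConfig:
    model_name: str = "gpt2"      # base model to fine-tune (placeholder)
    learning_rate: float = 1e-5   # optimizer step size (placeholder)
    batch_size: int = 8           # per-device batch size (placeholder)


if __name__ == "__main__":
    # tyro turns the dataclass into a CLI, e.g. `python train.py --learning-rate 3e-5`
    config = tyro.cli(TrainConfig)
    print(config)
```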
Make sure to modify the appropriate accelerate config in the config/accelerate directory to match your machine setup. From the /src directory, launch training with one of the following commands, substituting your machine type:
```bash
uv run accelerate launch --config-file config/accelerate/{MACHINE_TYPE}.yaml train.py [OPTIONS]
# or equivalently
uv run -m accelerate.commands.launch --config-file config/accelerate/{MACHINE_TYPE}.yaml train.py [OPTIONS]
```
For now, we only provide offline training, which was the focus of the paper.
Once training is done, generate responses for a task by running:
```bash
uv run generate.py [OPTIONS]
```
Responses are generated with vllm, which provides memory-efficient, resource-optimized batched inference, so generate.py does not need to be launched through accelerate.
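As a rough illustration of what vllm-based batched sampling looks like, here is a minimal sketch; it is not the repository's actual generate.py, and the model path, prompts, and sampling settings are placeholders.

```python
# Minimal vllm sampling sketch; model path, prompts, and sampling settings are
# placeholders and do not reflect the repository's actual generate.py.
from vllm import LLM, SamplingParams

prompts = [
    "Summarize the following article:",
    "Write a short story about a robot:",
]

# Sample several responses per prompt (n=4) to inspect diversity.
sampling_params = SamplingParams(temperature=1.0, top_p=0.95, max_tokens=256, n=4)

# Point this at the checkpoint produced by training.
llm = LLM(model="path/to/trained-checkpoint")

for output in llm.generate(prompts, sampling_params):
    print(output.prompt)
    for completion in output.outputs:
        print("  -", completion.text)
```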
Evaluate the generated responses by running:
```bash
uv run evaluate.py [OPTIONS]
```
If you find this work useful, please cite:

```bibtex
@inproceedings{kwon-etal-2024-gdpo,
    title = "{GDPO}: Learning to Directly Align Language Models with Diversity Using {GF}low{N}ets",
    author = "Kwon, Oh Joon  and
      Matsunaga, Daiki E.  and
      Kim, Kee-Eung",
    editor = "Al-Onaizan, Yaser  and
      Bansal, Mohit  and
      Chen, Yun-Nung",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-main.951",
    pages = "17120--17139",
}
```