Question regarding training schemes #25
Comments
Sorry for the late reply, we do not use other tuning techniques.
@luyaxi Thank you for the answer! I am currently trying to generate a dataset for proactive agent training. However, configuring the gym does not work for me as described in this repo. Before I look into the details, I was wondering where you got the inspiration for writing the gym in this repo, as I would like to adapt it for my own purposes. I just wanted to get a conceptual idea behind the code, specifically in relation to your paper.
I'd like to help correct any configuration problems; you can share your configs here (remember to omit any sensitive information like API keys). We were first inspired by Gymnasium from OpenAI and TextWorld from Microsoft. Both create a simulation environment to help solve decision optimization for agents. By integrating the simulated environment with a custom agent, one can use an outcome evaluator to directly optimize the decision-making process as a whole, which is important because fine-grained evaluation can be incredibly complex for advanced agents.
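As a rough illustration (this is not the actual code in this repo, and every class and function name below is made up), the idea looks like a Gymnasium-style loop where a simulated environment produces user activity, an agent acts in it, and an outcome evaluator scores the whole trajectory instead of every intermediate step:

```python
# Minimal sketch, assuming a Gymnasium-style interface; names are hypothetical.
import gymnasium as gym
from gymnasium import spaces


class SimulatedUserEnv(gym.Env):
    """Hypothetical environment: each step emits a user event; the agent decides
    whether to stay silent or proactively propose a task."""

    def __init__(self, events):
        super().__init__()
        self.events = events                      # scripted user activity trace
        self.action_space = spaces.Discrete(2)    # 0 = stay silent, 1 = propose a task
        self.observation_space = spaces.Discrete(len(events) + 1)
        self._t = 0

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self._t = 0
        return self._t, {}                        # (observation, info)

    def step(self, action):
        self._t += 1
        terminated = self._t >= len(self.events)
        # No per-step reward: the trajectory is judged as a whole afterwards.
        return self._t, 0.0, terminated, False, {"action": action}


def outcome_evaluator(trajectory):
    """Placeholder for a reward model that scores the complete interaction,
    e.g. whether the agent proposed help at the right moments."""
    return sum(1.0 for _, action in trajectory if action == 1)  # dummy score


def rollout(env, policy):
    obs, _ = env.reset()
    trajectory, done = [], False
    while not done:
        action = policy(obs)
        obs, _, terminated, truncated, info = env.step(action)
        trajectory.append((obs, info["action"]))
        done = terminated or truncated
    return trajectory


env = SimulatedUserEnv(events=["open_ide", "read_docs", "write_code"])
traj = rollout(env, policy=lambda obs: 1)          # trivial always-propose policy
print("outcome score:", outcome_evaluator(traj))   # optimize the agent against this
```

The point of the design is that the evaluator only needs to judge outcomes, so the agent's decision-making can be optimized end to end without hand-crafting step-level rewards.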
Thank you so much @luyaxi for your answer and for offering to help! I have the follow-up questions below. I would appreciate it if you could answer when you have time.
Hello,
First of all, thank you for the efforts to share this repo and your work. I found this very interesting!
If I understand correctly, the reward model is trained with SFT using LLaMA-Factory, considering the conversation here: #21
Now, for training the proactive agent model, you also seem to have used SFT, judging by how the training set looks. Am I right?
I was wondering if you tried other fine-tuning approaches, such as preference tuning, since that was my first guess when I saw Equation 3 in your paper: https://arxiv.org/pdf/2410.12361 (I sketch what I mean below.)
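To make sure I am asking the right thing, here is roughly how I picture the two schemes. The field names and example texts are purely illustrative and not taken from your released dataset files:

```python
# Illustrative only: hypothetical records contrasting the two training schemes.

# (a) Plain SFT: each example pairs a context with the single "gold" agent response.
sft_example = {
    "instruction": "User has been editing report.md for 30 minutes without a break.",
    "output": "Would you like me to summarize the remaining sections for you?",
}

# (b) Preference tuning (e.g. DPO-style, which is how I read Equation 3):
# the same context is paired with a preferred and a rejected response, and the
# loss pushes the model toward the preferred one.
preference_example = {
    "prompt": "User has been editing report.md for 30 minutes without a break.",
    "chosen": "Would you like me to summarize the remaining sections for you?",
    "rejected": "Hello! I am an AI assistant. How can I help you today?",
}
```

So my question is whether the released agent model was trained only on records like (a), or also with pairs like (b).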
Please correct me if I understand anything wrong.
Sincerely,
Jaejin Cho
@luyaxi Hello, Yaxi. Would you mind taking a look at my question when you have time?