Question regarding training schemes #25

Open
JaejinCho opened this issue Jan 26, 2025 · 4 comments

JaejinCho commented Jan 26, 2025

Hello,
First of all, thank you for sharing this repo and your work. I found it very interesting!

If I understand correctly, the reward model is trained with SFT using LLaMA-Factory, based on the conversation here: #21
For training the proactive agent model, it also looks like you used SFT, judging by what the training set looks like; is that right?
I was also wondering whether you tried other fine-tuning approaches, such as preference tuning, which was my first guess when I saw Equation 3 in your paper: https://arxiv.org/pdf/2410.12361

Please correct me if I understand anything wrong.

Sincerely,
Jaejin Cho

@luyaxi Hello, Yaxi. Would you mind looking at my question when you have time?

luyaxi (Collaborator) commented Feb 6, 2025

Sorry for the late reply; we did not use any other tuning techniques.
Equation 3 describes how we use the reward model to filter responses, which can be seen as a very basic form of preference selection. Typical preference training requires paired responses, which are hard to obtain in our setting. To avoid over-complicating the problem, we used plain SFT as our training method.
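To put the idea in placeholder pseudocode (the names generate_candidates and reward_model.score below are illustrative only, not our actual API): we keep only the responses the reward model accepts, then run plain SFT on the surviving pairs.

    def build_sft_dataset(prompts, generate_candidates, reward_model, threshold=0.0):
        # Keep only (prompt, response) pairs the reward model scores at or
        # above the threshold; the result feeds an ordinary SFT pipeline.
        kept = []
        for prompt in prompts:
            for response in generate_candidates(prompt):
                if reward_model.score(prompt, response) >= threshold:
                    kept.append({"prompt": prompt, "response": response})
        return kept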
If you have further questions, please let us know!

JaejinCho (Author) commented

@luyaxi Thank you for the answer! I am currently trying to generate a dataset for proactive agent training. However, configuring the gym does not work for me as-is in this repo.

Before I look into the details, I was wondering where you got the inspiration for the gym in this repo, as I would like to adapt it for my own purposes. I just wanted a conceptual idea behind the code, specifically in relation to your paper.

JaejinCho reopened this Feb 10, 2025
luyaxi (Collaborator) commented Feb 11, 2025

I'd be happy to help sort out any configuration problems; feel free to share your configs here (remember to omit sensitive information such as API keys).

We were initially inspired by Gymnasium (originally OpenAI Gym) and Microsoft's TextWorld. Both provide simulated environments for optimizing agents' decision-making. By integrating a simulated environment with a custom agent, one can use an outcome evaluator to optimize the decision-making process as a whole, which matters because fine-grained evaluation can become extremely complex for advanced agents.
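In rough, simplified form (a conceptual sketch only, not the repository's actual classes or method names), the loop we have in mind looks like this:

    # A simulated environment plus an outcome evaluator that scores the
    # whole trajectory instead of every intermediate step.
    class SimulatedEnv:
        def reset(self):
            # Return the initial observation (e.g. a user activity event).
            return {"event": "user opens editor"}

        def step(self, action):
            # Apply the agent's action (e.g. a proposed proactive task)
            # and return (next_observation, done).
            return {"event": "user keeps working"}, True

    def rollout(env, agent, evaluate_outcome, max_steps=10):
        obs = env.reset()
        trajectory = []
        for _ in range(max_steps):
            action = agent.act(obs)          # the custom agent decides what to do
            trajectory.append((obs, action))
            obs, done = env.step(action)
            if done:
                break
        # Outcome-level evaluation of the decision process as a whole.
        return evaluate_outcome(trajectory)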

JaejinCho (Author) commented Feb 12, 2025

Thank you so much @luyaxi for your answer and for offering to help! I have a few follow-up questions below; I would appreciate it if you could answer them when you have time.

  • I'm curious whether you have any tips on stepping through the code with the VS Code debugger or pdb, since a lot of asyncio/parallel execution seems to be involved (and asking a question for every detail feels like too much). I am trying to read the code in depth to close the gap between the paper and the implementation; a small sketch of the kind of debugging I mean follows this list.

  • Do you have documentation for CodeLinker? I found the page at https://pypi.org/project/codelinker/, but some of the links there seem to be broken. I was looking for more comprehensive docs.

  • Also, why is this line needed? The agent does not seem to be used anywhere after it in the script:

    agent = ProactiveAgent(**cfg["agent"])
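
For the first point above, this is the generic, repo-agnostic kind of stepping-through I have in mind (the coroutine name is just a placeholder):

    import asyncio

    async def some_coroutine():
        # breakpoint() works inside coroutines and drops into pdb
        # (or the VS Code debugger when launched through a normal launch
        # configuration with "justMyCode": false).
        breakpoint()
        await asyncio.sleep(0)

    # asyncio's debug mode warns about slow callbacks and never-awaited
    # coroutines, which helps when many tasks run in parallel.
    asyncio.run(some_coroutine(), debug=True)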
