
Discussion: Update GPT eval models #294

Open
zhijian-liu opened this issue Oct 4, 2024 · 2 comments

@zhijian-liu
Contributor

Some benchmarks (such as ActivityNet, VideoChatGPT, and many others) use gpt-3.5-turbo-0613 for evaluation, but that model has been discontinued by OpenAI. One quick fix would be to switch to gpt-3.5-turbo, but I would also like to open a discussion on whether to switch all uses of gpt-3.5-turbo to gpt-4o-mini, since its performance is better and it is about three times cheaper.

Once we've settled on a direction, I'm happy to submit a PR to make the change. @Luodian @kcz358
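
For concreteness, here is a minimal sketch of what the swap could look like, assuming the eval calls go through the OpenAI Python SDK (openai>=1.0) and that the judge model is exposed as a parameter. The function name, prompt, and default below are illustrative, not the repo's actual code:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def gpt_eval(question: str, answer: str, prediction: str,
             model: str = "gpt-4o-mini") -> str:
    """Score a prediction with a GPT judge; the judge model is a parameter,
    so gpt-3.5-turbo-0613 can be swapped for gpt-4o-mini in one place."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # keep judging as deterministic as the API allows
        messages=[
            {"role": "system",
             "content": "You are an evaluator. Compare the prediction to the reference answer."},
            {"role": "user",
             "content": f"Question: {question}\nAnswer: {answer}\nPrediction: {prediction}"},
        ],
    )
    return response.choices[0].message.content
```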

@Luodian
Contributor

Luodian commented Oct 4, 2024

I would definitely suggest using gpt-4o-mini.

We should inform users about the eval-model change once the PR lands; ideally we should list the affected datasets in this issue (with the eval models used before and after).

I'll pin this issue and link it to the PR for visibility once the PR is created.

Luodian pinned this issue Oct 13, 2024
@tyleryzhu

In case this is a helpful data point: I tested gpt-4o-mini for video eval against gpt-3.5-turbo-0613 and got significantly different (often worse) results. Just something to note for consistency!
