
Discussion: Update GPT eval models #294

Open
zhijian-liu opened this issue Oct 4, 2024 · 2 comments

@zhijian-liu
Contributor

Some benchmarks (such as ActivityNet, VideoChatGPT, and many others) use gpt-3.5-turbo-0613 for evaluation, but that model has been discontinued by OpenAI. One quick fix would be to switch to gpt-3.5-turbo, but I would also like to open a discussion on whether to switch all uses of gpt-3.5-turbo to gpt-4o-mini, since its performance is better and it is about three times cheaper.

Once we've settled on a direction, I'm happy to submit a PR to make the change. @Luodian @kcz358
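
For concreteness, here is a minimal sketch of what the swap could look like, assuming the eval calls go through the OpenAI Python SDK (openai>=1.0) and that the judge model is exposed as a parameter. The function name, prompt, and default below are illustrative, not the repo's actual code:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def gpt_eval(question: str, answer: str, prediction: str,
             model: str = "gpt-4o-mini") -> str:
    """Score a prediction with a GPT judge; the judge model is a parameter,
    so gpt-3.5-turbo-0613 can be swapped for gpt-4o-mini in one place."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # keep judging as deterministic as the API allows
        messages=[
            {"role": "system",
             "content": "You are an evaluator. Compare the prediction to the reference answer."},
            {"role": "user",
             "content": f"Question: {question}\nAnswer: {answer}\nPrediction: {prediction}"},
        ],
    )
    return response.choices[0].message.content
```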

@Luodian
Contributor

Luodian commented Oct 4, 2024

I would definitely suggest using gpt-4o-mini.

We should inform users about the eval-model change once the PR lands; ideally we should list the affected datasets in this issue (with the eval models used before and after).

I'll pin this issue and link it to the PR for visibility once the PR is created.

Luodian pinned this issue Oct 13, 2024
@tyleryzhu

In case this is a helpful data point: I tested gpt-4o-mini for video eval against gpt-3.5-turbo-0613 and got significantly different (often worse) results. Just something to note for consistency!
