What's Changed
- BugFix: Fixed input to llama_vision processor by @Danielohayon in #431
- MixEval-X Image / Video by @pufanyi in #434
- MixEval-X Readme by @pufanyi in #444
- Fix Llama vision mentioned in #434 by @pufanyi in #447
- add task MMVet-v2 by @frankRenlf in #451
- Fix mmt output format by @ngquangtrung57 in #454
- [Fixed] metric names in NaturalBench dataset by @Baiqi-Li in #455
- [Fix] fix mia-bench evaluation by @Luodian in #456
- Fix MMVet V2 by @pufanyi in #457
- Mmvetv2 by @frankRenlf in #458
- [Fix] remove useless print statements by @pufanyi in #460
- [Fix] remove unused text processing notebook by @pufanyi in #485
- [Fix] Use no media iterator by @kcz358 in #486
- Add VL-RewardBench dataset by @TobiasLee in #484
- Delete model and cache before multigpu data gathering by @xumingze0308 in #489
- Add MEGA-Bench by @woodfrog in #496
- [WIP] style(megabench): improve code formatting and import ordering by @Luodian in #497
- Fix llama_vision chat_template and decode by @coding-famer in #498
- [Support] Support new model: Ross by @Haochen-Wang409 in #494
- [FIX] Minor errors in gemini_api.py and internvl2.py. by @skyil7 in #502
- Fix NoneType Error in flatten Function for Text-Only Tasks in LLAVA Models by @bibisbar in #501
- Fix device_map by @coding-famer in #505
- Fix custom model wrapper to enable usage of instanced model by @ErezSC42 in #508
- fix output format of airbench and vocal sound by @pbcong in #510
- Update README.md by @KairuiHu in #513
- Add covost2 zh en by @pbcong in #515
- [Fix] Fix dataset processing logic for common voice and gigaspeech by @kcz358 in #517
- Fix language in common voice by @kcz358 in #518
- [Dataset] Adding Fleurs en/cn split by @kcz358 in #516
- Change fleurs path by @kcz358 in #519
- add covost2_en_zh task by @ngquangtrung57 in #520
- [Feat] Add VITA 1.5 into lmms-eval by @kcz358 in #521
- [Fix] megabench evaluator metric type determination by @woodfrog in #523
- fix aggregation function and remove redundancies by @pbcong in #522
- Add VideoMMMU task and Support Qwen2.5-vl Model by @KairuiHu in #524
- [Add Dataset] HR-Bench (AAAI 2025) by @DreamMr in #525
- [Feat] add @maj and @pass to support sampling multiple times during evaluation by @Luodian in #526
- [Feat] add mathvision datasets by @Luodian in #527
- [Fix] of "Model llavavid not found in available models." by @zhshj0110 in #528
- Replaced incorrect variable name self._word_size to self._world_size by @priancho in #535
- Yhzhang/add charades sta by @ZhangYuanhan-AI in #536
- [Fix] of "evaluation of llava_vid on mvbench" by @zhshj0110 in #541
- [Model] add vllm compatible models by @Luodian in #544
- [Model] add openai compatible API interface by @Luodian in #546
New Contributors
- @Danielohayon made their first contribution in #431
- @frankRenlf made their first contribution in #451
- @TobiasLee made their first contribution in #484
- @xumingze0308 made their first contribution in #489
- @woodfrog made their first contribution in #496
- @coding-famer made their first contribution in #498
- @Haochen-Wang409 made their first contribution in #494
- @bibisbar made their first contribution in #501
- @ErezSC42 made their first contribution in #508
- @KairuiHu made their first contribution in #513
- @DreamMr made their first contribution in #525
- @zhshj0110 made their first contribution in #528
- @priancho made their first contribution in #535
Full Changelog: v0.3.0...v0.3.1