Release v0.3.1 · EvolvingLMMs-Lab/lmms-eval

What's Changed

BugFix: Fixed input to llama_vision processor by @Danielohayon in #431
MixEval-X Image / Video by @pufanyi in #434
MixEval-X Readme by @pufanyi in #444
Fix Llama vision mentioned in #434 by @pufanyi in #447
add task MMVet-v2 by @frankRenlf in #451
Fix mmt output format by @ngquangtrung57 in #454
[Fixed] metric names in NaturalBench dataset by @Baiqi-Li in #455
[Fix] fix mia-bench evaluation by @Luodian in #456
Fix MMVet V2 by @pufanyi in #457
Mmvetv2 by @frankRenlf in #458
[Fix] remove useless print statements by @pufanyi in #460
[Fix] remove unused text processing notebook by @pufanyi in #485
[Fix] Use no media iterator by @kcz358 in #486
Add VL-RewardBench dataset by @TobiasLee in #484
Delete model and cache before multi-GPU data gathering by @xumingze0308 in #489
Add MEGA-Bench by @woodfrog in #496
[WIP] style(megabench): improve code formatting and import ordering by @Luodian in #497
Fix llama_vision chat_template and decode by @coding-famer in #498
[Support] Support new model: Ross by @Haochen-Wang409 in #494
[FIX] Minor errors in gemini_api.py and internvl2.py. by @skyil7 in #502
Fix NoneType Error in flatten Function for Text-Only Tasks in LLAVA Models by @bibisbar in #501
Fix device_map by @coding-famer in #505
Fix custom model wrapper to enable usage of instanced model by @ErezSC42 in #508
fix output format of airbench and vocal sound by @pbcong in #510
Update README.md by @KairuiHu in #513
Add covost2 zh en by @pbcong in #515
[Fix] Fix dataset processing logic for common voice and gigaspeech by @kcz358 in #517
Fix language in common voice by @kcz358 in #518
[Dataset] Adding Fleurs en/cn split by @kcz358 in #516
Change fleurs path by @kcz358 in #519
add covost2_en_zh task by @ngquangtrung57 in #520
[Feat] Add VITA 1.5 into lmms-eval by @kcz358 in #521
[Fix] megabench evaluator metric type determination by @woodfrog in #523
fix aggregation function and remove redundancies by @pbcong in #522
Add VideoMMMU task and Support Qwen2.5-vl Model by @KairuiHu in #524
[Add Dataset] HR-Bench (AAAI 2025) by @DreamMr in #525
[Feat] add @maj and @pass to support sampling multiple times during evaluation by @Luodian in #526
[Feat] add mathvision datasets by @Luodian in #527
[Fix] of "Model llavavid not found in available models." by @zhshj0110 in #528
Replace incorrect variable name self._word_size with self._world_size by @priancho in #535
Add Charades-STA task by @ZhangYuanhan-AI in #536
[Fix] of "evaluation of llava_vid on mvbench" by @zhshj0110 in #541
[Model] add vllm compatible models by @Luodian in #544
[Model] add openai compatible API interface by @Luodian in #546

New Contributors

@Danielohayon made their first contribution in #431
@frankRenlf made their first contribution in #451
@TobiasLee made their first contribution in #484
@xumingze0308 made their first contribution in #489
@woodfrog made their first contribution in #496
@coding-famer made their first contribution in #498
@Haochen-Wang409 made their first contribution in #494
@bibisbar made their first contribution in #501
@ErezSC42 made their first contribution in #508
@KairuiHu made their first contribution in #513
@DreamMr made their first contribution in #525
@zhshj0110 made their first contribution in #528
@priancho made their first contribution in #535

Full Changelog: v0.3.0...v0.3.1
