MEDICAL LLM BENCHMARK

Tracks LLM performance on various biomedical and clinical NLP tasks and datasets.

NER

Scores are reported as precision/recall/micro-F1. Trained models can be found here to reproduce the results.

Model	BC2GM	BC5CDR-chem	BC5CDR-disease	JNLPBA	NCBI-disease
distilbert-FT	0.76/0.79/0.77	0.89/0.87/0.88	0.76/0.81/0.79	0.73/0.83/0.78	0.81/0.86/0.84
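
For readers unfamiliar with the precision/recall/micro-F1 format used in the table, here is a minimal sketch of how such a triple could be computed. It assumes entity-level exact-match scoring (an entity counts as correct only if its span and label both match), which this README does not confirm; the function name `micro_prf` and the `(start, end, label)` tuple layout are hypothetical and chosen for illustration only.

```python
from typing import Sequence, Set, Tuple

# Hypothetical entity representation: (start_token, end_token, label).
Entity = Tuple[int, int, str]


def micro_prf(
    gold: Sequence[Set[Entity]], pred: Sequence[Set[Entity]]
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over a list of sentences.

    Assumption: exact-match, entity-level scoring. Counts are pooled
    across all sentences before the ratios are taken (micro-averaging).
    """
    tp = fp = fn = 0
    for gold_entities, pred_entities in zip(gold, pred):
        tp += len(gold_entities & pred_entities)  # predicted and correct
        fp += len(pred_entities - gold_entities)  # predicted but wrong
        fn += len(gold_entities - pred_entities)  # missed gold entities
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Toy data for illustration only, not taken from any benchmark dataset.
gold = [{(0, 2, "Disease")}, {(3, 4, "Chemical"), (7, 9, "Disease")}]
pred = [{(0, 2, "Disease")}, {(3, 4, "Chemical"), (5, 6, "Disease")}]

p, r, f = micro_prf(gold, pred)
print(f"{p:.2f}/{r:.2f}/{f:.2f}")
```

Because the counts are pooled before dividing, datasets with many entities per sentence weigh more heavily than a per-sentence average would. Token-level scoring or a different matching rule would give different numbers.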