Wikibench is a system that enables community members to collaboratively curate AI evaluation datasets, while navigating disagreements and ambiguities through discussion.
There is a local consensus that on-wiki use of the data acquired through this campaign should be limited to the immediate scope of Wikibench. A strong consensus should be established prior to any on-wiki use outside this research project.
The data acquired as part of this campaign is not intended for other uses and may be inappropriate or unsuitable for many purposes. In particular, it is not a representative sample of edits made to the English Wikipedia, nor is it intended as such.
See the project's page on Wikipedia for more details.
All files included in the dataset are released under CC0.
Tzu-Sheng Kuo, Aaron Halfaker, Zirui Cheng, Jiwoo Kim, Meng-Hsin Wu, Tongshuang Wu, Kenneth Holstein, and Haiyi Zhu. 2024. Wikibench: Community-Driven Data Curation for AI Evaluation on Wikipedia. In Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI ’24), May 11–16, 2024, Honolulu, HI, USA. ACM, New York, NY, USA, 24 pages.
arXiv: https://arxiv.org/abs/2402.14147
ACM Digital Library: https://doi.org/10.1145/3613904.3642278 (the DOI link becomes available in May 2024)