Towards Personalized Evaluation of Large Language Models with an Anonymous Crowd-Sourcing Platform

WWW 2024 (2024)

Abstract
Large language model evaluation plays a pivotal role in the enhancement of model capacity. Numerous methods for evaluating large language models have previously been proposed. Despite their effectiveness, these existing works mainly focus on assessing objective questions, overlooking the evaluation of subjective questions, which are extremely common for large language models. Additionally, these methods predominantly utilize centralized datasets for evaluation, with question banks concentrated within the evaluation platforms themselves. Moreover, the evaluation processes employed by these platforms often overlook personalized factors, neglecting the individual characteristics of both the evaluators and the models being evaluated. To address these limitations, we propose a novel anonymous crowd-sourcing evaluation platform, BingJian, for large language models that employs a competitive scoring mechanism in which users rank models based on their performance. This platform stands out not only for its support of centralized evaluations to assess the general capabilities of models but also for offering an open evaluation gateway. Through this gateway, users can submit their own questions, testing the models on a personalized and potentially broader range of capabilities. Furthermore, our platform introduces personalized evaluation scenarios, leveraging various forms of human-computer interaction to assess large language models in a manner that accounts for individual user preferences and contexts. The demonstration of BingJian can be accessed at https://github.com/Mingyue-Cheng/Bingjian.
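The abstract does not say which scoring rule the competitive mechanism uses. As a rough illustration only, the sketch below assumes an Elo-style update over anonymous pairwise votes, which is one common way to turn "user prefers model X over model Y" judgments into a ranking. The model names, the starting rating of 1000, and the K-factor of 32 are all hypothetical and are not taken from BingJian.

```python
# Minimal Elo-style sketch (an assumption, not BingJian's documented method).

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that a model rated r_a beats a model rated r_b under Elo."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update(ratings: dict, winner: str, loser: str, k: float = 32.0) -> None:
    """Apply one pairwise vote in place: the winner gains what the loser loses."""
    delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta


# Hypothetical models, all starting from the same rating.
ratings = {"model_a": 1000.0, "model_b": 1000.0, "model_c": 1000.0}

# Each vote is (preferred model, other model) from an anonymous user.
votes = [("model_a", "model_b"), ("model_a", "model_c"), ("model_b", "model_c")]
for winner, loser in votes:
    update(ratings, winner, loser)

# Rank models from highest to lowest rating.
leaderboard = sorted(ratings, key=ratings.get, reverse=True)
print(leaderboard)
```

Because each update is zero-sum, the total rating mass stays constant, so scores remain comparable as more votes arrive. A personalized variant could keep a separate rating table per user group or question category, though that too is only a guess at how such a platform might be organized.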
Keywords
Topic Modeling, User Modeling