Invitation to Participate in Research Study on What Makes a High-Quality LLM Dataset (5-10 minutes)

bailing song

May 27, 2024, 10:27:48 AM
to keras...@googlegroups.com

Dear LLM Experts,

We, researchers from Wuhan University of Technology, Zhejiang University, City University of Hong Kong, and Monash University, are currently engaged in a research project aimed at defining the characteristics of high-quality datasets for training and fine-tuning large language models (LLMs).

Your insights and expertise are invaluable to us in this endeavor. We kindly invite you to participate in our survey, which will take approximately 5-10 minutes of your time. You can access the survey through the following link:

Google Form: https://docs.google.com/forms/d/e/1FAIpQLSdw2ryGChoRg8u1dOH7xJe9QXTl4dDn-q2xEho1KcIqthqz5A/viewform

Or

Wenjuanxing: https://w.wjx.com/vm/rXghUp3.aspx#

The primary objectives of our study are twofold:

(1) To investigate LLM practitioners' perceptions of high-quality training and fine-tuning datasets.

(2) To explore the practices and challenges associated with preprocessing and evaluating the quality of such datasets.

Your participation will contribute significantly to our understanding of this topic. We will analyze your responses, along with those of other LLM experts, and compile a research report. The report aims to offer insights and recommendations to the academic, industrial, and policymaking communities on the characteristics of high-quality LLM datasets, as well as on data preprocessing and quality evaluation methods.

We sincerely appreciate your time and valuable input. Please note that this survey is anonymous, and no personal information will be collected. All responses will be kept strictly confidential and used solely for academic research purposes.

Should you have any questions about our study, please feel free to contact the principal investigators: Xiao Yu (Wuhan University of Technology, xia...@whut.edu.cn, https://scholar.google.com/citations?hl=zh-CN&user=sMOc8IcAAAAJ), Bailing Song (Wuhan University of Technology, bailing...@hotmail.com), and Xing Hu (Zhejiang University, xin...@zju.edu.cn, https://xing-hu.github.io).

Thank you very much for your consideration and participation.

Best Regards,

Research Group
