Publication readiness is often an ambiguous decision that requires a lengthy, manual, and convoluted literature search and review. We propose a multi-stage automated solution that provides users with a topic-level readiness assessment. The core questions addressed by this pipeline are “Which topics did the author potentially overlook?” and “Which topics did the author address that are novel?” Our approach combines unsupervised text clustering with state-of-the-art large language model (LLM) reasoning to: (1) describe core topics present in semantically similar published papers that the user may have overlooked, and (2) identify novel topics the user addresses. The pipeline does not provide a binary “ready/not ready” assessment. Instead, the goal is to offer concise, qualitative data to assist the subjective decision-making of publication readiness.
Existing research has begun to explore LLMs' potential for performing literature reviews and peer reviews, but only as separate studies. To the best of our knowledge, this work is the first to connect both aspects. The result summarizes topic comparisons between a user-provided abstract and published works in similar domains. Our proposed research introduces a multi-stage, automated pipeline that optimizes document clustering for domain-specific abstracts, performs topic modelling, and personalizes topic context for the user-provided abstract through specialized LLM prompting.
Clustering of paper abstracts includes hierarchical clustering with dimension reduction to optimize for outlier precision, along with an evaluation of the effectiveness of Jaccard undersampling in producing precise, single-topic clusters. Topics are then extracted from the published documents and the user-provided abstract. To personalize the summaries, topic-enhanced retrieval-augmented generation (RAG) prompts are tested with an LLM reasoning model (e.g., Grok-3) to evaluate topic comprehension between the most similar published abstracts and the user-provided document.
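A minimal sketch of this clustering and topic-extraction stage is shown below, assuming BERTopic's standard Python API with UMAP and HDBSCAN components; the corpus and all parameter values are illustrative placeholders rather than our tuned configuration.

```python
# Sketch: cluster published abstracts with UMAP dimension reduction + HDBSCAN,
# then extract per-cluster topic keywords with BERTopic.
from umap import UMAP
from hdbscan import HDBSCAN
from bertopic import BERTopic

abstracts = [  # placeholder: replace with the full corpus of published abstracts
    "Abstract text of a published paper ...",
    "Abstract text of another published paper ...",
]

# Illustrative parameters, not the optimized settings reported in our experiments.
umap_model = UMAP(n_neighbors=15, n_components=5, min_dist=0.0, metric="cosine")
hdbscan_model = HDBSCAN(min_cluster_size=10, metric="euclidean",
                        cluster_selection_method="eom", prediction_data=True)

topic_model = BERTopic(umap_model=umap_model, hdbscan_model=hdbscan_model)
topics, _ = topic_model.fit_transform(abstracts)

# BERTopic assigns outliers to topic -1; the share of -1 labels is the outlier
# rate used to compare dimension-reduction choices.
outlier_rate = topics.count(-1) / len(topics)

# Keywords per non-outlier cluster, used later for topic comparison.
cluster_keywords = {t: [word for word, _ in topic_model.get_topic(t)]
                    for t in set(topics) if t != -1}
```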
Preliminary experiments show that pairing Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) with Uniform Manifold Approximation and Projection (UMAP) for dimension reduction is essential for avoiding outliers when clustering domain-specific documents. These experiments show an outlier reduction of over 65%, versus less than 20% with principal component analysis (PCA). We hypothesize that very large clusters may contain several smaller clusters with overlapping topics; our remaining work will test this hypothesis with Jaccard undersampling evaluations. Using BERTopic, keywords and topics have been determined for the clusters of published papers. We will proceed with topic-modelling experiments on a single user-input abstract to optimize matches with published clusters. We can then evaluate topic-enhanced RAG prompts with a reasoning LLM to produce concise and correct summaries, as sketched below.
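The following is a hedged sketch of how such a topic-enhanced RAG prompt could be assembled from the cluster keywords and the most similar published abstracts; the function name, prompt wording, and inputs are assumptions for illustration, not the prompt design we will ultimately evaluate.

```python
# Sketch: build a topic-enhanced RAG prompt comparing a draft abstract against
# the topics and representative abstracts of its most similar published clusters.
def build_topic_rag_prompt(user_abstract: str,
                           user_topics: list[str],
                           cluster_topics: dict[int, list[str]],
                           similar_abstracts: list[str]) -> str:
    published = "\n".join(f"- Cluster {cid}: {', '.join(words)}"
                          for cid, words in cluster_topics.items())
    context = "\n\n".join(similar_abstracts)
    return (
        "You are assessing topic coverage for a draft abstract.\n\n"
        f"Draft abstract:\n{user_abstract}\n\n"
        f"Topics found in the draft: {', '.join(user_topics)}\n\n"
        f"Topics found in the most similar published clusters:\n{published}\n\n"
        f"Representative published abstracts:\n{context}\n\n"
        "1. List topics present in the published clusters but absent from the draft "
        "(potentially overlooked).\n"
        "2. List topics present in the draft but absent from the published clusters "
        "(potentially novel).\n"
        "Answer concisely, referring only to the topic keywords above."
    )
```

The resulting prompt string would then be sent to the reasoning LLM (e.g., Grok-3) through its standard API; the response provides the overlooked-topic and novel-topic summaries returned to the user.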
---