Organizing questions into syllabus-like structure


Nikita Zhiltsov

Oct 23, 2012, 1:22:38 PM
to ic...@googlegroups.com
Let's elaborate on this task proposed by Yu in this thread.

Азат Хасаншин

Oct 24, 2012, 6:27:50 AM
to iCQA

Hello, Yu. I'm also interested in this task. Could you give us some pointers on how to start working on it, maybe existing papers?

On 23.10.2012 at 21:22, "iCQA on behalf of Nikita Zhiltsov" <icqa+noreply-APn2wQepKAJS0lVTFxy...@googlegroups.com> wrote:
Let's elaborate on this task proposed by Yu in this thread.

--
You received this message because you are subscribed to the Google Groups "iCQA" group.
To post to this group, send email to ic...@googlegroups.com.
To unsubscribe from this group, send email to icqa+uns...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msg/icqa/-/AIFLvpJY5bkJ.
For more options, visit https://groups.google.com/groups/opt_out.

Nikita Zhiltsov

Oct 25, 2012, 2:19:19 PM
to ic...@googlegroups.com
Azat,

the task looks quite novel. Let's start with brainstorming possible solutions for it. 
So, given a set of questions tagged with some category of interest, e.g. 'java', we need to organize them into a reasonable hierarchical structure (1st layer - topics, 2nd layer - questions), with basic and easy topics and questions coming first, more complex topics coming next, and so on.
One of the ideas may be as follows (@Yu, please amend it if something's wrong):
  1. We have an example syllabus from a textbook or online course, i.e. a list of topics.
  2. We use topic titles as queries and issue them (perhaps along with additional queries produced by expansion techniques such as synonyms or pseudo-relevance feedback) to a search system. In our case, we may use the StackOverflow API or the Sphinx search engine facilities after loading the data into the iCQA app and configuring it properly.
  3. We take the top-k results ranked by relevance, represent each as a vector of features (relevance score, asker/answerer ratings, etc.), and apply clustering techniques. The questions located close to the cluster centroids may be good candidates.
  4. Usefulness of the resulting syllabus should be evaluated, e.g. by using Amazon MTurk.
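The clustering step (3) could be sketched roughly as below. Everything here is synthetic and illustrative: the feature vectors would really come from search results (relevance score, asker/answerer ratings, etc.), and the k-means implementation is a minimal stdlib-only stand-in for whatever clustering library we end up using.

```python
# Sketch of step 3: cluster top-k search results by their feature
# vectors and pick the question nearest each cluster centroid.
# Feature values below are made-up; in practice they would come from
# the StackOverflow API / Sphinx results.
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means over tuples of floats; returns final centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[i].append(p)
        centroids = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids

def nearest_to_centroids(questions, centroids):
    """For each centroid, return the question with the closest feature vector."""
    return [
        min(questions, key=lambda q: math.dist(q["features"], c))
        for c in centroids
    ]

questions = [
    {"id": 1, "features": (0.9, 0.8)},    # (relevance, asker rating), toy values
    {"id": 2, "features": (0.85, 0.75)},
    {"id": 3, "features": (0.2, 0.1)},
    {"id": 4, "features": (0.25, 0.15)},
]
centroids = kmeans([q["features"] for q in questions], k=2)
candidates = nearest_to_centroids(questions, centroids)
print(sorted(q["id"] for q in candidates))  # one candidate per cluster
```

The two tight groups in the toy data end up as the two clusters, and one question per cluster is proposed as a syllabus candidate.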

On Wednesday, October 24, 2012 6:27:50 AM UTC-4, Азат Хасаншин wrote:

Hello, Yu. I'm also interested in this task. Could you give us some pointers on how to start working on it, maybe existing papers?

On 23.10.2012 at 21:22, "iCQA on behalf of Nikita Zhiltsov" <icqa+noreply-APn2wQepKAJS0lVTFxyvv5p95eEp0_RTt8uDiJrP4vkiZOx7d9t@googlegroups.com> wrote:

Qiaoling Liu

Oct 25, 2012, 3:41:40 PM
to iCQA
Nice. I think that could be a good start.

Actually, I think there are two directions that we could go:
1. Develop algorithms to automatically build a syllabus based on the questions.
2. Given a syllabus (e.g. from a textbook or course web page), develop algorithms to organize the questions around it; more specifically, select important questions for each topic in the syllabus and rank them from easy to hard.

Are we going in the second direction? I personally think it's the easier place to start.


Yu Wang

Nov 1, 2012, 2:22:31 PM
to iCQA
Selecting representative questions for topics in the syllabus.
Since the goal is to organize questions into a syllabus structure, the
whole process can be decomposed into two parts: (1) given a question,
which chapter/section/topic in the syllabus does it belong to;
(2) given a topic in the syllabus, what are the top N representative
questions.
(1) A probabilistic way of stating the idea is P(z|q), where q is a
question in the collection and z is a topic in the syllabus (for example,
the title of a chapter).
(2) "Representative" is different from popularity (e.g. votes on Stack
Overflow) or a relevance measure (e.g. the score of a ranking algorithm).
Because we are building materials for learners, "representative" here
means usefulness for learners or beginners. It is more like a
combination of difficulty level, relevance, and coverage (how crucial
the question is within the scope of the topic).
Selecting questions asked by learners.
One possible way to accomplish (2) is to find high-quality questions
asked by real learners. So the problem becomes identifying good
learners from their questions. We assume that the learning process
usually follows a predefined progression when users learn through
books or online materials. When they ask questions along their
learning path, the topics of the questions should align with the
progression defined in those materials. We propose a topic transition
matrix: p(z_i | z_j) is the probability that topic z_i is learned
(introduced in the book) after topic z_j. For example, when users
learn the Java programming language, they usually start with data types,
then if-else statements, and then for loops. So p(if-else | data types) and
p(for loop | if-else) will have high values, but p(data types | for
loop) will be low. Once the topic transition matrix is estimated, we can
measure how likely a user is a real learner by checking whether the
topic transitions in his question stream follow the matrix.
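The learner-identification idea above could be scored as follows. The transition probabilities here are made-up toy values (in practice they would be estimated from textbook chapter order or question logs); the score is simply the average log-probability of the consecutive topic transitions in a user's question stream.

```python
# Sketch of the learner-likelihood idea: score a user's question stream
# by the log-likelihood of its topic transitions under the transition
# matrix p(z_i | z_j). Matrix values below are illustrative assumptions.
import math

# transition[prev][next] = p(next topic | previous topic); toy values.
transition = {
    "data types": {"if-else": 0.6, "for loop": 0.3, "data types": 0.1},
    "if-else":    {"for loop": 0.7, "data types": 0.1, "if-else": 0.2},
    "for loop":   {"data types": 0.1, "if-else": 0.2, "for loop": 0.7},
}

def learner_score(topic_stream, smooth=1e-3):
    """Average log-probability of consecutive topic transitions."""
    pairs = list(zip(topic_stream, topic_stream[1:]))
    if not pairs:
        return 0.0
    total = sum(math.log(transition.get(a, {}).get(b, smooth))
                for a, b in pairs)
    return total / len(pairs)

learner = ["data types", "if-else", "for loop"]      # expected learning order
non_learner = ["for loop", "data types", "if-else"]  # out of order
print(learner_score(learner) > learner_score(non_learner))  # -> True
```

Users whose streams follow the expected order score higher, so thresholding (or ranking by) this score is one way to pick out "real learners" whose questions feed step (2).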



Yu Wang

Nov 1, 2012, 2:22:50 PM
to iCQA
Hi Azat,

Sorry for the late response. My research focuses on the temporal part
of the project (the topic transitions in my previous email), and I am not
an expert in CQA. I think Qiaoling could give better insight into the
novelty of our project, and also point to some related work.

Thanks,
Yu

Qiaoling Liu

Nov 1, 2012, 4:52:45 PM
to iCQA on behalf of Yu Wang
Nice ideas.

Since "representative is more like a combination of difficulty level, relevance, and coverage (how crucial
the question is in the scope of the topic)", do we need a model to evaluate the "representativity" of the top N questions for a topic, i.e. P(q_1, q_2, q_3, ... q_N|z). Note that the N questions here are not independent of each other.
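One way to make the joint P(q_1, ..., q_N | z) tractable is a greedy selection that trades off each question's individual relevance to z against its redundancy with the questions already selected (an MMR-style heuristic). The relevance scores and the token-overlap similarity below are toy assumptions, just to show how the dependence among the N questions could enter the selection.

```python
# Sketch of handling the dependence among the top-N questions:
# greedily pick questions that are relevant to topic z but not
# redundant with those already chosen (maximal-marginal-relevance
# style). Scores and the similarity measure are toy assumptions.

def similarity(q1, q2):
    """Jaccard overlap between question token sets."""
    a, b = set(q1.split()), set(q2.split())
    return len(a & b) / len(a | b)

def select_top_n(questions, relevance, n, lam=0.7):
    """Greedy MMR-style selection: lam * relevance - (1-lam) * redundancy."""
    selected = []
    pool = list(questions)
    while pool and len(selected) < n:
        def mmr(q):
            redundancy = max((similarity(q, s) for s in selected), default=0.0)
            return lam * relevance[q] - (1 - lam) * redundancy
        best = max(pool, key=mmr)
        selected.append(best)
        pool.remove(best)
    return selected

questions = [
    "what is a for loop",
    "how does a for loop work",                 # near-duplicate of the first
    "difference between while and for loops",
]
relevance = {questions[0]: 0.9, questions[1]: 0.85, questions[2]: 0.8}
picked = select_top_n(questions, relevance, n=2)
print(picked)
```

The near-duplicate is skipped in favor of a slightly less relevant but non-redundant question, which is exactly the non-independence the joint model would capture.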