Nice ideas.
Since "representative" is "more like a combination of difficulty level, relevance, and coverage (how crucial
the question is in the scope of the topic)", do we need a model that evaluates the representativeness of the top N questions for a topic jointly, i.e. P(q_1, q_2, q_3, ..., q_N | z)? Note that the N questions here are not independent of each other.
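To make the dependence concrete: by the chain rule, P(q_1, ..., q_N | z) factors into terms P(q_k | q_1, ..., q_{k-1}, z), so each pick should depend on what has already been picked. Below is a rough sketch of one way to approximate that with a greedy, MMR-style heuristic. It is not the joint model itself. The word-overlap similarity, the `relevance` scores, and the `trade_off` weight are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Word-overlap similarity between two question strings (an assumed stand-in)."""
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def select_representative(questions, relevance, n, trade_off=0.7):
    """Greedily pick up to n questions, trading topic relevance against redundancy.

    relevance maps each question to an assumed per-topic score in [0, 1];
    trade_off is a hypothetical weight between relevance and redundancy.
    """
    selected = []
    candidates = list(questions)
    while candidates and len(selected) < n:
        def gain(q):
            # Penalize a candidate by its closest match among the chosen questions.
            redundancy = max((jaccard(q, s) for s in selected), default=0.0)
            return trade_off * relevance[q] - (1 - trade_off) * redundancy

        best = max(candidates, key=gain)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With this kind of selection, a near-duplicate of an already chosen question can lose to a less relevant question that covers something new.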
On Thu, Nov 1, 2012 at 2:22 PM, iCQA on behalf of Yu Wang <ic...@googlegroups.com> wrote:
Selecting representative questions for topics in syllabus.
Since the goal is to organize questions into a syllabus structure, the
whole process can be decomposed into two parts: (1) given a
question, which chapter/section/topic in the syllabus does it belong to;
(2) given a topic in the syllabus, what are the top N representative
questions.
(1) A probabilistic way of stating the idea is P(z|q), where q is a
question in the collection, and z is a topic in the syllabus (for example,
the title of a chapter).
(2) "Representative" is different from popularity (e.g. votes on Stack
Overflow) or a relevance measure (e.g. the score of a ranking algorithm).
Because we are building materials for learners, "representative" here
means usefulness for learners or beginners. It is more like a
combination of difficulty level, relevance, and coverage (how crucial
the question is in the scope of the topic).
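As a rough illustration of part (1), P(z|q) could be estimated with a naive Bayes unigram model built from a short text per syllabus topic. This is only a sketch under assumptions: the whitespace tokenizer, the uniform prior over topics, the smoothing constant `alpha`, and the example topic texts are all made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumed simplification)."""
    return text.lower().split()


def topic_posterior(question, topic_texts, alpha=1.0):
    """Return P(z | q) for each topic z, assuming a uniform prior over topics.

    topic_texts maps a topic title to some descriptive text, e.g. a chapter
    summary; alpha is an add-alpha smoothing constant.
    """
    words = tokenize(question)
    vocab = {w for text in topic_texts.values() for w in tokenize(text)}
    vocab.update(words)
    log_scores = {}
    for topic, text in topic_texts.items():
        counts = Counter(tokenize(text))
        total = sum(counts.values()) + alpha * len(vocab)
        log_scores[topic] = sum(math.log((counts[w] + alpha) / total) for w in words)
    # Normalize in log space for numerical stability.
    top = max(log_scores.values())
    weights = {z: math.exp(s - top) for z, s in log_scores.items()}
    norm = sum(weights.values())
    return {z: w / norm for z, w in weights.items()}


# Hypothetical topic descriptions, not taken from any real syllabus.
topics = {
    "data types": "int float double char boolean variable type",
    "for loop": "for loop iterate counter increment array index",
}
posterior = topic_posterior("how to iterate over an array with a for loop", topics)
```

The question is then assigned to the topic with the highest posterior, or kept as a soft assignment if several topics are plausible.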
Selecting questions asked by learners.
One possible way to accomplish (2) is to find high-quality questions
asked by real learners. So the problem becomes identifying good
learners according to their questions. We assume that the learning process
usually follows a pre-defined order when users learn through
books or online materials. When they ask questions along the way,
the topics of their questions should align with the
order defined in those materials. We propose a topic transition
matrix: p(z_i | z_j) is the probability that topic z_i is learned
(introduced in the book) after topic z_j. For example, when users
learn the Java programming language, they usually start with data types,
then if-else statements, and then for loops. So p(if else | data types) and
p(for loop | if else) will have high values, but p(data types | for
loop) will be low. Once the topic transition matrix is estimated, we can
measure how likely a user is to be a real learner by checking whether the
topic transitions in their question stream follow the transition matrix.
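A minimal sketch of this idea, assuming the matrix is estimated by counting consecutive topic pairs in ordered topic sequences (for example, chapter orders from several books) with add-alpha smoothing, and that a user's stream is scored by its average log transition probability. The function names, the smoothing, and the scoring rule are assumptions for illustration, not a settled design.

```python
import math


def estimate_transitions(sequences, topics, alpha=1.0):
    """Estimate p(z_i | z_j) from ordered topic sequences.

    Returns matrix[z_j][z_i], the smoothed probability that z_i follows z_j.
    """
    counts = {prev: {nxt: 0.0 for nxt in topics} for prev in topics}
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1.0
    matrix = {}
    for prev in topics:
        total = sum(counts[prev].values()) + alpha * len(topics)
        matrix[prev] = {nxt: (counts[prev][nxt] + alpha) / total for nxt in topics}
    return matrix


def learner_score(stream, matrix):
    """Average log-probability of the consecutive topic transitions in a stream."""
    pairs = list(zip(stream, stream[1:]))
    if not pairs:
        return float("-inf")  # a single question carries no transition evidence
    return sum(math.log(matrix[prev][nxt]) for prev, nxt in pairs) / len(pairs)


topics = ["data types", "if else", "for loop"]
books = [["data types", "if else", "for loop"]] * 3  # hypothetical chapter orders
matrix = estimate_transitions(books, topics)
in_order = learner_score(["data types", "if else", "for loop"], matrix)
reversed_order = learner_score(["for loop", "if else", "data types"], matrix)
```

A stream that follows the book order scores higher than the reversed one, so a threshold on this score could serve as a first filter for likely learners.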
On Wed, Oct 31, 2012 at 8:44 PM, Yu Wang <ywa...@emory.edu> wrote:
Hi Azat,
Sorry for the late response. My research focuses on the temporal part
of the project (the topic transitions in my previous email), and I am not
an expert in CQA. I think Qiaoling could give better insight into the
novelty of our project, and also point to some related work.
Thanks,
Yu
On Wed, Oct 31, 2012 at 8:41 PM, Yu Wang <ywa...@emory.edu> wrote:
>
>
> On Thu, Oct 25, 2012 at 3:41 PM, iCQA on behalf of qiaoling.liu
> <ic...@googlegroups.com> wrote:
>> Nice. I think that could be a good start.
>>
>> Actually, I think there are two directions that we could go in:
>> 1. Develop algorithms to automatically build a syllabus based on the
>> questions.
>> 2. Given any syllabus (e.g. from a textbook or course web page), develop
>> algorithms to organize the questions around the syllabus; more specifically,
>> select important questions for each topic in the syllabus and rank them from
>> easy to hard.
>>
>> Are we going in the second direction? I personally think it's the easier place
>> to start.
>>
>>
>> On Thu, Oct 25, 2012 at 2:19 PM, iCQA on behalf of Nikita Zhiltsov
>> <icqa+noreply-APn2wQepKAJS0lVTFxy...@googlegroups.com>
>> wrote:
>>>
>>> Azat,
>>>
>>> the task looks quite novel. Let's start with brainstorming possible
>>> solutions for it.
>>> So, given a set of questions tagged with some category of interest, e.g.
>>> 'java', we need to organize them into a reasonable hierarchical
>>> structure (1st layer: topics, 2nd layer: questions), i.e. with basic and
>>> easy topics and questions coming first, more complex topics coming
>>> next, and so on.
>>> One of the ideas may be as follows (@Yu, please amend it if something's
>>> wrong):
>>>
>>> 1. We have an example syllabus from a textbook or online course, i.e. a list
>>> of topics.
>>> 2. We use topic titles as queries and issue those queries (and, perhaps, some
>>> additional queries after applying expansion techniques like synonyms
>>> or pseudo-relevance feedback) to a search system. In our case, we may
>>> use the Stack Overflow API or the Sphinx search engine after
>>> loading the data into the iCQA app and configuring it properly.
>>> 3. We get the top-k results ranked by relevance, represent each as a vector of
>>> features (relevance score, asker/answerer ratings, etc.) and apply clustering
>>> techniques. The questions located close to cluster centroids
>>> may be good candidates.
>>> 4. The usefulness of the resulting syllabus should be evaluated, e.g. by using
>>> Amazon MTurk.
>>>
>>>
>>> On Wednesday, October 24, 2012 6:27:50 AM UTC-4, Азат Хасаншин wrote:
>>>>
>>>> Hello, Yu. I'm also interested in this task. Could you give us some
>>>> pointers on how to start working on it, maybe existing papers?
>>>>
>>>> On 23.10.2012 21:22, "iCQA on behalf of Nikita Zhiltsov"
>>>> <icqa+noreply-APn2wQepKAJS0lVTFxy...@googlegroups.com>
>>>> wrote:
>>>>>
>>>>> Let's elaborate on this task proposed by Yu in this thread.
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "iCQA" group.
>>>>> To post to this group, send email to ic...@googlegroups.com.
>>>>> To unsubscribe from this group, send email to
>>>>> icqa+uns...@googlegroups.com.
>>>>>
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msg/icqa/-/AIFLvpJY5bkJ.
>>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>>
>>>>>
>>>
>>
>>
Hello,