Evaluation of the task of organizing questions


Nikita Zhiltsov

unread,
Nov 12, 2012, 2:27:35 PM11/12/12
to ic...@googlegroups.com
Dear colleagues,

let me start a thread about evaluating the syllabus population task. I propose the following design:
  1. This task looks like a ranking search problem (syllabus chapters as queries, questions as documents), so it is natural to use several conventional evaluation measures, e.g. P@10 averaged over the whole query set, or AUC (Area Under the ROC Curve), which may be more appropriate since the result sets per chapter may be imbalanced.
  2. We need to combine results from several approaches, including a baseline (e.g. the BM25F scoring function or language models), to form a pool of results for judging. The pool size may depend on the result set (usually around 50 results).
  3. We have to pick several representative syllabi from different areas to demonstrate that our method works across topics. These might cover the programming languages we are most proficient in, e.g. Java, C++, Python, R. Next, we should add a description for each chapter and sub-chapter (I would suggest using only a two-level hierarchy) to help judges during annotation.
  4. Before we outsource annotation to judges, we'll have to create our own gold standard of annotated questions. It is key for detecting and filtering out weak judges.
  5. Amazon Mechanical Turk (AMT) can be used to get judgements from workers (are there any alternatives?). This task definitely requires relevant skills and experience. It is an open question whether AMT has facilities for choosing appropriate judges for our annotation task. Still, I believe it would be very beneficial to distinguish judgements by worker level (perhaps by asking the judges directly: "How would you rate your level of expertise in the _ language?"), from "basic" and "medium" to "expert".
  6. I think the easiest way to build an annotation user interface is a plain categorization task page that includes the following: chapter title, chapter description, question title, question body, and best answer body. Recall that we're searching only over answered questions.
  7. A three-point judgement scale looks reasonable: "Relevant and crucial information" (2), "Relevant" (1), "Irrelevant" (0).
  8. Gather and refine the judgements.
  9. Compute the evaluation measure values and check the statistical significance of the ranking differences.
So, what do you think? Comments and questions are welcome.
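For concreteness, the measures from steps 1 and 7 could be computed as in the following sketch (my illustration only, not a fixed design; it assumes each chapter's ranked result list comes with the proposed 0/1/2 relevance labels):

```python
# Minimal sketch of P@10 and AUC over graded relevance judgements.
# Labels follow the proposed scale: 2 (relevant and crucial), 1 (relevant), 0 (irrelevant).

def precision_at_k(ranked_labels, k=10):
    """Fraction of the top-k results judged relevant (label > 0)."""
    top = ranked_labels[:k]
    return sum(1 for label in top if label > 0) / float(k)

def auc(ranked_labels):
    """Area under the ROC curve for a binarized ranking: the probability
    that a relevant document is ranked above an irrelevant one."""
    pos = [i for i, label in enumerate(ranked_labels) if label > 0]
    neg = [i for i, label in enumerate(ranked_labels) if label == 0]
    if not pos or not neg:
        return None  # undefined for one-class result sets
    # Count positive/negative pairs where the positive is ranked higher (smaller index).
    correct = sum(1 for p in pos for n in neg if p < n)
    return correct / float(len(pos) * len(neg))

def mean_p_at_10(results_per_chapter):
    """Average P@10 over all chapters (queries) of the syllabus."""
    values = [precision_at_k(labels, 10) for labels in results_per_chapter]
    return sum(values) / len(values)
```

For instance, a chapter whose ranked labels are [2, 0, 1, 0] gets P@10 = 0.2 and AUC = 0.75, which shows how AUC rewards ranking both relevant questions near the top even when the result set is short.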

Nikita Zhiltsov

unread,
Nov 12, 2012, 3:08:23 PM11/12/12
to ic...@googlegroups.com
Just discussed the features of AMT with Dmitry. He suggests two alternatives to this UI:
  1. Since we have a URL for each question on StackOverflow, consider a plain query-document pair relevance assessment task.
  2. Take advantage of "external AMT HITs" and build our own application (or embed JavaScript code in the categorization task page) to display the target syllabus tree and a list of questions. The judge would then drag relevant questions and drop them onto the proper tree nodes. This process should be very intuitive for judges.

Qiaoling Liu

unread,
Nov 14, 2012, 10:11:29 AM11/14/12
to iCQA
Cool. I just want to add two further dimensions of metrics:
1. the coverage of question difficulty for a syllabus topic/chapter;
2. the diversity of question difficulty.

I think these are as important for our task as question relevance.

--
You received this message because you are subscribed to the Google Groups "iCQA" group.
To post to this group, send email to ic...@googlegroups.com.
To unsubscribe from this group, send email to icqa+uns...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msg/icqa/-/HH-UY0yciw0J.
For more options, visit https://groups.google.com/groups/opt_out.
Nikita Zhiltsov

unread,
Nov 14, 2012, 12:15:39 PM11/14/12
to ic...@googlegroups.com
Qiaoling,

they look interesting, but how can we formalize those metrics?



Qiaoling Liu

unread,
Nov 14, 2012, 3:07:54 PM11/14/12
to iCQA
There are some metrics for coverage/diversity used in search engine results evaluation. I think we could borrow them for our purpose.
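To make the discussion concrete, here is one possible formalization (my own sketch, not an established metric from the literature): assume each retrieved question carries a difficulty level, measure coverage as the fraction of target difficulty levels represented in a chapter's result set, and diversity as the normalized entropy of the difficulty distribution.

```python
import math

# A sketch formalizing the two proposed dimensions; it assumes each question
# is tagged with a difficulty level from a fixed (hypothetical) label set.

DIFFICULTY_LEVELS = ("basic", "medium", "expert")  # assumed label set

def difficulty_coverage(difficulties, levels=DIFFICULTY_LEVELS):
    """Fraction of target difficulty levels that appear at least once."""
    present = set(difficulties) & set(levels)
    return len(present) / float(len(levels))

def difficulty_diversity(difficulties, levels=DIFFICULTY_LEVELS):
    """Normalized Shannon entropy of the difficulty distribution: 1.0 means
    the levels are evenly represented, 0.0 means all questions share one level."""
    counts = [difficulties.count(level) for level in levels]
    total = sum(counts)
    if total == 0:
        return 0.0
    probs = [c / float(total) for c in counts if c > 0]
    entropy = -sum(p * math.log(p, 2) for p in probs)
    return entropy / math.log(len(levels), 2)
```

Both scores live in [0, 1], so they could be averaged over chapters alongside the relevance measures.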


Nikita Zhiltsov

unread,
Nov 14, 2012, 6:01:13 PM11/14/12
to ic...@googlegroups.com
Azat,

could you please assess the feasibility of the second evaluation strategy (see below)? We need your opinion on external HITs on Amazon Mechanical Turk, as well as on the possibility of having a page that displays a syllabus in a tree widget with drag-and-drop and so on. Creating an example HIT (a proof of concept) would be great. I can provide these references about using AMT:

Azat Khasanshin

unread,
Nov 20, 2012, 4:20:11 PM11/20/12
to iCQA
Hello,

I think it's quite possible. The best way would be to host the questions on our own site, using the ExternalQuestion structure, which displays our page in a frame.

This way, we will have full control over the process. We can develop a JavaScript application for displaying questions and then make a POST request to the AMT server to store the answers.

Amazon has a great API, so we can post HITs and obtain answers automatically.

Unfortunately, I can't create an example HIT: AMT is available only to US residents.





--
Azat Khasanshin

Nikita Zhiltsov

unread,
Nov 21, 2012, 2:35:03 PM11/21/12
to ic...@googlegroups.com
Azat, 

thanks, it looks promising. Let's start working on the HIT. How should we create it, and what kind of HIT should it be?



Azat Khasanshin

unread,
Nov 30, 2012, 3:07:14 PM11/30/12
to iCQA
First of all, we need an Amazon AWS account. To create a HIT, we have several options:

1) Use the command line interface.
2) Use the REST API.
  There are official libraries for Perl, .NET, Java, and Ruby.
  For other languages, we can use third-party libraries or work directly with the REST API.

I recommend reading the Getting Started guide. It shows an example of creating a basic QuestionForm HIT. As I mentioned earlier,
for our task we should use the ExternalQuestion structure, like this:
<ExternalQuestion xmlns="[the ExternalQuestion schema URL]">
  <ExternalURL>http://tictactoe.amazon.com/gamesurvey.cgi?gameid=01523</ExternalURL>
  <FrameHeight>400</FrameHeight>
</ExternalQuestion>

Some useful documents:

Unfortunately, I can't test creating HITs, since they require US residency. I think I can start working on the external page
interface for categorizing questions.
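Building on the XML example above, the ExternalQuestion payload can also be generated programmatically. A minimal Python sketch (the schema URL is the one given in the AMT data schema documentation; actually posting the HIT, e.g. with the boto library or the raw REST API, additionally needs requester credentials, which we would try in the sandbox first):

```python
# Sketch: build the ExternalQuestion XML for a HIT pointing at our own
# categorization page. Only the XML construction is shown; submitting it
# to AMT requires AWS requester credentials.

EXTERNAL_QUESTION_XMLNS = (
    "http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas"
    "/2006-07-14/ExternalQuestion.xsd"
)

def external_question(url, frame_height=400):
    """Return the ExternalQuestion payload for a HIT served from `url`."""
    return (
        '<ExternalQuestion xmlns="%s">'
        "<ExternalURL>%s</ExternalURL>"
        "<FrameHeight>%d</FrameHeight>"
        "</ExternalQuestion>" % (EXTERNAL_QUESTION_XMLNS, url, frame_height)
    )
```

The URL would point at our categorization page; AMT then renders that page inside a frame of the given height for each worker.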





--
Azat Khasanshin

Azat Khasanshin

unread,
Dec 10, 2012, 8:23:02 AM12/10/12
to iCQA
A progress update: I got the requester sandbox credentials, and right now I'm working on the
question categorization page. I hope to finish it in a couple of days.
--
Azat Khasanshin

Nikita Zhiltsov

unread,
Dec 11, 2012, 10:09:05 PM12/11/12
to ic...@googlegroups.com
OK, could you show us an intermediate version of the page somehow?