I added the de.tudarmstadt.ukp.wikipedia.parser 1.1.0 jar and am now using it like this:
page = wiki.getDiscussionPage(34984282);
MediaWikiParserFactory pf = new MediaWikiParserFactory(Language.english);
MediaWikiParser parser = pf.createParser();
ParsedPage pp = parser.parse(page.getText());
for (Section section : pp.getSections()) {
    System.out.println(section.getTitle());
    for (Paragraph para : section.getParagraphs()) {
        System.out.println(para.getText());
    }
    System.out.println();
}
The output then looks like this (e.g. for article ID 34984282, topic "The climax of the video is"):
Original:== The climax of the video is ==
when all the [[Dudley Do-Right]]s cheer for the reading of Obama's letter. The video shows the letter (18:35) with the following text highlighted, "[I have authorized a] small number of combat equipped U.S. forces to deploy to central Africa to provide assistance to regional forces that are working toward the removal of Joseph Kony from the battlefield." Jason Russell reads these words while a crowd, including a serious looking African lady (18:42), look on in hushed reverence before breaking into estatic celebration. The musical crescendo carries to the resolution, asking you and all your friends to join the quest to get more Western policy-makers to bring UAV and Green Beret justice to these savages.
A synopsis is not complete without a quick overview of the climax and resolution of a video. [[User:Luke 19 Verse 27|Luke 19 Verse 27]] ([[User talk:Luke 19 Verse 27|talk]]) 22:34, 12 April 2012 (UTC)
:While your personal opinion of the film is nice, it is not proper for an encyclopedic article. It is [[WP:OR|original research]]. <font color="silver">[[User:Silver seren|Silver]]</font><font color="blue">[[User talk:Silver seren|seren]]</font><sup>[[Special:Contributions/Silver seren|C]]</sup> 00:11, 13 April 2012 (UTC)
::I am not saying the above paragraph should be in the article. The edit I made, and that Blake Burba improved, was no more OR than the sentences that proceed it in the synopsis. It is appropriate to spoil the plot of a half-hour video. Don't you want to include information in this article?
::Please [[WP:Assume|assume good faith]] of me, my brother, as I assume good faith of you. I wish neither I nor you be made an ass, but neither you or me. So let's not assume anything, other than things from the [[Kony 2012|subject of this article]] obviously have [[WP:verify|verifilibililitty!]] [[User:Luke 19 Verse 27|Luke 19 Verse 27]] ([[User talk:Luke 19 Verse 27|talk]]) 05:36, 13 April 2012 (UTC)
:::Then you need a [[WP:RS|reliable source]] for the addition. I was also, if you noticed, removing the son sentence because it was unsourced, though it is sourced now.
:::Furthermore, considering the wording of your earlier attempts, it seems quite clear that you're trying to emphasize a few short seconds of the film in order to make the film look negative. This is not proper. <font color="silver">[[User:Silver seren|Silver]]</font><font color="blue">[[User talk:Silver seren|seren]]</font><sup>[[Special:Contributions/Silver seren|C]]</sup> 06:28, 13 April 2012 (UTC)
::::To be more clear.
A. Watch the video, that is the verifiability. It is information readily available on the internet.
B. Assume good faith. You don't know why I do what fore. Making such accusations makes it seem like ''you'' have the agenda. I'm trying to improve and expand the synopsis. I thought it was fine when others edited out words like "white" and "climax." Please try to reach consensus with me and others, in a circle like we did for the white climax, and we'll all get off together on a better article, more filling than what we had before, sharing our assumtions of good faith and love for each other, just like Jason would want it, love all over everyones faces.
C. Smile at me brother, I love you. Let's improve the article instead of deleting content with little or no justification. [[User:Luke 19 Verse 27|Luke 19 Verse 27]] ([[User talk:Luke 19 Verse 27|talk]]) 00:44, 14 April 2012 (UTC)
:The problem is that the film itself is a primary source. Normally, this wouldn't be an issue if you were using it for a quote of what someone said in the film, but using it to interpret what is happening in a scene is de facto [[WP:OR|original research]]. <font color="silver">[[User:Silver seren|Silver]]</font><font color="blue">[[User talk:Silver seren|seren]]</font><sup>[[Special:Contributions/Silver seren|C]]</sup> 00:56, 14 April 2012 (UTC)
::So the current version is ok with you? It doesn't have any interpretation, just a synopsis. I don't want this situation to get [[WP:EDITWAR|furry]]. [[User:Luke 19 Verse 27|Luke 19 Verse 27]] ([[User talk:Luke 19 Verse 27|talk]]) 17:37, 14 April 2012 (UTC)
:::I reworded it a little and added a reference. <font color="silver">[[User:Silver seren|Silver]]</font><font color="blue">[[User talk:Silver seren|seren]]</font><sup>[[Special:Contributions/Silver seren|C]]</sup> 18:30, 14 April 2012 (UTC)
::::Good edit. Thanks for doing the source. [[User:Luke 19 Verse 27|Luke 19 Verse 27]] ([[User talk:Luke 19 Verse 27|talk]]) 19:20, 14 April 2012 (UTC)
Parsed:The climax of the video is
when all the Dudley Do-Rights cheer for the reading of Obama's letter. The video shows the letter (18:35) with the following text highlighted, "[I have authorized a] small number of combat equipped U.S. forces to deploy to central Africa to provide assistance to regional forces that are working toward the removal of Joseph Kony from the battlefield." Jason Russell reads these words while a crowd, including a serious looking African lady (18:42), look on in hushed reverence before breaking into estatic celebration. The musical crescendo carries to the resolution, asking you and all your friends to join the quest to get more Western policy-makers to bring UAV and Green Beret justice to these savages.
A synopsis is not complete without a quick overview of the climax and resolution of a video. Luke 19 Verse 27 (talk) 22:34, 12 April 2012 (UTC)
While your personal opinion of the film is nice, it is not proper for an encyclopedic article. It is original research. SilverserenC 00:11, 13 April 2012 (UTC)
:I am not saying the above paragraph should be in the article. The edit I made, and that Blake Burba improved, was no more OR than the sentences that proceed it in the synopsis. It is appropriate to spoil the plot of a half-hour video. Don't you want to include information in this article?
:Please assume good faith of me, my brother, as I assume good faith of you. I wish neither I nor you be made an ass, but neither you or me. So let's not assume anything, other than things from the subject of this article obviously have verifilibililitty! Luke 19 Verse 27 (talk) 05:36, 13 April 2012 (UTC)
::Then you need a reliable source for the addition. I was also, if you noticed, removing the son sentence because it was unsourced, though it is sourced now.
::Furthermore, considering the wording of your earlier attempts, it seems quite clear that you're trying to emphasize a few short seconds of the film in order to make the film look negative. This is not proper. SilverserenC 06:28, 13 April 2012 (UTC)
:::To be more clear.
A. Watch the video, that is the verifiability. It is information readily available on the internet.
B. Assume good faith. You don't know why I do what fore. Making such accusations makes it seem like you have the agenda. I'm trying to improve and expand the synopsis. I thought it was fine when others edited out words like "white" and "climax." Please try to reach consensus with me and others, in a circle like we did for the white climax, and we'll all get off together on a better article, more filling than what we had before, sharing our assumtions of good faith and love for each other, just like Jason would want it, love all over everyones faces.
C. Smile at me brother, I love you. Let's improve the article instead of deleting content with little or no justification. Luke 19 Verse 27 (talk) 00:44, 14 April 2012 (UTC)
The problem is that the film itself is a primary source. Normally, this wouldn't be an issue if you were using it for a quote of what someone said in the film, but using it to interpret what is happening in a scene is de facto original research. SilverserenC 00:56, 14 April 2012 (UTC)
:So the current version is ok with you? It doesn't have any interpretation, just a synopsis. I don't want this situation to get furry. Luke 19 Verse 27 (talk) 17:37, 14 April 2012 (UTC)
::I reworded it a little and added a reference. SilverserenC 18:30, 14 April 2012 (UTC)
:::Good edit. Thanks for doing the source. Luke 19 Verse 27 (talk) 19:20, 14 April 2012 (UTC)
Well, it works better now. It no longer deletes the comments, and it parses them into proper plain text... but it removes one leading ":" from every line that starts with one, which is also strange behavior. Is this solvable?
But I can work with this, too, so thanks for the help!
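One possible workaround for the missing leading ":" is to read the reply depth from the raw wikitext before it goes through the parser, instead of relying on the parsed text. The following is only a minimal sketch: TalkIndent, depth and depths are hypothetical names of my own and not part of the parser library, and it assumes that the raw text from page.getText() is line-oriented and that the parsed lines can be matched back to the raw lines in order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the parser library: it records the reply
// depth of each talk-page line from the raw wikitext, before parsing.
final class TalkIndent {

    // Number of leading ':' characters, i.e. the reply depth of the line.
    static int depth(String rawLine) {
        int n = 0;
        while (n < rawLine.length() && rawLine.charAt(n) == ':') {
            n++;
        }
        return n;
    }

    // Depth of every line of the raw wikitext, in document order.
    static List<Integer> depths(String rawText) {
        List<Integer> result = new ArrayList<>();
        for (String line : rawText.split("\n", -1)) {
            result.add(depth(line));
        }
        return result;
    }

    public static void main(String[] args) {
        String raw = "First comment\n:Reply\n::Reply to the reply";
        System.out.println(depths(raw)); // prints [0, 1, 2]
    }
}
```

With the depths stored separately, it no longer matters how many ":" the parser keeps: the parsed plain text provides the content, and the raw-text depth provides the thread level.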