Hi Vivian,
You're in luck: there's a method for that! Almost...
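>>> from nltk.tree import Tree
>>> # In NLTK 3, bracketed strings are parsed with Tree.fromstring(); older releases accepted Tree('(...)') directly.
>>> t = Tree.fromstring('(ROOT (S (NP (PRP They)) (VP (VBD said) (SBAR (S (NP (NNP John)) (VP (VBD married) (NP (NNP Marry) (JJ last) (NNS month))))))))')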
>>> t.leaves()
['They', 'said', 'John', 'married', 'Marry', 'last', 'month']
>>> t.leaves()[2:5]
['John', 'married', 'Marry']
>>> pos = t.treeposition_spanning_leaves(2,5)
>>> pos
(0, 1, 1, 0)
>>> t[pos]
Tree('S', [Tree('NP', [Tree('NNP', ['John'])]), Tree('VP', [Tree('VBD', ['married']), Tree('NP', [Tree('NNP', ['Marry']), Tree('JJ', ['last']), Tree('NNS', ['month'])])])])
>>> t[pos].leaves()
['John', 'married', 'Marry', 'last', 'month']
So, what .treeposition_spanning_leaves() gives you is the position of the smallest subtree that covers your substring. But it doesn't prune any branches from that subtree, and since "last month" is part of that subtree, it's still there.
To prune away "last month", you can take one word at a time, like this:
>>> t.leaves()[4]
'Marry'
>>> t.leaves()[5]
'last'
>>> t.leaf_treeposition(4)
(0, 1, 1, 0, 1, 1, 0, 0)
>>> t.leaf_treeposition(5)
(0, 1, 1, 0, 1, 1, 1, 0)
Find the first index where these two positions differ. The second position, cut off just after that index, i.e., (0, 1, 1, 0, 1, 1, 1), is the node for "last". Remove that node:
>>> print(t)
(ROOT
  (S
    (NP (PRP They))
    (VP
      (VBD said)
      (SBAR
        (S
          (NP (NNP John))
          (VP (VBD married) (NP (NNP Marry) (JJ last) (NNS month))))))))
>>> del t[(0, 1, 1, 0, 1, 1, 1)]
>>> print(t)
(ROOT
  (S
    (NP (PRP They))
    (VP
      (VBD said)
      (SBAR
        (S
          (NP (NNP John))
          (VP (VBD married) (NP (NNP Marry) (NNS month))))))))
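If you'd rather not compare the two positions by eye, the "where do they differ" step can be sketched in a few lines of plain Python. The helper name first_divergent_node is my own invention, not part of NLTK, and it assumes the two leaves are adjacent, with the first one kept and the second one dropped.

```python
def first_divergent_node(keep_pos, drop_pos):
    """Return the position of the node to delete, given two leaf positions.

    keep_pos: tree position (tuple of ints) of the last leaf to keep.
    drop_pos: tree position of the adjacent leaf to drop.
    Hypothetical helper for illustration; it is not an NLTK function.
    """
    for i, (k, d) in enumerate(zip(keep_pos, drop_pos)):
        if k != d:
            # Everything up to and including the first differing index
            # identifies the branch that holds the unwanted leaf.
            return drop_pos[:i + 1]
    raise ValueError("the two positions do not diverge")


node = first_divergent_node((0, 1, 1, 0, 1, 1, 0, 0),
                            (0, 1, 1, 0, 1, 1, 1, 0))
print(node)
```

With the two positions from the example this gives (0, 1, 1, 0, 1, 1, 1), which is the node deleted above. Keep in mind that deleting it removes the whole branch below it, not just a single word.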
Repeat until the tree is pruned. When doing this on the left side, the indices of your substring shift down by one every time you remove a leaf. You can solve this by counting from the right instead, since indices counted from the right are not affected by removals on the left. I.e.:
>>> t.leaves()[-5]
'said'
>>> t.leaves()[-4]
'John'
>>> t.leaf_treeposition(len(t.leaves())-5)
(0, 1, 0, 0)
>>> t[0, 1, 0, 0]
'said'
>>> t.leaf_treeposition(len(t.leaves())-4)
(0, 1, 1, 0, 0, 0, 0)
>>> t[0, 1, 1, 0, 0, 0, 0]
'John'
It is probably best to start with .treeposition_spanning_leaves() to get the minimal subtree and then prune that one. Also, note that NLTK trees are mutable, which means that when you remove nodes, the original tree is affected as well. If you don't want that, call t.copy(deep=True) before you start pruning and work on the copy.
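Putting the pieces together, here is a rough sketch of how the whole pruning could be automated. It is only one possible approach, not something NLTK ships: the names prune_to_span and _remove_leaf are made up, and it assumes the span covers at least two leaves. Instead of counting from the right, it deletes the unwanted leaves in descending index order, which also keeps the remaining indices stable. It uses only the Tree methods mentioned above (leaves, leaf_treeposition, treeposition_spanning_leaves, copy) plus tuple indexing.

```python
def _remove_leaf(tree, index):
    """Delete leaf number `index`, plus any ancestors left without children."""
    pos = tree.leaf_treeposition(index)
    del tree[pos]
    parent = pos[:-1]
    # Walk upwards and drop nodes that have become empty; stop at the root.
    while parent and len(tree[parent]) == 0:
        del tree[parent]
        parent = parent[:-1]


def prune_to_span(tree, start, end):
    """Return a pruned copy of the smallest subtree covering leaves[start:end].

    Hypothetical helper, not part of NLTK. Assumes `tree` is an nltk Tree
    and that the span contains at least two leaves.
    """
    n = len(tree.leaves())
    if not 0 <= start < end <= n:
        raise ValueError("span is outside the tree's leaves")
    if end - start < 2:
        raise ValueError("span must cover at least two leaves")

    # Deep copy first, so the caller's tree is not modified.
    pruned = tree.copy(deep=True)

    # Remove unwanted leaves from the highest index down: deleting leaf i
    # never changes the index of any leaf to its left.
    outside = [i for i in range(n) if not start <= i < end]
    for i in reversed(outside):
        _remove_leaf(pruned, i)

    # Only the wanted leaves remain; return the smallest subtree over them.
    return pruned[pruned.treeposition_spanning_leaves(0, end - start)]
```

For the sentence above, used on a fresh tree, it should behave like this:

>>> t2 = Tree.fromstring('(ROOT (S (NP (PRP They)) (VP (VBD said) (SBAR (S (NP (NNP John)) (VP (VBD married) (NP (NNP Marry) (JJ last) (NNS month))))))))')
>>> prune_to_span(t2, 2, 5).leaves()
['John', 'married', 'Marry']
>>> len(t2.leaves())
7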
Good luck!
/Peter