Hi Mehjabin,
The system we use for English is fairly complicated. It was built by a number of people who were on the project before I joined, and they did a lot of hand tweaking to try to produce low-noise output. I might have a summary explanation sitting in my email outbox from years ago, but either way I'd have to do some digging through email or source code to reconstruct the net effect of the pipeline.
While exploring the use of CPL with other sufficiently English-like languages (e.g. Spanish and Portuguese), we took a simpler approach: defining a list of POS tag regexps. This produces somewhat noisier output, but today's CPL (and NELL) is less sensitive to noisy input than it was 10 years ago, and starting out with a more recall-heavy all-pairs extraction turns out to be useful. Then, if certain kinds of noise are very common, special hand-tweaked filters can be added as needed.
So, for instance, two of the regexps for the Portuguese category matrix are:
L,/V /CN
L,/V /CN /PREP /CN
The first one means "take a sequence of nouns to the left (L) of a verb (V) followed by a common noun (CN)". That sequence of nouns becomes the arg1, and the verb plus the common noun become the pattern, giving a single (arg1, pattern) extraction. The second one looks for a sequence of nouns followed by a verb, a common noun, a preposition (PREP), and then another common noun. I believe we have it set to accept a run of up to 5 nouns in a row as an arg1 or arg2, but I'd have to double-check that.
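To make the matching concrete, here is a minimal Python sketch of how a category regexp like "L,/V /CN" could be applied to a single POS-tagged sentence. It is a hypothetical illustration, not the CPL implementation. The following are all assumptions: treating CN and PNM as the "noun" tags, matching tags exactly, rejecting (rather than truncating) noun runs longer than the unconfirmed cap of 5, and the hand-tagged example sentence.

```python
from typing import List, Tuple

# Assumptions for this sketch (not taken from CPL itself):
NOUN_TAGS = {"CN", "PNM"}  # LX-style tags treated as "nouns" for argument runs
MAX_ARG_LEN = 5            # unconfirmed cap on the number of nouns in an argument

Token = Tuple[str, str]    # (word, POS tag)


def parse_category_spec(spec: str) -> List[str]:
    """Turn 'L,/V /CN' into ['V', 'CN']; only left-side (L) specs are handled."""
    side, body = spec.split(",", 1)
    if side != "L":
        raise ValueError("this sketch only handles L (left-side) specs")
    return [tag.lstrip("/") for tag in body.split()]


def extract_category(sent: List[Token], spec: str) -> List[Tuple[str, str]]:
    """Return (arg1, pattern) pairs where a run of nouns precedes the tag pattern."""
    pattern_tags = parse_category_spec(spec)
    n = len(pattern_tags)
    words = [w for w, _ in sent]
    tags = [t for _, t in sent]
    results = []
    for i in range(len(sent) - n + 1):
        if tags[i:i + n] != pattern_tags:
            continue
        # Walk left from the start of the pattern to find the whole noun run.
        start = i
        while start > 0 and tags[start - 1] in NOUN_TAGS:
            start -= 1
        run_len = i - start
        # Runs that are empty or longer than the cap are skipped entirely.
        if 1 <= run_len <= MAX_ARG_LEN:
            results.append((" ".join(words[start:i]), " ".join(words[i:i + n])))
    return results


# Hand-tagged example: "Lisboa é capital de distrito" (tags assigned by hand).
sentence = [("Lisboa", "PNM"), ("é", "V"), ("capital", "CN"),
            ("de", "PREP"), ("distrito", "CN")]
print(extract_category(sentence, "L,/V /CN"))            # [('Lisboa', 'é capital')]
print(extract_category(sentence, "L,/V /CN /PREP /CN"))  # [('Lisboa', 'é capital de distrito')]
```

Both specs fire on the same sentence, which is the recall-heavy behavior described above: overlapping regexps each contribute their own (arg1, pattern) pair, and any filtering happens later.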
There are about a hundred of these regexps in all, and I forget where they came from. It turns out, though, that you can get most of the good patterns in an English-like language with about a half-dozen of the most common constructions. From there, you can go back by hand through the POS tagger output to find commonly missed sequences of POS tags and expand your set of regexps; if one of your regexps turns out to be too noisy, you can remove it and replace it with a set of more specific ones. It's one of those things that just takes some time for somebody to sit down and fiddle with until the output looks pretty good.
Regexps for relations are similar. Here are two very general ones, again for Portuguese (this tagset comes from the LX-Parser package, btw):
/V /CJ
/V /DA /CN
The first one looks for a run of noun phrases to be arg1, then a verb followed by a conjunction (CJ), and then another run of noun phrases to be arg2. The second one does the same thing, but this time the pattern should be a verb followed by a definite article (DA) and then a common noun.
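A relation regexp adds a second argument on the right. The Python sketch below is hypothetical and self-contained. It simplifies a "run of noun phrases" to a run of noun-tagged tokens (CN or PNM, an assumption), reuses the unconfirmed cap of 5 per argument, and runs on sentences whose tags were assigned by hand rather than by LX-Parser.

```python
from typing import List, Tuple

# Assumptions for this sketch (not taken from CPL itself):
NOUN_TAGS = {"CN", "PNM"}  # tags treated as nouns when reading argument runs
MAX_ARG_LEN = 5            # unconfirmed cap on the number of nouns per argument

Token = Tuple[str, str]    # (word, POS tag)


def noun_run_length(tags: List[str], start: int, step: int) -> int:
    """Count consecutive noun tags from index `start`, moving left (-1) or right (+1)."""
    count = 0
    i = start
    while 0 <= i < len(tags) and tags[i] in NOUN_TAGS:
        count += 1
        i += step
    return count


def extract_relation(sent: List[Token], spec: str) -> List[Tuple[str, str, str]]:
    """Return (arg1, pattern, arg2) triples: noun run, tag pattern, noun run."""
    pattern_tags = [tag.lstrip("/") for tag in spec.split()]
    n = len(pattern_tags)
    words = [w for w, _ in sent]
    tags = [t for _, t in sent]
    results = []
    for i in range(len(sent) - n + 1):
        if tags[i:i + n] != pattern_tags:
            continue
        left = noun_run_length(tags, i - 1, -1)   # arg1 ends just before the pattern
        right = noun_run_length(tags, i + n, 1)   # arg2 starts just after it
        if 1 <= left <= MAX_ARG_LEN and 1 <= right <= MAX_ARG_LEN:
            results.append((" ".join(words[i - left:i]),
                            " ".join(words[i:i + n]),
                            " ".join(words[i + n:i + n + right])))
    return results


# Hand-tagged examples (tags assigned by hand, not by LX-Parser).
s1 = [("Lisboa", "PNM"), ("é", "V"), ("como", "CJ"), ("Madrid", "PNM")]
s2 = [("Lisboa", "PNM"), ("tem", "V"), ("o", "DA"), ("rio", "CN"), ("Tejo", "PNM")]
print(extract_relation(s1, "/V /CJ"))      # [('Lisboa', 'é como', 'Madrid')]
print(extract_relation(s2, "/V /DA /CN"))  # [('Lisboa', 'tem o rio', 'Tejo')]
```

Note that in the second example the common noun "rio" is consumed by the pattern, so arg2 is only "Tejo". Splits like that are one reason the general regexps are noisy.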
I'm not much of an NLP guy myself, and I'm drawing a blank on where to point you, but there are studies out there that identify the most common constructions in various languages. Those might be a helpful starting point if you don't want to just look at a bunch of POS-tagged sentences, eyeball a list of the 10 or 20 most common constructions you see, and then check the output on another batch of sentences to see what was commonly missed and what picked up a lot of noise.
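A rough way to speed up the "what was commonly missed" step is to count the tag sequences that sit between two nouns across a batch of tagged sentences, skipping any that are already in the regexp list. The Python sketch below is hypothetical and not something the project is known to have used. The following are assumptions: the noun tags (CN, PNM), the candidate length cap (MAX_PATTERN_LEN = 3), and the helper names. It only targets relation-style patterns.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# Assumptions for this sketch (not taken from CPL itself):
NOUN_TAGS = {"CN", "PNM"}  # tags treated as nouns
MAX_PATTERN_LEN = 3        # hypothetical cap on candidate pattern length

Token = Tuple[str, str]    # (word, POS tag)


def candidate_patterns(tags: List[str], max_len: int = MAX_PATTERN_LEN) -> Iterator[Tuple[str, ...]]:
    """Yield tag sequences that start with a non-noun right after a noun
    and are immediately followed by a noun (i.e. could sit between arg1 and arg2)."""
    for i in range(1, len(tags)):
        if tags[i - 1] not in NOUN_TAGS or tags[i] in NOUN_TAGS:
            continue
        for length in range(1, max_len + 1):
            end = i + length
            if end >= len(tags):
                break
            if tags[end] in NOUN_TAGS:
                yield tuple(tags[i:end])


def uncovered_patterns(sents: Iterable[List[Token]], known_specs: Iterable[str],
                       top_n: int = 20) -> List[Tuple[Tuple[str, ...], int]]:
    """Count candidate patterns over all sentences, skipping ones already covered."""
    known = {tuple(t.lstrip("/") for t in spec.split()) for spec in known_specs}
    counts: Counter = Counter()
    for sent in sents:
        for pattern in candidate_patterns([t for _, t in sent]):
            if pattern not in known:
                counts[pattern] += 1
    return counts.most_common(top_n)


# Hand-tagged toy batch (tags assigned by hand).
batch = [
    [("Lisboa", "PNM"), ("é", "V"), ("como", "CJ"), ("Madrid", "PNM")],
    [("Lisboa", "PNM"), ("tem", "V"), ("o", "DA"), ("rio", "CN"), ("Tejo", "PNM")],
]
print(uncovered_patterns(batch, ["/V /CJ", "/V /DA /CN"]))  # [(('V', 'DA'), 1)]
```

The leftover candidate ('V', 'DA') only appears because "tem o" happens to be followed by the noun "rio". The counts are meant to rank sequences for a human to review, not to be added to the regexp list automatically.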