That's not quite true. A word is trained for each of its occurrences. For example, if you have a context like the following, with 5 words on either side of the target word [meow]...
the black cat gave a [meow] louder than the white cat
...then the skip-gram training example "with input word 'cat', try to predict 'meow'" will be presented to the network twice. Conversely, when the 1st 'cat' is the target word, 'meow' will be one of the trained inputs, and when the 2nd 'cat' is the target word, 'meow' will again be a trained input for target 'cat'.
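To make the counting concrete, here is a minimal sketch in plain Python (not gensim's actual code) that enumerates skip-gram (input, predicted) pairs for the example sentence. The helper name `skipgram_pairs` is hypothetical, and it assumes a fixed window of 5; real word2vec implementations also randomly shrink the effective window per target word, which this sketch ignores.

```python
from collections import Counter

def skipgram_pairs(tokens, window=5):
    """Yield (input_word, predicted_word) pairs, skip-gram style.

    Hypothetical helper for illustration. Simplification: always uses the
    full window, whereas word2vec implementations randomly shrink it.
    """
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                # The context word is the input; the target word is predicted.
                yield tokens[j], target

sentence = "the black cat gave a meow louder than the white cat".split()
pairs = Counter(skipgram_pairs(sentence, window=5))

print(pairs[("cat", "meow")])  # 2: both 'cat's are inputs when 'meow' is the target
print(pairs[("meow", "cat")])  # 2: 'meow' is an input for each 'cat' target
```

Because every occurrence generates its own pairs, a word that appears twice near 'meow' contributes twice as many training examples, with no extra weighting needed.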
(Similarly, if using negative sampling, more-frequent words will more often be used to create the synthetic negative examples. So if 'cat' appears more often than 'meow', then when some other word like 'toaster' is encountered, a synthetic negative example like "with input word 'toaster', try NOT to predict 'cat'" will be used more often than the similar negative example with target word 'meow'.)
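As a rough illustration of the frequency effect, the sketch below builds a negative-sampling distribution from word counts raised to the 0.75 power (the exponent used by the original word2vec and gensim's default) and draws from it. The helper `negative_sampling_probs` is hypothetical, and drawing with `random.choices` is a stand-in for the precomputed lookup tables real implementations use.

```python
import random
from collections import Counter

def negative_sampling_probs(tokens, exponent=0.75):
    """Return each word's probability of being drawn as a negative example.

    Hypothetical helper: probability is proportional to count ** exponent.
    """
    counts = Counter(tokens)
    weights = {w: c ** exponent for w, c in counts.items()}
    total = sum(weights.values())
    return {w: wt / total for w, wt in weights.items()}

corpus = "the black cat gave a meow louder than the white cat".split()
probs = negative_sampling_probs(corpus)
print(probs["cat"] > probs["meow"])  # True: 'cat' occurs twice, 'meow' once

# Simulate many negative draws with a fixed seed for repeatability.
rng = random.Random(42)
words = list(probs)
draws = Counter(rng.choices(words, weights=[probs[w] for w in words], k=100_000))
print(draws["cat"], draws["meow"])
```

The 0.75 exponent dampens the advantage of frequent words (2 occurrences give about 1.68 times the weight of 1, not 2 times), but more-frequent words still serve as negatives more often.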
Other sorts of weighting aren't supported by the existing code, but, as with anything else, they could be patched in with enough effort.
- Gordon