Let me add a more intuitive explanation, in case you find yourself, like me, with less exposure to probability.
You have a sample set S of all elements.
The chance of finding an element with a particular combination of characteristics in this set S is the Joint Probability. We write it as P(characteristic 1, characteristic 2, ...), e.g. P(A,B) // comma separated; remember this is the chance of drawing such an element from the entire sample set S.
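One way to see a joint probability is to count it directly. The sketch below is a minimal illustration under stated assumptions: the sample set S is 10 made-up days, each stored as a (lightning, rain) pair, and the function name `joint` is hypothetical. The count is divided by the size of the whole set S, not by any subset.

```python
from fractions import Fraction

# Hypothetical sample set S: 10 made-up days stored as (lightning, rain).
# The counts are invented for illustration only.
S = (
    [(True, True)] * 2      # lightning and rain
    + [(False, True)] * 3   # rain without lightning
    + [(False, False)] * 5  # neither
)

def joint(sample, lightning, rain):
    """P(lightning, rain): the fraction of ALL elements of the sample
    that have both characteristics at once."""
    hits = sum(1 for day in sample if day == (lightning, rain))
    return Fraction(hits, len(sample))

print(joint(S, True, True))  # 1/5
```

Fractions keep the arithmetic exact, so 2 out of 10 days shows up as 1/5 rather than a rounded float.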
The chance of finding an element with a characteristic (or set of characteristics) after the set has been prefiltered into a subset that only includes the elements with some given characteristic is the Conditional Probability. We write it as P(characteristic to find | characteristics already filtered for in S), e.g. P(A | B) // read "the chance of A given B": B is already known to hold, so you are only looking at the chance within that subset, not the full set S.
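The "prefilter, then count" idea maps directly onto code. This is a minimal sketch, assuming a made-up sample of 10 days stored as (lightning, rain) pairs; the helpers `is_lightning`, `is_rain` and `conditional` are hypothetical names. The key difference from a joint probability is the denominator: it is the size of the filtered subset, not of all of S.

```python
from fractions import Fraction

# Hypothetical sample set S of 10 made-up days, stored as (lightning, rain).
S = [(True, True)] * 2 + [(False, True)] * 3 + [(False, False)] * 5

def is_lightning(day):
    return day[0]

def is_rain(day):
    return day[1]

def conditional(sample, find, given):
    """P(find | given): prefilter the sample down to the subset where
    `given` holds, then count `find` only inside that subset."""
    subset = [x for x in sample if given(x)]
    hits = sum(1 for x in subset if find(x))
    return Fraction(hits, len(subset))

print(conditional(S, is_lightning, is_rain))  # 2/5
print(conditional(S, is_rain, is_lightning))  # 1
```

Note that P(lightning | rain) and P(rain | lightning) differ: the order matters because each one filters S down to a different subset.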
The whole complication comes from the fact that, in general, P(A,B) =/= P(A) * P(B) // the equality only holds when A and B occur independently; here we cannot assume that, because the presence of A affects the presence of B. For example:
A = lightning
B = rain
You cannot just say that the chance of both A and B equals the product of their individual chances in S, i.e. P(lightning) * P(rain). They tend to occur together rather than independently, so we need more involved machinery to derive the missing information our Agent needs to make decisions about its environment. That machinery is Bayes' Theorem and the rules surrounding it.
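A quick numeric check makes the point concrete. The sketch below assumes an invented sample of 10 days stored as (lightning, rain) pairs in which lightning only ever happens on rainy days; the helper `prob` is a hypothetical name. Multiplying the single-characteristic chances gives a different answer from counting the pairs directly.

```python
from fractions import Fraction

# Hypothetical sample: 10 made-up days stored as (lightning, rain).
S = [(True, True)] * 2 + [(False, True)] * 3 + [(False, False)] * 5

def prob(sample, event):
    """Fraction of the whole sample for which `event` is true."""
    return Fraction(sum(1 for x in sample if event(x)), len(sample))

p_a = prob(S, lambda d: d[0])               # P(lightning)
p_b = prob(S, lambda d: d[1])               # P(rain)
p_ab = prob(S, lambda d: d[0] and d[1])     # P(lightning, rain), counted directly

print(p_ab, p_a * p_b)  # 1/5 1/10
```

The product underestimates the joint chance by half here, because in this made-up data lightning never occurs without rain.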
If you are new to this topic, like me, I recommend a little outside study: Khan Academy, or a book I liked, "Probability For Dummies", which I got for my Kindle for about $10.
Professor Thrun has told us that this topic will be central to everything that follows and is one of the most important topics in A.I., so the time invested in getting familiar with it will be well worth it.
Also, when working out probabilities it can be very helpful to draw the graph/tree and see which outcomes satisfy the goal of the problem or program.
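The tree approach can also be written as code: walk every root-to-leaf path, multiply the branch probabilities along the path, then add up the leaves that satisfy the goal. This is a minimal sketch under assumptions: the tree (rain first, then lightning given rain) and its branch probabilities are made up, and the names `P_RAIN`, `P_LIGHTNING_GIVEN` and `leaves` are hypothetical.

```python
from fractions import Fraction

# Hypothetical two-level tree with invented branch probabilities.
# Level 1: rain or no rain. Level 2: lightning, given that branch.
P_RAIN = Fraction(1, 2)
P_LIGHTNING_GIVEN = {True: Fraction(2, 5), False: Fraction(0)}

def leaves():
    """Yield (rain, lightning, path_prob) for every root-to-leaf path.
    A path's probability is the product of the branch probabilities on it."""
    for rain in (True, False):
        p_r = P_RAIN if rain else 1 - P_RAIN
        for lightning in (True, False):
            p_l = P_LIGHTNING_GIVEN[rain]
            p_l = p_l if lightning else 1 - p_l
            yield rain, lightning, p_r * p_l

# Goal: "there is lightning". Add up every leaf that satisfies it.
p_goal = sum(p for rain, lightning, p in leaves() if lightning)
print(p_goal)  # 1/5
```

A useful sanity check is that the probabilities of all leaves together add up to exactly 1.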
Marginal Probabilities concern just one characteristic, whereas the other kinds, Joint and Conditional, involve two or more.
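A marginal probability can be recovered from a joint table by adding up (summing out) every value of the characteristics you do not care about. The sketch below assumes an invented joint table over (lightning, rain) whose entries sum to 1; `JOINT` and `marginal` are hypothetical names.

```python
from fractions import Fraction

# Hypothetical joint table P(lightning, rain); numbers invented, sum to 1.
JOINT = {
    (True, True): Fraction(1, 5),
    (True, False): Fraction(0),
    (False, True): Fraction(3, 10),
    (False, False): Fraction(1, 2),
}

def marginal(joint, position, value):
    """P(characteristic at `position` == value): sum the joint probability
    over every value of the other characteristic."""
    return sum(p for outcome, p in joint.items() if outcome[position] == value)

print(marginal(JOINT, 0, True))  # P(lightning) = 1/5
print(marginal(JOINT, 1, True))  # P(rain) = 1/2
```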
Hope that helps too. --David
P(A) + P(~A) = 1
P(A | B) * P(B) = P(A,B) = P(B,A) = P(B | A) * P(A)
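Dividing both ends of the product rule by P(B) (assuming P(B) > 0) gives Bayes' Theorem. Expanding P(B) over A and ~A, which relies on A and ~A covering all of S as in P(A) + P(~A) = 1, gives the form usually used in practice:

```latex
P(A \mid B)\,P(B) = P(A,B) = P(B \mid A)\,P(A)
\;\Longrightarrow\;
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}, \qquad P(B) > 0

P(B) = P(B \mid A)\,P(A) + P(B \mid \lnot A)\,P(\lnot A)
```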
Note: in general P(A | B) =/= P(A) and P(B | A) =/= P(B) // they are equal only when A and B are independent, which we cannot assume
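The three rules above can be checked with a few numbers. This is a minimal sketch, assuming invented values for A = lightning and B = rain (in particular that every lightning day is rainy, so P(B | A) = 1); the constant names are hypothetical.

```python
from fractions import Fraction

# Invented numbers for A = lightning, B = rain (illustration only).
P_A = Fraction(1, 5)          # P(A)
P_B = Fraction(1, 2)          # P(B)
P_B_GIVEN_A = Fraction(1)     # P(B | A): assume every lightning day is rainy

P_NOT_A = 1 - P_A             # complement rule: P(A) + P(~A) = 1
P_AB = P_B_GIVEN_A * P_A      # product rule: P(A,B) = P(B | A) * P(A)
P_A_GIVEN_B = P_AB / P_B      # Bayes' Theorem: P(A | B) = P(B | A) * P(A) / P(B)

# P(A | B) differs from P(A), so A and B are not independent here.
print(P_A_GIVEN_B, P_A)  # 2/5 1/5
```

Both sides of the product rule agree, P(A | B) * P(B) = P(B | A) * P(A) = 1/5, while P(A | B) = 2/5 and P(B | A) = 1 differ from P(A) = 1/5 and P(B) = 1/2, exactly as the note says.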