Hi Ben,
The provisional data set for each TCGA cancer type contains all data available from the Broad Firehose. The publication data sets reflect the data that were used for each of the publications. This is usually a subset of the provisional data, since manuscripts were often written before TCGA completed their goal of sequencing 500 tumors.
There can be differences between provisional and published data. For example, the mutation data in the publication usually underwent more QC, and false positives might have been removed or, in rare cases, false negatives added. RNA-Seq and copy-number values may also differ slightly, as different versions of analysis pipelines could have been used. The clinical data for the publication is probably also of higher quality and may contain a few more data elements, some of them derived from the genomic data (e.g., genomic subtypes).
I usually recommend using the published data set. It also has the advantage that it is static and will not change in the future.
Niki.