Hi,
For a project we are trying to get all the metadata available for 11,285 RNA-seq samples from the TCGA project. Getting this metadata has been harder than we anticipated.
Using the GDC API we can download a JSON file that has 11,285 rows. However, a collaborator pointed out to us that some information available in Firebrowse is missing from this file (recurrence information for prostate, PRAD). We don't know why that is.
* What else would you use to try to match the GDC and FirebrowseR data? Note that with FirebrowseR we get different tables of information for each TCGA project. For example, bcr_followup_uuid is missing in one of the projects.
* How did Firebrowse get the recurrence information (and possibly other variables) that is missing in GDC? See the biochemical_recurrence field for prostate (PRAD), for example.
* Is it possible to get a flat metadata table via Firebrowse? I could try to do it from the output I get from FirebrowseR but maybe you have it somewhere.
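
To make the last two points concrete, here is a rough sketch of what I mean by a flat table and by matching it to the GDC rows. It is only an illustration under several assumptions: the function names are made up, the barcodes and values are toy data, and I am assuming the two sources can be joined on the participant barcode (submitter_id on the GDC side, tcga_participant_barcode on the Firebrowse side), which may not be the right key.

```python
from typing import Dict, List, Optional


def flatten_projects(tables: Dict[str, List[dict]]) -> List[dict]:
    """Stack per-project tables into one table using the union of all columns.

    Columns absent from a project (e.g. bcr_followup_uuid) are filled with None.
    """
    columns: List[str] = ["project"]
    for rows in tables.values():
        for row in rows:
            for name in row:
                if name not in columns:
                    columns.append(name)

    flat: List[dict] = []
    for project, rows in tables.items():
        for row in rows:
            record: Dict[str, Optional[str]] = {name: None for name in columns}
            record.update(row)
            record["project"] = project
            flat.append(record)
    return flat


def match_by_barcode(
    gdc_rows: List[dict],
    flat_rows: List[dict],
    gdc_key: str = "submitter_id",              # assumed GDC column
    fb_key: str = "tcga_participant_barcode",   # assumed Firebrowse column
) -> List[dict]:
    """Left join: keep every GDC row and attach Firebrowse columns on a match."""
    lookup = {row[fb_key].upper(): row for row in flat_rows if row.get(fb_key)}
    merged: List[dict] = []
    for gdc in gdc_rows:
        fb = lookup.get(str(gdc.get(gdc_key, "")).upper(), {})
        combined = {f"fb_{name}": value for name, value in fb.items()}
        combined.update(gdc)  # GDC columns keep their original names
        merged.append(combined)
    return merged


# Toy data only; these barcodes and values are invented for illustration.
tables = {
    "PRAD": [{"tcga_participant_barcode": "TCGA-XX-0001",
              "biochemical_recurrence": "no",
              "bcr_followup_uuid": "uuid-1"}],
    "PAAD": [{"tcga_participant_barcode": "TCGA-XX-0002"}],
}
gdc_rows = [{"submitter_id": "TCGA-XX-0001"}, {"submitter_id": "TCGA-XX-0003"}]

flat = flatten_projects(tables)
merged = match_by_barcode(gdc_rows, flat)
```

Since there can be more than one RNA-seq sample per participant, a join like this would repeat the clinical columns across those rows, which is why I am asking whether a better identifier exists.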
Thanks,
Leonardo