Hi,
I am trying to import the datahub data into a cBioPortal instance deployed via Docker. Some of the imports fail, all with errors related to data_mirna.txt, where every data line is skipped:
Reading data from: /study/datahub/public/pancan_pcawg_2020/data_mirna.txt
Recaching...
Finished recaching...
--> profile id: 154
--> profile name: miRNA expression (UQ normalized)
--> genetic alteration type: MRNA_EXPRESSION
--> total number of samples: 749
--> total number of data lines: 1864
--> records inserted into `sample_profile` table: 749
--> total number of data entries skipped (see table below): 1864
org.mskcc.cbio.portal.dao.DaoException: Something has gone wrong! I did not save any records to the database!
at org.mskcc.cbio.portal.scripts.ImportTabDelimData.importData(ImportTabDelimData.java:307)
at org.mskcc.cbio.portal.scripts.ImportProfileData.run(ImportProfileData.java:125)
at org.mskcc.cbio.portal.scripts.ConsoleRunnable.runInConsole(ConsoleRunnable.java:145)
at org.mskcc.cbio.portal.scripts.ImportProfileData.main(ImportProfileData.java:150)
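A rough way to see which identifiers from a data file are absent from a seed dump is sketched below. This is only a sketch under assumptions: `missing_ids` is a hypothetical helper name, the identifier is assumed to be in the first tab-delimited column below a single header row, and the seed SQL is assumed to store symbols as single-quoted strings.

```shell
# Hypothetical helper: print identifiers from column 1 of a tab-delimited
# data file ($1) that never appear as a quoted string in a seed SQL dump ($2).
# Assumptions: one header row, identifiers in column 1, SQL values in '...'.
missing_ids() {
    # Skip the header, keep column 1, de-duplicate, then test each identifier.
    tail -n +2 "$1" | cut -f1 | sort -u | while IFS= read -r id; do
        [ -n "$id" ] || continue
        # -F: fixed string, -i: ignore case, -q: only the exit status matters.
        grep -qiF -- "'$id'" "$2" || printf '%s\n' "$id"
    done
}
```

For example, `missing_ids data_mirna.txt seed-cbioportal_hg19_v2.12.12.sql` would list every identifier with no quoted match in that seed file.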
Indeed, our cBioPortal database does not contain any of the genes or gene aliases listed in the problematic data_mirna.txt files. Why has the number of miRNA genes dropped so drastically in recent seed DB files?
katsap@machine:~$ cat seed-cbioportal_hg19_v2.1.0.sql | tr ',' "\n" | grep "hsa-mir" | wc -l
894
katsap@machine:~$ cat seed-cbioportal_hg19_v2.4.0.sql | tr ',' "\n" | grep "hsa-mir" | wc -l
894
katsap@machine:~$ cat seed-cbioportal_hg19_v2.7.2.sql | tr ',' "\n" | grep "hsa-mir" | wc -l
882
katsap@machine:~$ cat seed-cbioportal_hg19_v2.7.3.sql | tr ',' "\n" | grep "hsa-mir" | wc -l
882
katsap@machine:~$ cat seed-cbioportal_hg19_v2.12.8.sql | tr ',' "\n" | grep "hsa-mir" | wc -l
6
katsap@machine:~$ cat seed-cbioportal_hg19_v2.12.12.sql | tr ',' "\n" | grep "hsa-mir" | wc -l
6
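The counts above can be reproduced for any set of seed files with a small loop. This is a minimal sketch of the same `tr`/`grep` pipeline; `count_mirna` is a hypothetical helper name, not part of cBioPortal.

```shell
# Hypothetical helper: count comma-separated fields in a seed SQL dump ($1)
# that mention "hsa-mir". Equivalent to: cat FILE | tr ',' "\n" | grep "hsa-mir" | wc -l
count_mirna() {
    tr ',' '\n' < "$1" | grep -c "hsa-mir"
}

# Print one "file<TAB>count" line per seed file given on the command line.
for seed in "$@"; do
    printf '%s\t%s\n' "$seed" "$(count_mirna "$seed")"
done
```

Saved as a script, it could be called with all seed files at once, e.g. `sh count_mirna.sh seed-cbioportal_hg19_v*.sql` (the script name is illustrative).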
Is this expected?