Data not available


alexande...@fbb.msu.ru

Jan 8, 2017, 1:07:17 PM
to Perturb-seq
Hello,
Thank you for your tremendous work.

I'm a student at Moscow State University studying bioinformatics, and I'm doing my course project on differential expression. I intended to use your dataset, as it's a comprehensive and well-designed study, but unfortunately I've recently found that the link you provided for the data (https://portals.broadinstitute.org/singlecell) doesn't lead anywhere (it returns a 404 error).

Is it possible for you to provide some kind of mirror for this data?

Thank you!

Atray Dixit

Jan 8, 2017, 1:50:29 PM
to Perturb-seq
Hi Alexander,

I can check on the portal availability next week, but our pilot data can be accessed in the data folder within the github repository here:

and the GEO link here:

Good luck,
Atray

alexande...@fbb.msu.ru

Jan 13, 2017, 9:57:01 AM
to Perturb-seq
Thanks a lot, Atray,

Could I ask you a few more questions?
If I understood everything correctly, the final data you provide on GEO contain no KO-versus-gene matrix with expression values for each gene.
The final files to work with are matrices with "gene - cell (i.e. KO) - UMI" columns. Could you please clarify what the numbers in the UMI column represent?

Thank you!

Alexander

Atray Dixit

Jan 13, 2017, 3:53:20 PM
to Perturb-seq
I'm not sure I understand the first question. Are you asking whether there is a matrix of KO versus cell to figure out which cell has which perturbation? If so, the equivalent information is provided in dictionaries (files whose names include cbc_gbc_dict on GEO).
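To see which perturbation each cell carries, such a dictionary can be inverted into a per-cell lookup. The sketch below assumes a layout for the cbc_gbc_dict files (one row per guide, the guide name first, then a comma-separated list of cell barcodes); that layout is an assumption, so check it against the actual files. The guide names and barcodes in the example are made up.

```python
import csv
import io
from collections import defaultdict


def cells_to_guides(handle):
    """Invert a guide -> cell-barcode table into cell barcode -> set of guides.

    ASSUMED layout (verify against the real cbc_gbc_dict files): one row per
    guide, first field = guide name, remaining text = comma-separated barcodes.
    """
    mapping = defaultdict(set)
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        guide = row[0].strip()
        # Re-join the remaining fields so quoted and unquoted barcode lists both work.
        for cbc in ",".join(row[1:]).split(","):
            cbc = cbc.strip()
            if cbc:
                mapping[cbc].add(guide)
    return dict(mapping)


# Hypothetical two-guide example; cell AAAG-1 received both guides.
example = io.StringIO(
    'guideA,"AAAC-1, AAAG-1"\n'
    'guideB,AAAG-1\n'
)
cell_map = cells_to_guides(example)
# Cells carrying more than one guide, which may need separate handling.
multi = sorted(c for c, g in cell_map.items() if len(g) > 1)
```

Returning a set per cell keeps cells with multiple guides visible instead of silently keeping only the last guide seen.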

For the second question, the numeric information in the UMI column is indeed the UMI count. If you would prefer, I can add the full (non-sparse) expression matrices (genes x cells) to GEO; I just found that the .mtx format loads faster into memory.
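For readers new to the format: in a Matrix Market coordinate (.mtx) file, each data line after the size line holds a 1-based row index, a 1-based column index and a value. With rows as genes and columns as cells (the genes x cells layout mentioned above), the third number is the UMI count for that gene in that cell, and any gene/cell pair not listed is zero. The minimal reader below expands such a file into the full matrix; it assumes an integer-valued file and uses a tiny made-up example. In practice `scipy.io.mmread` does the same job.

```python
import io


def read_mtx_counts(handle):
    """Read a Matrix Market 'coordinate' file into a dense genes x cells list of lists.

    Each data line is 'row col value' with 1-based indices: row = gene,
    col = cell barcode, value = UMI count (assumed integer). Unlisted entries are 0.
    """
    header = handle.readline()
    if not header.startswith("%%MatrixMarket"):
        raise ValueError("not a Matrix Market file")
    line = handle.readline()
    while line.startswith("%"):  # skip comment lines
        line = handle.readline()
    n_genes, n_cells, n_entries = (int(x) for x in line.split())
    dense = [[0] * n_cells for _ in range(n_genes)]
    for _ in range(n_entries):
        gene, cell, umi = handle.readline().split()
        dense[int(gene) - 1][int(cell) - 1] = int(umi)
    return dense


# Made-up example: 3 genes, 2 cells, 3 non-zero entries.
example = """%%MatrixMarket matrix coordinate integer general
% genes x cells
3 2 3
1 1 5
3 1 2
2 2 7
"""
dense = read_mtx_counts(io.StringIO(example))  # [[5, 0], [0, 7], [2, 0]]
```

The sparse file stores only the non-zero counts, which is why it is much smaller and faster to load than the dense matrix for mostly-zero single-cell data.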

qian...@gmail.com

Jan 13, 2017, 10:07:10 PM
to Perturb-seq
Hello, 

Thanks for the great and exciting work! I'm a student at Tongji University studying transcriptomics, and I have a related question on data availability. On GEO, https://www.ncbi.nlm.nih.gov/sra?term=SRX2360556 and https://www.ncbi.nlm.nih.gov/sra?term=SRX2360555 show the library layout as single-end, whereas the paper describes paired-end sequencing. Should we run fastq-dump on them as single-end or paired-end?

Further, the paper's methods section mentions that the BCL files are processed with bcl2fastq2, kentools, and Linux commands to get the FASTQ sequences related to the GBC. If we can only get FASTQ files, can we start by grepping for the constant GBC sequence and then run
cat *${inputbc}.txt | sort | uniq -c | sort -k1,1g | awk '{print $1"\t"$2"\t"$3"-"$4}'
Could you please provide a bit more detail, such as what inputbc means? Do all the SRR runs on GEO need to be concatenated to run through cellranger?
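The shell pipeline above counts how many times each identical line occurs, lists the results in ascending order of count, and prints each as the count, the first field, and the second and third fields joined by a hyphen. A rough Python equivalent is sketched below. The thread does not say what the fields contain (or what inputbc selects), so the three-field layout and the example values are placeholders.

```python
from collections import Counter


def tally_lines(lines):
    """Roughly mimic `sort | uniq -c | sort -k1,1g | awk ...` on in-memory lines.

    ASSUMPTION: each non-blank line has exactly three whitespace-separated fields.
    Unlike `uniq -c`, lines differing only in whitespace are counted together.
    Returns 'count<TAB>f1<TAB>f2-f3' strings in ascending order of count.
    """
    counts = Counter(tuple(line.split()) for line in lines if line.strip())
    out = []
    # Ties are broken by the fields themselves so the order is deterministic.
    for fields, n in sorted(counts.items(), key=lambda kv: (kv[1], kv[0])):
        f1, f2, f3 = fields  # raises ValueError if a line is not 3 fields
        out.append(f"{n}\t{f1}\t{f2}-{f3}")
    return out


# Placeholder records; the real field meanings are not given in the thread.
example = ["CELL1 GBC1 A", "CELL1 GBC1 A", "CELL2 GBC2 B"]
result = tally_lines(example)  # ["1\tCELL2\tGBC2-B", "2\tCELL1\tGBC1-A"]
```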


Atray Dixit

Jan 14, 2017, 9:13:46 PM
to Perturb-seq
1. So actually the raw data files on GEO are BAM files, NOT FASTQ files (this was done because the 10x program outputs quite a large number of FASTQ files by default).
Reads are single-end in the sense that this is 3' RNA-seq, but paired-end in the sense that the other read contains cell barcode information.

2. There are two kinds of sequencing data processed for this work. One type is 10X scRNA-seq, and the other comes from the enrichment step used to read out the GBC/CBC pairings. The latter is what you are referring to from our paper's methods section.

To answer your question, the data on GEO does NOT need to be processed in this way to run through cellranger: we provide the processed output from cellranger as well as the .bam files containing the raw read information that cellranger outputs. If you'd like raw FASTQ files as well, I believe 10X is working on a solution to make it easier to post those to GEO.

If you want an example of how to process the GBC sequencing data from raw files, a description is provided here, and a specific IPython notebook example here.

Good luck!

qian...@gmail.com

Jan 16, 2017, 8:10:06 AM
to Perturb-seq
Thanks so much, Atray. We've got the BAM files and the analysis notebook.

Best
Alvin

李威毅

Dec 7, 2021, 12:05:56 PM
to Perturb-seq
Dear sir 
Could you please add the full (non-sparse) expression matrices (genes x cells) to GEO? I can't find the WT matrix.
