Hi,
I am an analyst at the University of Chicago, in Dr. Luis Barreiro's lab. I am trying to follow the same analysis pipeline for Smart-3SEQ data that was sequenced in our lab. However, I get the following error when running the align_smart-3seq.sh script on my samples:
Could not retrieve index file for '/project2/lbarreiro/DATA/analysis/Joao_redo/alignment_2/output_Phix30-i5_S10_R1_001.bam_tmp'
Traceback (most recent call last):
  File "/project2/lbarreiro/DATA/analysis/Joao/unzippedfiles/umi-dedup-master/dedup.py", line 50, in <module>
    for alignment in dup_marker:
  File "/project2/lbarreiro/DATA/analysis/Joao/unzippedfiles/umi-dedup-master/lib/markdup_sam.py", line 83, in __next__
    return next(self.output_generator)
  File "/project2/lbarreiro/DATA/analysis/Joao/unzippedfiles/umi-dedup-master/lib/markdup_sam.py", line 227, in get_marked_alignment
    alignment = umi_data.set_umi(alignment, truncate = self.truncate_umi)
  File "/project2/lbarreiro/DATA/analysis/Joao/unzippedfiles/umi-dedup-master/lib/umi_data.py", line 98, in set_umi
    if umi is None: umi = parse_umi(alignment.query_name, truncate)
  File "/project2/lbarreiro/DATA/analysis/Joao/unzippedfiles/umi-dedup-master/lib/umi_data.py", line 95, in parse_umi
    raise RuntimeError('read name %s does not contain UMI in expected Casava/bcl2fastq format' % label)
RuntimeError: read name A00639:1070:H2FJ7DRX2:2:2228:30770:5149:ATGCA:GGGGG:ACTGG does not contain UMI in expected Casava/bcl2fastq format
The script creates the alignment log, but the BAM and BAI files are empty.
This is what I did: after unzipping the FASTQ files, I ran the extract_umi.py script and then the umi_homopolymer.py script. The output of that second step was used as input for the align_smart-3seq.sh script.
I also tried the reverse order, running umi_homopolymer.py first and then extract_umi.py, but that also failed with errors.
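In case it helps with the diagnosis, the read name from the error can be taken apart with a short Python sketch. It assumes that a standard Illumina read name has seven colon-separated fields (instrument, run, flowcell, lane, tile, x, y) and that anything after those was appended later. The helper name split_read_name is hypothetical and is not part of umi-dedup.

```python
def split_read_name(name, n_illumina_fields=7):
    """Split a read name into the standard Illumina fields and any trailing extras.

    Assumption: the first seven colon-separated fields are
    instrument:run:flowcell:lane:tile:x:y, and whatever follows was appended.
    """
    # Drop a leading '@' (FASTQ header) and anything after the first whitespace.
    name = name.lstrip("@").split()[0]
    fields = name.split(":")
    return fields[:n_illumina_fields], fields[n_illumina_fields:]


# Read name copied from the RuntimeError above.
name = "A00639:1070:H2FJ7DRX2:2:2228:30770:5149:ATGCA:GGGGG:ACTGG"
base, extra = split_read_name(name)
print(len(base), extra)
```

For the read in the traceback, this reports three extra fields (ATGCA, GGGGG, ACTGG) after the seven standard ones, which may be related to why the name is rejected.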
Could you please advise on the correct way to run the pipeline, and let me know if I am doing something wrong? I really appreciate the help!
Thanks!!