trouble running aligner with NA12878 chr1 fastq


Steve Cook

Jul 7, 2014, 7:44:29 PM
to gotc...@googlegroups.com
Hello.

I am trying to run the gotcloud aligner against the NA12878 chr1 data available from 

ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/working/20101201_cg_NA12878/NA12878.ga2.exome.maq.raw.bam


I used picard to extract two fastq files. 


I end up with error "read <> mapped to 'chr1' at POS 0 to -1, flag 69,0 has BIN 4680 but should be 266824".

Then "Fix it by using BAM->SAM->BAM to force a recalculation of the BIN field."

"Fail to index the BAM file"


I do, however, see .done files and I get a BAM file as output. Does this mean that BAM creation succeeded and only the index needs to be redone? I'm now doing the BAM->SAM->BAM conversion, then sorting the BAM file and building the index with samtools sort and index. That is still running, so I don't know yet whether it will complete successfully. But why did I need these extra steps in the first place?


My configuration file is:

INDEX_FILE=align2.index
##########
# References
REF_ROOT=/mnt/workspace
#
AS=NA12878as
REF=$(REF_ROOT)/resourcedata/hg19/hg19.fa
INDEL_PREFIX=$(REF_ROOT)/resourcedata/
DBSNP_VCF= $(REF_ROOT)/resourcedata/dbsnp138/dbsnp_138.hg19.vcf.gz
HM3_VCF=$(REF_ROOT)/resourcedata/hapmap3/hapmap_3.3.hg19.vcf.gz
OMNI_VCF=$(REF_ROOT)/resourcedata/1000G_omni2.5.hg19.vcf.gz
BWA_THREADS= -t 2




and my index file is:

MERGE_NAME      FASTQ1  FASTQ2  RGID    SAMPLE  LIBRARY CENTER  PLATFORM
HG00096 data/NA12878ga2exome/NA12878_1.fastq    data/NA12878ga2exome/NA12878_2.fastq    SRR062634       HG00096 2845856850      WUGSC   ILLUMINA



Steve Cook

Jul 7, 2014, 7:45:11 PM
to gotc...@googlegroups.com
Correction:  I am using the exome data, not Chr1. Sorry.

Mary Kate Wing

Jul 8, 2014, 12:38:02 PM
to Steve Cook, gotcloud
Thanks for the information, Steve.
I'm looking into the problem.

What version of GotCloud are you running?

You are correct: you shouldn't have to convert or do anything additional to the BAM files you are using.

Also, when you see: "Fix it by using BAM->SAM->BAM..." is that coming out of your GotCloud run or a tool you are running after it?
If it is a separate run of samtools, outside of GotCloud, which version of samtools are you using, and what command are you running?

Thanks,
Mary Kate Wing



Steve Cook

Jul 8, 2014, 2:18:03 PM
to gotc...@googlegroups.com, steve.c...@gmail.com
Hi Mary Kate,

I'm running 1.12.
That error message comes from the gotcloud alignment pipeline run itself, together with a "failed index step".

I ran samtools sort and index on the bam file after doing the manual bam->sam->bam conversion but samtools depth still says that the file is unsorted.

Could there be a problem with the way I split the exome file using picard tools? I ran
 

export S=NA12878

nohup java -Xmx24g  -jar  $PICARD_HOME/SamToFastq.jar   INPUT=${S}.tobesplit.sam  FASTQ=${S}_1.fastq  SECOND_END_FASTQ=${S}_2.fastq  RE_REVERSE=true  INCLUDE_NON_PF_READS=true  2>${S}.split.log &

to create the FASTQ files.

Mary Kate Wing

Jul 15, 2014, 3:09:23 PM
to Steve Cook, gotcloud
I just wanted to follow up with the Google Group to let you know the resolution in case someone else encounters similar issues.

The issue appeared to be that the BWA reference files had been generated with a different version of BWA than the one GotCloud was using.
Steve regenerated the BWA reference files using the version of BWA included with GotCloud, and the run then completed successfully.

He also mentioned that one of his "biggest difficulties was with the alignment index file parser crashing if the tabbing was wrong or if there was a newline at the end of the file."

I'll see if I can clean up the alignment index file parsing so that it ignores a trailing newline and consecutive tabs.
Until I implement that, be sure the alignment index has a single tab between each column and doesn't have a trailing newline.

Mary Kate
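
Until the parser is more forgiving, a small pre-flight check along these lines might catch the tabbing problems described above before the aligner runs. This is only a sketch and not part of GotCloud: the function name check_alignment_index is hypothetical, and treating any trailing newline as an error follows the advice in this thread literally, which may be stricter than the parser actually requires.

```python
def check_alignment_index(text):
    """Return a list of problems found in tab-delimited alignment index text.

    Hypothetical helper, not a GotCloud API. It checks for the issues
    mentioned in this thread: a trailing newline, consecutive tabs, and
    rows whose column count differs from the header row.
    """
    problems = []
    if text.endswith("\n"):
        problems.append("file ends with a trailing newline")

    lines = text.rstrip("\n").split("\n")
    header = lines[0].split("\t")

    for number, line in enumerate(lines, start=1):
        if line.strip() == "":
            problems.append(f"line {number}: blank line")
            continue
        fields = line.split("\t")
        if "" in fields:
            # An empty field means two tabs in a row, or a leading/trailing tab.
            problems.append(f"line {number}: empty field (extra tab)")
        elif len(fields) != len(header):
            # Often caused by spaces used where a tab was intended.
            problems.append(
                f"line {number}: {len(fields)} columns, header has {len(header)}"
            )
    return problems


if __name__ == "__main__":
    # Header and row taken from the index file posted earlier in this thread,
    # joined with single tabs and with no newline at the end.
    header = "\t".join(
        ["MERGE_NAME", "FASTQ1", "FASTQ2", "RGID",
         "SAMPLE", "LIBRARY", "CENTER", "PLATFORM"]
    )
    row = "\t".join(
        ["HG00096",
         "data/NA12878ga2exome/NA12878_1.fastq",
         "data/NA12878ga2exome/NA12878_2.fastq",
         "SRR062634", "HG00096", "2845856850", "WUGSC", "ILLUMINA"]
    )
    for problem in check_alignment_index(header + "\n" + row):
        print(problem)
```

To check a real file, read it with open(path, newline="") so that line endings are not translated, and pass the contents to the function; an empty result means none of these particular problems were found.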