--
You received this message because you are subscribed to the Google Groups "Broad Viral Tool Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to viral-tool-use...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
sort -k 1,1 -k 3,3 [A.sam] > [A.sort.sam]
samtools -view -bS [A.sort.sam] > [A.sort.bam]
The sorting actually moves the headers to the end of the file and therefore the sam cannot be converted to bam. Therefore after the sorting step i moved the headers back to the top of the file before I converted it to bam.
Am I doing something wrong?
Would you like to see my bam file?
samtools view -b -L A_nucgen.bed [output.sort.bam] > [A.bam]
samtools view -h -o [A.sam] [A.bam]
Where as this step sort -k 1,1 -k 3,3 [A.sam] > [A.sort.sam] creates a sorted sam file with all the headers at the end of the file, due to which I was not able to convert it to the A.sort.bam file as suggested in the next step.
Therefore I put the headers back on top and then converted it to a bam file which seems to give a empty bam typing.txt file.
I have used the latest manual created on Oct 3 2013.
These steps work fine:samtools view -b -L A_nucgen.bed [output.sort.bam] > [A.bam]
samtools view -h -o [A.sam] [A.bam]
Where as this step sort -k 1,1 -k 3,3 [A.sam] > [A.sort.sam] creates a sorted sam file with all the headers at the end of the file, due to which I was not able to convert it to the A.sort.bam file as suggested in the next step.
Dear Xiao,I tried the following steps to prepare BAM files for ATHLATES as mentioned by Smitha above and with your recommendations:samtools view -b -L A_nucgen.bed test_novoalign_sort.bam > test_a.bamsamtools view -h test_a.bam > test_a.samLC_ALL=C sort -k 1,1 -k 3,3 test_a.sam > test_a.sorted.samsamtools view -bS test_a.sorted.sam > test_a.sorted.bamHowever, I still get the warning:"[WARNING]: test_a.sorted.bam sort order: unsorted"
I tested the same procedure on the 1000 Genomes exomes for HG01756 and HG01757 which were used in your paper.And I was able to get the same HLA-A and HLA-B allelic pairs as in your paper.Is the warning really an issue (well, it seems not)? Can I simply ignore it?
I am actually using the first version of ATHLATES you released.
Thank you very much for the quick reply, Xiao!I still have a few more questions/concerns.
1) Now that you mentioned that using '-t 10' instead o '-t 30' yields better results, I am wondering whether I should redo all the read alignments. So far, I have used this command, based on older documentation and your paper.novoalign -d ref.nix -f rd1.fq rd2.fq -F STDFQ -o SAM -t 30 -r all -l 60 -e 1 | samtools view -bS -h -F 4 - > output.bam* Working with 75 bp reads. So, I set '-l 60'.The problem is that redoing all the alignments is not feasible resource-wise for me right now. So, is there a way that you would recommend around it to improve the results? For example, extract reads already aligned using '-t 30' and then realign using '-t 10'?
2) Sorting by only read name, i.e.samtools view -b -L A_nucgen.bed test_novoalign_sort.bam > test_a.bamsamtools sort -n test_a.bam test_a.sortedinstead of sorting by read name and then reference name (as in my previous post) resulted in different average coverage numbers ('Avg_cov' in output file), although the nominated alleles are largely identical.
Is this of concern? I don't quite get why the average coverage numbers differ.
Hi Yan,
I just checked the demo folder, there are no sam file in the folder only the following files.HG01756_a.sort.bamHG01756_b.sort.bamHG01756_c.sort.bamHG01756_na.sort.bamHG01756_nb.sort.bamHG01756_nc.sort.bamHG01756_nq.sort.bamHG01756_nr.sort.bamHG01756_q.sort.bamHG01756_r.sort.bamHG01756hlall.bamSo I am not sure where you get A.sam from ? If you follow section 4 the walk through step 2,then you should be able to apply typing to the demo files.
***********[samopen] SAM header is present: 9415 sequences.
[sam_read1] reference 'SN:HLA:HLA08800 AS:hlall.nix LN:270
gnMPI -d /home/cliu/HLA_SEQUENCES/hlall.nix -f HG01756_1.filt.fastq.gz HG01756_2.filt.fastq.gz -t 30 -o SAM -r all -l 80 -e 1 -i 230 140**********
' is recognized as '*'.
[main_samview] truncated file.
You suggested Shing to use "LC_ALL=C sort -k 1,1 -k 3,3 test_a.sam > test_a.sorted.sam" before command "samtools view -bS test_a.sorted.sam > test_a.sorted.bam". I did the same thing, but still got the same error.
Kate