format for input data to QuEST


Matthew McCormack

Mar 1, 2012, 5:34:35 PM
to chi...@googlegroups.com
I usually align with bowtie and get an output that looks like this:
@HD    VN:1.0    SO:unsorted
@SQ    SN:Chr1    LN:30427671
@SQ    SN:Chr2    LN:19698289
@SQ    SN:Chr3    LN:23459830
@SQ    SN:Chr4    LN:18585056
@SQ    SN:Chr5    LN:26975502
@SQ    SN:ChrC    LN:154478
@SQ    SN:ChrM    LN:366924
@PG    ID:Bowtie    VN:0.12.5    CL:"./bowtie -t -q --solexa1.3-quals -S -m 1 --best -p 8 -y TAIR9_col /Array/bowtie/LeiL_6-20-11/s_2_sequence.txt /Array/bowtie/LeiL_6-20-11/s_2_6-20-11_m1_all45.sam"
HWI-ST366_0104:2:1101:1371:1925#0/1    4    *    0    0    *    *    0    0    NCCCTCGCTTTCCTGCTCCTGTTGAGTTTATTGCTGCCGTCATTG    #############################################    XM:i:0
HWI-ST366_0104:2:1101:1097:1941#0/1    4    *    0    0    *    *    0    0    NTGGTTCATCGTTCGAGATTGAAGTGGAACGCACAGATACATTGT    #############################################    XM:i:1
HWI-ST366_0104:2:1101:1558:1927#0/1    0    Chr2    14786160    255    45M    *    0    0    NTTGATAGCTTCAACTGATTGGGCCTTTCCGTTAATTCGGGATGT    #############################################    XA:i:1    MD:Z:0G44    NM:i:1
HWI-ST366_0104:2:1101:1604:1948#0/1    16    Chr2    1781983    255    45M    *    0    0    TGATGGATGCTCACAGTAAGGAGATGATGAAACACATGGCTGAGN    #############################################    XA:i:1    MD:Z:44A0    NM:i:1
HWI-ST366_0104:2:1101:1496:1979#0/1    0    Chr1    20261838    255    45M    *    0    0    TGACCTCCCAACTCAGCCAGAGAATTACCTTCACCGTATCGGAAG    CCCFFFFFHHHHHJJJJJJJJJJIJJJJJJJIIJJJHIJJJJJJJ    XA:i:0    MD:Z:45    NM:i:0
HWI-ST366_0104:2:1101:1988:1953#0/1
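For reference, the first columns of these SAM records can be decoded as follows. This is a minimal Python sketch, not QuEST code. It assumes tab-separated records (the tabs appear as spaces in this post), skips @ header lines, and uses two FLAG bits from the SAM specification: 0x4 (unmapped, as in the two FLAG 4 records) and 0x10 (reverse strand, as in the FLAG 16 record). The names parse_sam_line and SamHit are hypothetical.

```python
from typing import NamedTuple, Optional

FLAG_UNMAPPED = 0x4   # SAM FLAG bit: read is unmapped
FLAG_REVERSE = 0x10   # SAM FLAG bit: read aligned to the reverse strand


class SamHit(NamedTuple):
    qname: str   # read name (column 1)
    rname: str   # reference/contig name (column 3)
    pos: int     # 1-based leftmost mapping position (column 4)
    strand: str  # "+" or "-", derived from FLAG (column 2)


def parse_sam_line(line: str) -> Optional[SamHit]:
    """Return a SamHit for a mapped record; None for header lines and unmapped reads."""
    if line.startswith("@"):  # @HD, @SQ, @PG header lines
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:  # SAM records have 11 mandatory columns
        raise ValueError(f"expected >= 11 tab-separated fields, got {len(fields)}")
    flag = int(fields[1])
    if flag & FLAG_UNMAPPED:  # e.g. FLAG 4 in the first two records above
        return None
    strand = "-" if flag & FLAG_REVERSE else "+"  # FLAG 16 -> reverse strand
    return SamHit(fields[0], fields[2], int(fields[3]), strand)


# Usage with the fourth record above (SEQ/QUAL shortened for brevity):
record = "\t".join([
    "HWI-ST366_0104:2:1101:1604:1948#0/1", "16", "Chr2", "1781983", "255",
    "45M", "*", "0", "0", "TGATGG", "######", "XA:i:1",
])
print(parse_sam_line(record))
# SamHit(qname='HWI-ST366_0104:2:1101:1604:1948#0/1', rname='Chr2', pos=1781983, strand='-')
```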


I looked in the QuEST test data and it is a different format:
TTCTTTTTTTGTTCTTTTCTTTGGG 0 0 chr10:92377697 F
TCAGGTGTGGAACACCCCAGCCCCC 0 0 chr16:87216372 F
GACCTGAGCTCAA.A.CAAATCAG. 0 0 chr10:90132965 R
CTTCTCCCCCAAC.AACACACCTAC 0 0 chr8:488025 F
ATAACAATCACCA.A.AAACCCCC. 0 0 chr12:14839413 F
AAAACCACACAA.CACAACACCCAA 0 0 chr11:56352492 F


Is this a format specific to an aligner, or do you get this format from processing with a Perl script or similar? If a script is necessary, what do the two zeros between the read and the chr represent?

Matthew



The information in this e-mail is intended only for the person to whom it is
addressed. If you believe this e-mail was sent to you in error and the e-mail
contains patient information, please contact the Partners Compliance HelpLine at
http://www.partners.org/complianceline . If the e-mail was sent to you in error
but does not contain patient information, please contact the sender and properly
dispose of the e-mail.

Anton

Mar 1, 2012, 6:46:08 PM
to QuEST ChIP-seq group
Hi Matt, Bowtie native and SAM formats are supported. You can check
out the QuEST options by running the configuration script without any
parameters.
Cheers, Anton.


Matthew McCormack

Mar 1, 2012, 6:48:22 PM
to chi...@googlegroups.com
Thanks, Anton. I asked this question too quickly. I read the Quick
Start Guide more carefully and figured out what I was doing wrong.

Matthew

Matthew McCormack

Mar 1, 2012, 7:31:56 PM
to chi...@googlegroups.com
I am getting the following error:
Expected +/- in the 2nd field, but found: 4

Is this because my bowtie alignment is in SAM format?

Matthew
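The error is consistent with QuEST reading a SAM file as Bowtie's native output. In Bowtie 1's default (native) output the second tab-separated column is the strand, + or -, whereas in SAM the second column is the integer FLAG, and 4 is the FLAG of an unmapped read. A rough Python heuristic for telling the two apart is sketched below; guess_alignment_format is a hypothetical helper, not something QuEST provides, and it only inspects the first record.

```python
from typing import Iterable


def guess_alignment_format(lines: Iterable[str]) -> str:
    """Guess "sam" or "bowtie" from column 2 of the first alignment record."""
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines (@HD, @SQ, @PG)
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            return "unknown"
        second = fields[1]
        if second in ("+", "-"):
            return "bowtie"  # Bowtie native output: column 2 is the strand
        if second.isdigit():
            return "sam"     # SAM: column 2 is the integer FLAG (4 = unmapped)
        return "unknown"
    return "unknown"
```

Any iterable of lines works, so an open file handle can be passed directly, e.g. `with open(path) as fh: print(guess_alignment_format(fh))`.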

Anton

Mar 2, 2012, 1:40:07 AM
to QuEST ChIP-seq group
Matt, did you try the SAM flag for your files? Those look like SAM
files to me.
Best, Anton


Matthew McCormack

Mar 2, 2012, 2:04:57 PM
to chi...@googlegroups.com
Hi Anton,

Thanks for your reply. I do not see the SAM flag when I print the flags with generate_QuEST_parameters.pl. However, I redid the bowtie alignment without SAM output. Now, when I use these non-SAM files (treatment and control), I get 31 million reads matched:
aligments: 31.52 M, matched : 31.52 M, not in gt: 0.000 M, offending: 0.000 M
Your bowtie alignment file seems to be ok.
However, I get this for all seven chromosomes:
sorting hits
contig: Chr1

+ reads: 0
- reads: 0

+ reads after collapsing: 0
- reads after collapsing: 0

stacks collapsed: 0
reads in collapsed stacks: 0

And this at the end:
After collapsing, there are 0 ChIP read alignments
No ChIP reads were found in your data. Please check your alignment files.


What is the difference between alignments and ChIP alignments?

Matthew
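When QuEST reports 0 reads on every contig even though it accepted the alignment file, one independent cross-check is to count hits per contig and strand directly from the Bowtie native file, to confirm that reads really fall on Chr1 through ChrM under those exact names. Below is a minimal Python sketch, assuming Bowtie 1's default tab-separated output (column 2 = strand, column 3 = reference name). count_hits_per_contig is hypothetical and does not reproduce any filtering QuEST applies; the example records are toy data.

```python
from collections import Counter
from typing import Iterable


def count_hits_per_contig(lines: Iterable[str]) -> Counter:
    """Count Bowtie-native alignments by (reference name, strand)."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        strand, contig = fields[1], fields[2]  # column 2: +/-, column 3: reference
        counts[(contig, strand)] += 1
    return counts


# Toy records in Bowtie native layout (not real data):
example = [
    "r1\t+\tChr1\t99\tACGT\tIIII\t0\t",
    "r2\t-\tChr1\t150\tACGT\tIIII\t0\t",
    "r3\t+\tChr2\t10\tACGT\tIIII\t0\t",
]
for (contig, strand), n in sorted(count_hits_per_contig(example).items()):
    print(contig, strand, n)
```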

Anton

Mar 2, 2012, 2:07:29 PM
to QuEST ChIP-seq group
Are you using an older QuEST version? You can grab the latest, 2.4,
here:

http://www.stanford.edu/~valouev/QuEST/QuEST.html

Let us know if you are still having difficulties.
Best, Anton


Matthew McCormack

Mar 2, 2012, 10:06:51 PM
to chi...@googlegroups.com
This is what I am using: QuEST ChIP-Seq analysis pipeline version 2.405
Last modified on 10-29-2009

Matthew

Anton

Mar 6, 2012, 4:53:25 PM
to QuEST ChIP-seq group
OK, let me look into this. I am compelled, however, to stop supporting
bowtie native format altogether, since everyone is using SAM format
these days. Does it work with the SAM format?
Anton.


ankit arora

Aug 14, 2013, 5:56:09 AM
to chi...@googlegroups.com
Hi Anton,

I have used the SAM output file and I am also getting the same result, 0 reads, with the following command:

./generate_QuEST_parameters.pl -sam_align_ChIP /ngs_data/projects/References/ChipSeq/ssl1HTZUV.sam -rp /ngs_data/projects/ITN_aDDRess/genome -ap /ngs_data/projects/References/ChipSeq/

Thanks. I look forward to hearing back from you soon.

Regards
ankit

Anton

Aug 14, 2013, 11:11:42 AM
to chi...@googlegroups.com
Hi Ankit.

If you want to use Bowtie, output as SAM. SAM files can be used without a problem.
Best, Anton

Anton

Aug 14, 2013, 1:49:43 PM
to chi...@googlegroups.com
Hi Ankit,

That probably means Bowtie's SAM output is malformed. Use the samtools view command (convert to BAM, then back to SAM) to "correct" your alignments and try again with QuEST. Alternatively, you can align with BWA to get a proper SAM file. Picard also has a CleanSam command that you can try.
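To make "malformed" a bit more concrete, here is a rough Python check for records that violate basic SAM structure: fewer than 11 tab-separated mandatory fields, a non-integer FLAG, POS or MAPQ, or SEQ and QUAL of different lengths. It is only a hypothetical sanity check (find_sam_problems is not an existing tool) and does not do what samtools view or Picard CleanSam do.

```python
from typing import Iterable, List, Tuple


def find_sam_problems(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, message) pairs for records that break basic SAM rules."""
    problems: List[Tuple[int, str]] = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        if len(fields) < 11:
            problems.append((lineno, f"expected at least 11 tab-separated fields, found {len(fields)}"))
            continue
        # FLAG, POS and MAPQ must be non-negative integers
        for name, idx in (("FLAG", 1), ("POS", 3), ("MAPQ", 4)):
            if not fields[idx].isdigit():
                problems.append((lineno, f"{name} is not a non-negative integer: {fields[idx]!r}"))
        # QUAL must match SEQ in length unless either is "*"
        seq, qual = fields[9], fields[10]
        if seq != "*" and qual != "*" and len(seq) != len(qual):
            problems.append((lineno, f"SEQ length {len(seq)} != QUAL length {len(qual)}"))
    return problems


sample = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t0\tChr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "HWI-ST366_0104:2:1101:1988:1953#0/1",  # a truncated record
]
for lineno, message in find_sam_problems(sample):
    print(lineno, message)
# 3 expected at least 11 tab-separated fields, found 1
```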


On Wednesday, August 14, 2013 9:00:15 AM UTC-7, ankit arora wrote:
Hi Anton,

Thanks a lot for replying. Yes, I have worked with the SAM format output
file from Bowtie, but in that case it also shows 0 reads with the
command below and produces a directory with no files in it.

./generate_QuEST_parameters.pl -sam_align_ChIP /ngs_data/projects/References/ChipSeq/ssl1HTZUV.sam -rp /ngs_data/projects/ITN_aDDRess/genome -ap /ngs_data/projects/References/ChipSeq/

Thanks

Regards
ankit

ankit arora

Aug 22, 2013, 3:55:50 AM
to chi...@googlegroups.com
Dear Anton,

I have tried CleanSam, and it still gives 0 peaks. Is anything else
wrong with it?

Thanks a lot.

Regards
ankit

Anton

Aug 26, 2013, 12:07:07 PM
to chi...@googlegroups.com
Does it accept your files, though? QuEST prints its progress while parsing your input files, so you can see whether the data is accepted or considered offending.
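Matthew's earlier run shows the kind of summary line QuEST prints while parsing ("aligments: 31.52 M, matched : 31.52 M, not in gt: 0.000 M, offending: 0.000 M", spelled as in the program output). To compare those counts programmatically, here is a small Python sketch. The regular expression is an assumption fitted to that single sample line and may not match other QuEST versions; parse_quest_summary is hypothetical.

```python
import re
from typing import Dict

# Matches "label : 12.34 M" pairs, e.g. "matched : 31.52 M" or "not in gt: 0.000 M".
_PAIR = re.compile(r"([A-Za-z][A-Za-z ]*?)\s*:\s*([0-9.]+)\s*M")


def parse_quest_summary(line: str) -> Dict[str, float]:
    """Map each label in a QuEST-style summary line to its count in millions."""
    return {label.strip(): float(value) for label, value in _PAIR.findall(line)}


summary = parse_quest_summary(
    "aligments: 31.52 M, matched : 31.52 M, not in gt: 0.000 M, offending: 0.000 M"
)
print(summary)
# {'aligments': 31.52, 'matched': 31.52, 'not in gt': 0.0, 'offending': 0.0}
```

A large "offending" count relative to "matched" would suggest the input is being rejected during parsing.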