Dear Nadia (or others),
I ran Corset and obtained approximately the same number of clusters as Trinity contigs. In short, there was no reduction in the data, and when I scan the clusters.txt file I only see clusters with super-cluster numbers and no subclusters (e.g. Cluster. I am concerned that I may have missed a crucial alignment parameter. Any advice?
The experimental design is as follows:
- The Trinity assembly of 16 paired-end time-series samples (one per time point) yielded 215,857 Trinity 'genes' and 330,722 transcripts.
- I mapped single-end reads from triplicates at each time point (3 per time point = 48 samples) back to this assembly using bowtie:
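for file in *.fastq; do
  # -n 2: up to 2 mismatches in the seed; -e 99999999: effectively no cap on the
  # summed quality of mismatched positions; -m 200: suppress reads with more than
  # 200 reportable alignments (no -k/-a is given, so bowtie's default -k 1 applies);
  # -S: SAM output; -p 4: four threads.
  # Trinity.fasta is the bowtie index basename (first positional argument).
  bowtie -n 2 -e 99999999 -m 200 \
    --phred33-quals -S -p 4 \
    Trinity.fasta \
    "$file" > "$file.sam.2" 2>"$file.log.2"
  samtools view -S -b "$file.sam.2" > "$file.bam.2"
done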
Here is the SAM file:
head 1-01.sam
@HD VN:1.0 SO:unsorted
@SQ SN:TRINITY_DN7_c0_g1_i1 LN:345
@SQ SN:TRINITY_DN7_c0_g2_i1 LN:240
@SQ SN:TRINITY_DN8_c0_g1_i1 LN:236
tail 1-01.sam
NS500449:235:HL5JJBGXX:4:23612:19920:20394 0 TRINITY_DN112585_c0_g3_i1 667 255 63M * 0 0 GAACACTCTAATTTTTTCAAAGTAAACGTCGCAAGTCCTCCGCACACTCAGCTAAGAGCACAC E/EE6EEEEEEEEEAEEEEEEEEAEAEEEEE/EEEEEEEEEEEAEEEE<EEEAEEEAEEE6E6 XA:i:0 MD:Z:63 NM:i:0
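Corset groups contigs by the reads they share, so one quick sanity check is whether the SAM output contains any read with more than one alignment line. A minimal sketch, assuming a plain-text, tab-separated SAM file; `count_multimappers` is a hypothetical helper name, not part of bowtie, samtools or Corset:

```shell
# count_multimappers (hypothetical helper): summarise how many mapped reads in
# a plain-text SAM file have more than one alignment line.
count_multimappers() {
  awk -F'\t' '
    /^@/ { next }                    # skip header lines
    int($2 / 4) % 2 == 1 { next }    # skip unmapped reads (FLAG bit 0x4)
    {
      if (!($1 in n)) reads++        # first alignment seen for this read name
      n[$1]++                        # count alignment lines per read name
    }
    END {
      for (r in n) if (n[r] > 1) multi++
      printf "%d mapped reads, %d with more than one alignment\n", reads, multi + 0
    }' "$1"
}
```

For example, `count_multimappers 1-01.sam`. If the second number is zero, no read links two contigs, and Corset has nothing to merge them on.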
And here's the Corset call:
# Write a .corset-reads summary for each BAM, then stop (-r true-stop)
for FILE in *.bam; do
  ./corset -r true-stop "$FILE" &
done
wait
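The loop above starts every read-summary job at once with `&` and `wait`. If CPU or memory is tight, a sketch that caps concurrency with `xargs -P` (supported by GNU and BSD xargs, though not POSIX) might look like this; the helper name `run_corset_reads` and the default of 4 jobs are assumptions:

```shell
# run_corset_reads (hypothetical helper): run the read-summary step
# (-r true-stop) on every BAM in the current directory, at most $2 at a time.
# $1 = path to the corset binary, $2 = number of parallel jobs (default 4).
run_corset_reads() {
  corset_bin=$1
  njobs=${2:-4}
  # NUL-separated names so unusual file names survive; one BAM per invocation
  printf '%s\0' *.bam | xargs -0 -n 1 -P "$njobs" "$corset_bin" -r true-stop
}
```

For example, `run_corset_reads ./corset 4`. xargs waits for all jobs, so no separate `wait` is needed.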
# Cluster from the saved read summaries (-i corset). -g and -n give one
# entry per sample, in the same order as the R*.corset-reads files.
/home/jwarner/Corset/Corset_code/corset-1.05-linux64/corset -g 1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,7,8,8,8,\
9,9,9,10,10,10,11,11,11,12,12,12,13,13,13,14,14,14,15,15,15,16,16,16 \
-n R01A,R01B,R01C,R02A,R02B,R02C,R03A,R03B,R03C,\
R04A,R04B,R04C,R05A,R05B,R05C,R06A,R06B,R06C,\
R07A,R07B,R07C,R08A,R08B,R08C,R09A,R09B,R09C,\
R10A,R10B,R10C,R11A,R11B,R11C,R12A,R12B,R12C,\
R13A,R13B,R13C,R14A,R14B,R14C,R15A,R15B,R15C,\
R16A,R16B,R16C \
-i corset R*.corset-reads
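Typing the 48-entry `-g` and `-n` lists by hand is error-prone, since one slip can put a sample in the wrong group. A minimal sketch that generates both lists, assuming 16 time points, three replicates labelled A to C, and the R01A..R16C naming used above:

```shell
# Build comma-separated group (time point) and sample-name lists,
# one entry per sample, in R01A, R01B, ..., R16C order.
groups=""
names=""
for t in $(seq 1 16); do
  for rep in A B C; do
    groups="${groups:+$groups,}$t"                              # 1,1,1,2,...
    names="${names:+$names,}$(printf 'R%02d%s' "$t" "$rep")"    # R01A,R01B,...
  done
done
echo "$groups"
echo "$names"
```

These could then be passed as `-g "$groups" -n "$names"`. Corset pairs them with the input files by position, so the glob `R*.corset-reads` must expand in the same R01A..R16C order.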
head clusters.txt
TRINITY_DN91229_c0_g1_i1 Cluster-0.0
TRINITY_DN93817_c0_g1_i1 Cluster-1.0
TRINITY_DN73581_c0_g1_i1 Cluster-2.0
TRINITY_DN83669_c0_g1_i1 Cluster-3.0
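To put numbers on "approximately the same number of clusters as contigs", a quick summary can compare the contig, cluster and super-cluster counts in clusters.txt. This sketch assumes the file is tab-separated and that cluster IDs follow the Cluster-<super>.<sub> pattern shown above; `summarise_clusters` is a hypothetical helper name:

```shell
# summarise_clusters (hypothetical helper): count contigs, distinct clusters
# and distinct super-clusters in a two-column "contig<TAB>cluster" file.
summarise_clusters() {
  contigs=$(wc -l < "$1")
  clusters=$(cut -f2 "$1" | sort -u | wc -l)
  # strip the trailing ".N" to get the super-cluster ID (assumed naming)
  supers=$(cut -f2 "$1" | sed 's/\.[0-9]*$//' | sort -u | wc -l)
  # $((...)) drops any padding some wc implementations add
  echo "contigs=$((contigs)) clusters=$((clusters)) superclusters=$((supers))"
}
```

For example, `summarise_clusters clusters.txt`. If all three counts are nearly equal, almost every contig ended up in its own cluster.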
Thanks in advance,
Jacob