Time required for a Corset run (can I estimate it from my dataset?)

patrickp...@gmail.com

Apr 30, 2019, 4:38:09 PM
to corset-project
Hello everybody,
I'm running Corset on 394,000 contigs assembled with Trinity and aligned with Salmon.
I'm still waiting for the command line below to finish (two weeks so far):
../corset-1.08-linux64/corset -g 1,1,1,1,2,2,2,2,3,3,3,3 -n Sample1_cond1,Sample2_cond1,Sample3_cond1,Sample2_cond1,Sample1_cond2,Sample2_cond2,Sample3_cond2,Sample4_cond2,Sample1_cond3,Sample2_cond3,Sample3_cond3,Sample4_cond3 -i salmon_eq_classes *-Sample*/aux_info/eq_classes.txt

In the terminal I can see the output below:

Running Corset Version 1.08
Setting sample groups:1,1,1,1,2,2,2,2,3,3,3,3, 3 groups in total
Setting sample names to:Sample1_cond1,Sample2_cond1,Sample3_cond1,Sample2_cond1,Sample1_cond2,Sample2_cond2,Sample3_cond2,Sample4_cond2,Sample1_cond3,Sample2_cond3,Sample3_cond3,Sample4_cond3
Reading salmon eq_classes file : 1-Sample1_cond1.out/aux_info/eq_classes.txt
Reading data on 494826 transcripts in 2007191 equivalence classes
11399352 reads counted, 0 reads filtered.
Reading salmon eq_classes file : 1-Sample2_cond1.out/aux_info/eq_classes.txt
Reading data on 494826 transcripts in 1917935 equivalence classes
11937553 reads counted, 0 reads filtered.
Reading salmon eq_classes file : 1-Sample3_cond1.out/aux_info/eq_classes.txt
Reading data on 494826 transcripts in 1649157 equivalence classes
7651604 reads counted, 0 reads filtered.
Reading salmon eq_classes file : 1-Sample4_cond1.out/aux_info/eq_classes.txt
Reading data on 494826 transcripts in 1664897 equivalence classes
7462025 reads counted, 0 reads filtered.
Reading salmon eq_classes file : 2-Sample1_cond2.out/aux_info/eq_classes.txt
Reading data on 494826 transcripts in 1058044 equivalence classes
5945878 reads counted, 0 reads filtered.
Reading salmon eq_classes file : 2-Sample2_cond2.out/aux_info/eq_classes.txt
Reading data on 494826 transcripts in 1245108 equivalence classes
7650276 reads counted, 0 reads filtered.
Reading salmon eq_classes file : 2-Sample3_cond2.out/aux_info/eq_classes.txt
Reading data on 494826 transcripts in 1285093 equivalence classes
8964142 reads counted, 0 reads filtered.
Reading salmon eq_classes file : 2-Sample4_cond2.out/aux_info/eq_classes.txt
Reading data on 494826 transcripts in 1602654 equivalence classes
11590931 reads counted, 0 reads filtered.
Reading salmon eq_classes file : 3-Sample1_cond3.out/aux_info/eq_classes.txt
Reading data on 494826 transcripts in 1105538 equivalence classes
10676243 reads counted, 0 reads filtered.
Reading salmon eq_classes file : 3-Sample2_cond3.out/aux_info/eq_classes.txt
Reading data on 494826 transcripts in 1076962 equivalence classes
9865936 reads counted, 0 reads filtered.
Reading salmon eq_classes file : 3-Sample3_cond3.out/aux_info/eq_classes.txt
Reading data on 494826 transcripts in 1628813 equivalence classes
7884862 reads counted, 0 reads filtered.
Reading salmon eq_classes file : 3-Sample4_cond3.out/aux_info/eq_classes.txt
Reading data on 494826 transcripts in 1279923 equivalence classes
11144676 reads counted, 0 reads filtered.
Done reading all files.
Start to cluster the reads
0 million compact reads read
0.1 million compact reads read
0.2 million compact reads read
0.3 million compact reads read
0.4 million compact reads read
0.5 million compact reads read
0.6 million compact reads read
0.7 million compact reads read
0.8 million compact reads read
0.9 million compact reads read
1 million compact reads read
1.1 million compact reads read
1.2 million compact reads read
1.3 million compact reads read
1.4 million compact reads read
1.5 million compact reads read
1.6 million compact reads read
1.7 million compact reads read
1.8 million compact reads read
1.9 million compact reads read
2 million compact reads read
2.1 million compact reads read
2.2 million compact reads read
2.3 million compact reads read
2.4 million compact reads read
2.5 million compact reads read
2.6 million compact reads read
2.7 million compact reads read
2.8 million compact reads read
2.9 million compact reads read
3 million compact reads read
3.1 million compact reads read
3.2 million compact reads read
3.3 million compact reads read
3.4 million compact reads read
3.5 million compact reads read
3.6 million compact reads read
3.7 million compact reads read
3.8 million compact reads read
3.9 million compact reads read
4 million compact reads read
4.1 million compact reads read
4.2 million compact reads read
4.3 million compact reads read
4.4 million compact reads read
4.5 million compact reads read
4.6 million compact reads read
4.7 million compact reads read
4.8 million compact reads read
4.9 million compact reads read
5 million compact reads read
5.1 million compact reads read
5.2 million compact reads read
5.3 million compact reads read
5.4 million compact reads read
5.5 million compact reads read
5.6 million compact reads read
5.7 million compact reads read
5.8 million compact reads read
5.9 million compact reads read
6 million compact reads read
6.1 million compact reads read
6.2 million compact reads read
6.3 million compact reads read
6.4 million compact reads read
6.5 million compact reads read
6.6 million compact reads read
6.7 million compact reads read
6.8 million compact reads read
6.9 million compact reads read
7 million compact reads read
7.1 million compact reads read
7.2 million compact reads read
7.3 million compact reads read
7.4 million compact reads read
7.5 million compact reads read
7.6 million compact reads read
7.7 million compact reads read
7.8 million compact reads read
7.9 million compact reads read
8 million compact reads read
8.1 million compact reads read
8.2 million compact reads read
8.3 million compact reads read
8.4 million compact reads read
8.5 million compact reads read
8.6 million compact reads read
8.7 million compact reads read
8.8 million compact reads read
8.9 million compact reads read
9 million compact reads read
9.1 million compact reads read
9.2 million compact reads read
9.3 million compact reads read
9.4 million compact reads read
9.5 million compact reads read
9.6 million compact reads read
9.7 million compact reads read
9.8 million compact reads read
9.9 million compact reads read
10 million compact reads read
10.1 million compact reads read
10.2 million compact reads read
10.3 million compact reads read
10.4 million compact reads read
10.5 million compact reads read
10.6 million compact reads read
10.7 million compact reads read
10.8 million compact reads read
10.9 million compact reads read
11 million compact reads read
11.1 million compact reads read
11.2 million compact reads read
11.3 million compact reads read
11.4 million compact reads read
11.5 million compact reads read
11.6 million compact reads read
11.7 million compact reads read
11.8 million compact reads read
11.9 million compact reads read
12 million compact reads read
12.1 million compact reads read
12.2 million compact reads read
12.3 million compact reads read
12.4 million compact reads read
12.5 million compact reads read
12.6 million compact reads read
12.7 million compact reads read
12.8 million compact reads read
12.9 million compact reads read
13 million compact reads read
13.1 million compact reads read
13.2 million compact reads read
13.3 million compact reads read
13.4 million compact reads read
13.5 million compact reads read
13.6 million compact reads read
13.7 million compact reads read
13.8 million compact reads read
13.9 million compact reads read
14 million compact reads read
14.1 million compact reads read
14.2 million compact reads read
14.3 million compact reads read
14.4 million compact reads read
14.5 million compact reads read
14.6 million compact reads read
14.7 million compact reads read
14.8 million compact reads read
14.9 million compact reads read
15 million compact reads read
15.1 million compact reads read
15.2 million compact reads read
15.3 million compact reads read
15.4 million compact reads read
15.5 million compact reads read
15.6 million compact reads read
15.7 million compact reads read
15.8 million compact reads read
15.9 million compact reads read
16 million compact reads read
16.1 million compact reads read
16.2 million compact reads read
16.3 million compact reads read
16.4 million compact reads read
16.5 million compact reads read
16.6 million compact reads read
16.7 million compact reads read
16.8 million compact reads read
16.9 million compact reads read
17 million compact reads read
17.1 million compact reads read
17.2 million compact reads read
17.3 million compact reads read
17.4 million compact reads read
17.5 million compact reads read
Starting hierarchial clustering...
0 thousand clusters done
1 thousand clusters done
2 thousand clusters done
3 thousand clusters done
4 thousand clusters done
5 thousand clusters done
6 thousand clusters done
7 thousand clusters done
8 thousand clusters done
9 thousand clusters done
cluster with 248247 transcripts.. this might take a while

------------------------

The command line above is still running. It is consuming 120 GB of RAM (the maximum on my PC :P) plus 30 GB of swap.
The command is using one of the twelve cores of my PC.
Finally, my question: is there any way to check whether my Corset run is still working or has crashed?

Thank you in advance.
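
A note on checking this from a second terminal: a crashed process disappears from the process table, while a live one can be inspected for its state and memory use. The sketch below is a minimal illustration, assuming a Linux system. The function name proc_report is made up for this example and is not part of corset.

```shell
# Report whether a process is alive and, on Linux, its state and memory use.
# proc_report is a hypothetical helper name used only for this illustration.
proc_report() {
    pid=$1
    # kill -0 sends no signal; it only tests whether the process can be signalled.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "pid $pid: not running (or not owned by you)"
        return 1
    fi
    echo "pid $pid: running"
    # /proc/<pid>/status is Linux-specific. State, VmRSS and VmSwap are fields in it.
    if [ -r "/proc/$pid/status" ]; then
        grep -E '^(State|VmRSS|VmSwap):' "/proc/$pid/status" || true
    fi
}

# Usage (pgrep -x matches the exact process name):
# proc_report "$(pgrep -x corset | head -n 1)"
```

If the process is reported as running and its CPU time in top keeps growing, corset is still computing. That alone does not say how long the run will take.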

Nadia Davidson

May 6, 2019, 1:51:43 AM
to corset-project
Hi,

I had hoped that corset 1.08 would run better on these sorts of large datasets, but it sounds like that might not be the case?
When I tested on a large dataset of mine which is similar to yours but has around 1/4 the number of equivalence classes you have, it runs fine even though it also runs into a cluster with about 250k transcripts. If your job is running successfully, you would normally see little progress updates like
"down to 248200 clusters. dist=0". If you are not getting these, I would say that it's not going to finish in any reasonable time and you can kill it.
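
The check for progress updates described above can be sketched as follows, assuming the terminal output of the run was captured to a file. The file name corset.log and the function name progress_summary are made up for this example.

```shell
# Summarise clustering progress recorded in a saved corset log.
# Assumption: the run's output was captured to a file, for example with
#   corset ... 2>&1 | tee corset.log
# (corset.log is a hypothetical name).
progress_summary() {
    log=$1
    # Count progress lines such as "down to 248200 clusters. dist=0".
    n=$(grep -c 'clusters\. dist=' "$log" || true)
    echo "progress lines: $n"
    # Show the most recent progress line, if there is one.
    grep 'clusters\. dist=' "$log" | tail -n 1 || true
}

# Usage:
# progress_summary corset.log
```

Running it twice some hours apart shows whether the cluster count is still going down.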

What you describe sounds a lot like the behaviour of corset versions 1.07 and earlier. Just as a sanity check, to make sure I haven't somehow mixed up the versions in the release, can you tell me how and when you downloaded and installed corset?

A few tips to get corset running successfully: clean the reads well, try a different aligner or salmon with the --validateMappings and --hardFilter flags, and run corset with the -x and -l flags (set to something like 100 and 5 respectively).

And of course if you provide me with a reproducible example, I'm very happy to investigate what's happening and hopefully improve corset.

Cheers,
Nadia.

patrickp...@gmail.com

May 6, 2019, 4:14:41 PM
to corset-project
Hello Nadia, thanks for your reply.
I downloaded it three weeks ago from the GitHub releases page.
How can I send you my files?

Nadia Davidson

May 6, 2019, 6:36:38 PM
to corset-project
Hi Patrick,

You can upload your salmon eq_classes.txt files to google drive or dropbox or similar and send me the link. How big are they? No need to send all of them if a subset will reproduce the problem.

Did you download the binary of corset from GitHub, or compile from source?

Cheers,
Nadia.

patrickp...@gmail.com

May 22, 2019, 11:14:59 AM
to corset-project
Hi Nadia,
I've used the binaries.

I'm now trying to run corset with the same parameters on a computer with much more RAM (256 GB). Corset is consuming only 140 GB, but after one week I see the same behavior: corset stopped at "cluster with 248247 transcripts.. this might take a while".

Below is a link to my eq_classes.txt files. In the Google Drive folder there is also a file called "cmds-used", which contains my command line. Please let me know once you have downloaded the files.

https://drive.google.com/drive/folders/1YQs7z6bF4ObHtYT7OL5dpQqOhgKeDx_q?usp=sharing

Thank you in advance

Nadia Davidson

May 24, 2019, 1:31:23 AM
to corset-project
Thanks for that Patrick,

I've downloaded the files, and am able to reproduce the problem. I'll let you know when I find out what's going on.

Cheers,
Nadia.

patrickp...@gmail.com

May 28, 2019, 5:45:20 PM
to corset-project
Thank you so much for your help.

Best,
Patrick.

Nadia Davidson

May 30, 2019, 9:01:47 PM
to corset-project
Hi Patrick,

After investigating a bit, it looks like corset is behaving as expected, but your dataset has so many equivalence classes involving many transcripts that corset's compute time becomes impractical. Basically, the equivalence classes are what tell corset that two or more transcripts have sequence in common, and they are what is used for the clustering.

To give an idea, I've got a similar dataset in terms of transcripts, number of reads and samples (also processed with Trinity and salmon) and I get about 1000X less "links" between transcripts. This means that in your dataset there are more transcripts with sequence in common than usually seen. I'm wondering what the cause of this could be? Did you perform multiple assemblies and merge the results? Or perhaps there is a lot of repeated sequence for some reason? It would be interesting to know so I've got a better idea of the use cases for corset.

In terms of getting corset to run, I've been able to do it successfully using the corset flag -l 10. This reduces the number of equivalence classes/links between transcripts. Unfortunately it also removes reads when it filters, so your resulting library size is about 50-75% of its original size. There's probably a way I can modify corset to redistribute the reads rather than remove them, and I'll aim to add this to the next release when I get a chance. If you don't like that solution, there are a few other things you could try:
- you could pre-filter the transcripts with CD-HIT to remove redundancy (prior to running salmon)
- apply a repeat masker to the transcripts before running salmon (if non-coding repeats are the issue)
- run salmon with --validateMappings --hardFilter

Thanks again for reporting this. We can't improve our program without feedback on the sort of problems people are running into. Please feel free to update me on how/what ends up working for you!

Cheers,
Nadia.

patrickp...@gmail.com

May 31, 2019, 4:44:09 PM
to corset-project
Hello Nadia,
Below are your questions and my answers.

Did you perform multiple assemblies and merge the results?
A: No, I just used the '--samples_file' parameter in Trinity to point to all my sample files in FASTQ format, and Trinity generated the final FASTA file from all my samples.

Or perhaps there is a lot of repeated sequence for some reason?
A: I have not performed any analysis to search for repeated sequences, and I do not know of any reason for this.

The Salmon options you suggested (--validateMappings --hardFilter) were already used to produce these equivalence classes.


A new version of corset with read redistribution would be very good. I can wait for it, so please let me know about this future release.

With the current version of Corset, will the job ever finish if I wait long enough?

Thank you so much for your help

Cheers
Patrick

Nadia Davidson

Jun 11, 2019, 4:17:21 PM
to corset-project
Hi Patrick,

Sorry for the slow reply. I don't think corset will finish if you wait longer. 

I'll let you know when I make a new release.

Cheers,
Nadia.

Nadia Davidson

Jul 4, 2019, 8:37:04 PM
to corset-project
Hi Patrick,

I've just made a new release 1.09 which redistributes reads rather than removing them when it "cuts" links between transcripts. Using "-l 10" on your dataset now completes after about a week. It will still look like it has stalled for a number of days, but should eventually finish. Let me know if you try it out and run into troubles.

Cheers,
Nadia.



patrickp...@gmail.com

Jul 5, 2019, 8:47:41 AM
to corset-project
Hello,
I'm getting the following error

corset: error while loading shared libraries: liblzma.so.0: cannot open shared object file: No such file or directory

I found that the package xz-utils might be missing, but the error persists even after installing xz-utils.
Cheers

patrickp...@gmail.com

Jul 5, 2019, 8:54:21 AM
to corset-project
Version 1.08 runs fine, with no errors about liblzma.

Nadia Davidson

Jul 6, 2019, 4:10:39 AM
to corset-project
Sorry about that. I've just replaced the 1.09 tar ball with a binary that should be static and not depend on other libraries. Please try again and let me know if it works.
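
For anyone hitting the same loader error, the libraries a dynamically linked binary cannot resolve can be listed with ldd. This is a minimal sketch: the helper name missing_libs is made up for this example, and it only parses standard ldd output. A possible reason installing xz-utils did not help is that the installed library has a different version suffix than liblzma.so.0, but that is an assumption to verify on the machine in question.

```shell
# Print the names of shared libraries that ldd reports as unresolved.
# missing_libs is a hypothetical helper; it reads ldd output on standard input.
missing_libs() {
    # Unresolved entries look like: "liblzma.so.0 => not found"
    awk '/=> not found/ { print $1 }'
}

# Usage:
# ldd "$(command -v corset)" | missing_libs
# For a static binary, ldd instead reports that the file is not a dynamic executable.
```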

patrickp...@gmail.com

Jul 12, 2019, 9:56:22 AM
to corset-project
Hello,
I've tried it and corset now runs, but after reading the equivalence classes it finishes with the error below:

My command
$ corset -g 1,1,1,1,2,2,2,2,3,3,3,3 -n BCCP93_wintering,BCCP94_wintering,BCCP95_wintering,BCCP98_wintering,BCCP106_newly_arrived,BCCP109_newly_arrived,BCCP111_newly_arrived,BCCP114_newly_arrived,BCCP218_pre-migration,BCCP227_pre-migration,BCCP228_pre-migration,BCCP229_pre-migration -i salmon_eq_classes BCCP*/aux_info/eq_classes.txt -f true

Running Corset Version 1.09


Setting sample groups:1,1,1,1,2,2,2,2,3,3,3,3, 3 groups in total

Setting sample names to:BCCP93_wintering,BCCP94_wintering,BCCP95_wintering,BCCP98_wintering,BCCP106_newly_arrived,BCCP109_newly_arrived,BCCP111_newly_arrived,BCCP114_newly_arrived,BCCP218_pre-migration,BCCP227_pre-migration,BCCP228_pre-migration,BCCP229_pre-migration
Setting output files to be overridden
Reading salmon eq_classes file : BCCP106_newly_arrived/aux_info/eq_classes.txt
Reading data on 494826 transcripts in 1058465 equivalence classes
6114025 reads counted, 0 reads filtered, 0 reads redistributed.
Reading salmon eq_classes file : BCCP109_newly_arrived/aux_info/eq_classes.txt
Reading data on 494826 transcripts in 1257165 equivalence classes
7937394 reads counted, 0 reads filtered, 0 reads redistributed.
Reading salmon eq_classes file : BCCP111_newly_arrived/aux_info/eq_classes.txt
Reading data on 494826 transcripts in 1264680 equivalence classes
9006841 reads counted, 0 reads filtered, 0 reads redistributed.
Reading salmon eq_classes file : BCCP114_newly_arrived/aux_info/eq_classes.txt
Reading data on 494826 transcripts in 1579101 equivalence classes
11806538 reads counted, 0 reads filtered, 0 reads redistributed.
Reading salmon eq_classes file : BCCP218_pre-migration/aux_info/eq_classes.txt
Reading data on 494826 transcripts in 997241 equivalence classes
9734088 reads counted, 0 reads filtered, 0 reads redistributed.
Reading salmon eq_classes file : BCCP227_pre-migration/aux_info/eq_classes.txt
Reading data on 494826 transcripts in 907200 equivalence classes
8064490 reads counted, 0 reads filtered, 0 reads redistributed.
Reading salmon eq_classes file : BCCP228_pre-migration/aux_info/eq_classes.txt
Reading data on 494826 transcripts in 1566660 equivalence classes
7998081 reads counted, 0 reads filtered, 0 reads redistributed.
Reading salmon eq_classes file : BCCP229_pre-migration/aux_info/eq_classes.txt
Reading data on 494826 transcripts in 1192898 equivalence classes
10787561 reads counted, 0 reads filtered, 0 reads redistributed.
Reading salmon eq_classes file : BCCP93_wintering/aux_info/eq_classes.txt
Reading data on 494826 transcripts in 1970994 equivalence classes
11387494 reads counted, 0 reads filtered, 0 reads redistributed.
Reading salmon eq_classes file : BCCP94_wintering/aux_info/eq_classes.txt
Reading data on 494826 transcripts in 1907652 equivalence classes
12019479 reads counted, 0 reads filtered, 0 reads redistributed.
Reading salmon eq_classes file : BCCP95_wintering/aux_info/eq_classes.txt
Reading data on 494826 transcripts in 1644432 equivalence classes
8051556 reads counted, 0 reads filtered, 0 reads redistributed.
Reading salmon eq_classes file : BCCP98_wintering/aux_info/eq_classes.txt
Reading data on 494826 transcripts in 1609048 equivalence classes
7580461 reads counted, 0 reads filtered, 0 reads redistributed.


Done reading all files.
Start to cluster the reads

0 million equivalence classes read
....
16.9 million equivalence classes read


Starting hierarchial clustering...
0 thousand clusters done

cluster with 364954 transcripts.. this might take a while
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted (core dumped)

Is there any way to fix this?
Cheers
Patrick
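
One detail visible in the log above: the shell expands BCCP*/aux_info/eq_classes.txt in sorted order, so BCCP106_newly_arrived is read first, while the -n list starts with BCCP93_wintering. Assuming corset pairs the i-th entry of -n and -g with the i-th input file (worth confirming in the corset documentation), the sample labels in the output would not match the files. A minimal sketch for building the -n list in the same order as the files follows. The function name names_in_file_order is made up for this example.

```shell
# Print sample names, comma-separated, in the order the file paths are given.
# Each path is expected to look like <sample>/aux_info/eq_classes.txt.
# names_in_file_order is a hypothetical helper name.
names_in_file_order() {
    names=""
    for f in "$@"; do
        sample=${f%%/*}                    # keep only the first path component
        names="${names:+$names,}$sample"   # append, comma-separated
    done
    printf '%s\n' "$names"
}

# Usage: the glob expands in the same sorted order in which the files are passed to corset.
# names_in_file_order BCCP*/aux_info/eq_classes.txt
```

The printed list can then be compared with, or pasted into, the -n option, with -g adjusted to the same order.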

Nadia Davidson

Jul 13, 2019, 9:30:54 PM
to corset-project
Hi Patrick,

Is it possible you ran out of RAM? I think for your dataset you need to use the -l option with some reasonable threshold like 10 ("-l 10").
I tried this on the data you sent me and it completed successfully. If you still get the "std::bad_alloc" after running with -l, let me know so I can work out what's happened.

Cheers,
Nadia.

patrickp...@gmail.com

Jul 16, 2019, 7:25:46 AM
to corset-project
Hi Nadia,
With -l 10, the step that reads the equivalence classes is taking a very long time (two days so far). On the screen I can see millions and millions of equivalence classes; take a look below:
....
503.9 million equivalence classes read
504 million equivalence classes read
504.1 million equivalence classes read
504.2 million equivalence classes read
504.3 million equivalence classes read

Is this expected?

Nadia Davidson

Jul 16, 2019, 7:46:01 AM
to corset-project
I think this is fine. Let it run a bit longer. For me it took about a week to complete.

patrickp...@gmail.com

Aug 3, 2019, 5:01:58 PM
to corset-project
Hello Nadia,
The Corset process is still running after 14 days. In the terminal I can see the following output:

Running Corset Version 1.08
Setting sample groups:1,1,1,1,2,2,2,2,3,3,3,3, 3 groups in total
Setting sample names to:BCCP93_wintering,BCCP94_wintering,BCCP95_wintering,BCCP98_wintering,BCCP106_newly_arrived,BCCP109_newly_arrived,BCCP111_newly_arrived,BCCP114_newly_arrived,BCCP218_pre-migration,BCCP227_pre-migration,BCCP228_pre-migration,BCCP229_pre-migration
Setting output files to be overridden
Setting minimum reads for a link to 10 (only used if -i is set to corset)
Reading salmon eq_classes file : BCCP106_newly_arrived/aux_info/eq_classes.txt
Reading data on 494826 transcripts in 1058465 equivalence classes
4412496 reads counted, 1701529 reads filtered.
Reading salmon eq_classes file : BCCP109_newly_arrived/aux_info/eq_classes.txt
Reading data on 494826 transcripts in 1257165 equivalence classes
5986283 reads counted, 1951111 reads filtered.
Reading salmon eq_classes file : BCCP111_newly_arrived/aux_info/eq_classes.txt
Reading data on 494826 transcripts in 1264680 equivalence classes
6971452 reads counted, 2035389 reads filtered.
Reading salmon eq_classes file : BCCP114_newly_arrived/aux_info/eq_classes.txt
Reading data on 494826 transcripts in 1579101 equivalence classes
9061104 reads counted, 2745434 reads filtered.
Reading salmon eq_classes file : BCCP218_pre-migration/aux_info/eq_classes.txt
Reading data on 494826 transcripts in 997241 equivalence classes
7860038 reads counted, 1874050 reads filtered.
Reading salmon eq_classes file : BCCP227_pre-migration/aux_info/eq_classes.txt
Reading data on 494826 transcripts in 907200 equivalence classes
6386747 reads counted, 1677743 reads filtered.
Reading salmon eq_classes file : BCCP228_pre-migration/aux_info/eq_classes.txt
Reading data on 494826 transcripts in 1566660 equivalence classes
5582871 reads counted, 2415210 reads filtered.
Reading salmon eq_classes file : BCCP229_pre-migration/aux_info/eq_classes.txt
Reading data on 494826 transcripts in 1192898 equivalence classes
8551837 reads counted, 2235724 reads filtered.
Reading salmon eq_classes file : BCCP93_wintering/aux_info/eq_classes.txt
Reading data on 494826 transcripts in 1970994 equivalence classes
8122750 reads counted, 3264744 reads filtered.
Reading salmon eq_classes file : BCCP94_wintering/aux_info/eq_classes.txt
Reading data on 494826 transcripts in 1907652 equivalence classes
8490917 reads counted, 3528562 reads filtered.
Reading salmon eq_classes file : BCCP95_wintering/aux_info/eq_classes.txt
Reading data on 494826 transcripts in 1644432 equivalence classes
5516654 reads counted, 2534902 reads filtered.
Reading salmon eq_classes file : BCCP98_wintering/aux_info/eq_classes.txt
Reading data on 494826 transcripts in 1609048 equivalence classes
5014735 reads counted, 2565726 reads filtered.
Done reading all files. 
Start to cluster the reads
0 million compact reads read
0.1 million compact reads read
0.2 million compact reads read
0.3 million compact reads read
0.4 million compact reads read
0.5 million compact reads read
0.6 million compact reads read
0.7 million compact reads read
0.8 million compact reads read
0.9 million compact reads read
1 million compact reads read
1.1 million compact reads read
1.2 million compact reads read
Starting hierarchial clustering...
0 thousand clusters done
cluster with 243625 transcripts.. this might take a while

Do you think that is ok?
Cheers,
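
For reference, the share of reads kept after filtering can be read off the log above. For the first sample (BCCP106_newly_arrived), 4412496 reads are counted and 1701529 filtered; their sum, 6114025, equals the number counted for the same sample in the earlier unfiltered run, so about 72% of the reads are kept. Note also that the header of this log reports Version 1.08, in which filtered reads are removed rather than redistributed. The sketch below computes the percentage for every sample, assuming the output was saved to a file. The names kept_percent and corset.log are made up for this example.

```shell
# Print, for each "reads counted" line of a corset log, the percentage of reads kept.
# kept_percent is a hypothetical helper; it reads the log on standard input.
kept_percent() {
    # Field 1 is the number of reads counted, field 4 the number filtered.
    awk '/reads counted,/ {
        total = $1 + $4
        if (total > 0) printf "%.1f\n", 100 * $1 / total
    }'
}

# Usage:
# kept_percent < corset.log
```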

Nadia Davidson

Aug 13, 2019, 3:42:57 AM
to corset-project
Hi Patrick,

Is this still running? Is this the same dataset that you sent me the files for earlier? If so, it should run to completion in about a week. If not I'm not sure what's going on, but I'm happy to send you the results I obtained from my testing. If you are using a different dataset, you may need to perform a bit of filtering prior to running salmon. A good option in your case could be to run CD-HIT-EST, which is likely to filter out very similar contigs.

Cheers,
Nadia.

patrickp...@gmail.com

Oct 5, 2019, 6:20:43 AM
to corset-project
Hi Nadia,
Sorry for the delay in my answer. Could you please send me the results you obtained from your run?

Cheers,
Patrick

patrickp...@gmail.com

Dec 5, 2019, 6:03:50 AM
to corset-project
Good Morning,
I'm trying to run another analysis with fewer transcripts, but I'm having a similar issue. See the terminal output below.


corset -g 1,1,1,1,2,2,2,2 -n AE01,AE10,AE13,AE19,AP04,AP07,AP08,AP10 -i salmon_eq_classes ../EXP/*/aux_info/eq_classes.txt

Running Corset Version 1.09
Setting sample groups:1,1,1,1,2,2,2,2, 2 groups in total
Setting sample names to:AE01,AE10,AE13,AE19,AP04,AP07,AP08,AP10
Reading salmon eq_classes file : ../EXP/AE01/aux_info/eq_classes.txt
Reading data on 343091 transcripts in 283339 equivalence classes
2570920 reads counted, 0 reads filtered, 0 reads redistributed.
Reading salmon eq_classes file : ../EXP/AE10/aux_info/eq_classes.txt
Reading data on 343091 transcripts in 146186 equivalence classes
799468 reads counted, 0 reads filtered, 0 reads redistributed.
Reading salmon eq_classes file : ../EXP/AE13/aux_info/eq_classes.txt
Reading data on 343091 transcripts in 331838 equivalence classes
4820095 reads counted, 0 reads filtered, 0 reads redistributed.
Reading salmon eq_classes file : ../EXP/AE19/aux_info/eq_classes.txt
Reading data on 343091 transcripts in 150085 equivalence classes
1093120 reads counted, 0 reads filtered, 0 reads redistributed.
Reading salmon eq_classes file : ../EXP/AP04/aux_info/eq_classes.txt
Reading data on 343091 transcripts in 193175 equivalence classes
3939675 reads counted, 0 reads filtered, 0 reads redistributed.
Reading salmon eq_classes file : ../EXP/AP07/aux_info/eq_classes.txt
Reading data on 343091 transcripts in 233963 equivalence classes
2635345 reads counted, 0 reads filtered, 0 reads redistributed.
Reading salmon eq_classes file : ../EXP/AP08/aux_info/eq_classes.txt
Reading data on 343091 transcripts in 269157 equivalence classes
2678829 reads counted, 0 reads filtered, 0 reads redistributed.
Reading salmon eq_classes file : ../EXP/AP10/aux_info/eq_classes.txt
Reading data on 343091 transcripts in 316899 equivalence classes
5839017 reads counted, 0 reads filtered, 0 reads redistributed.

Done reading all files.
Start to cluster the reads
0 million equivalence classes read
0.1 million equivalence classes read
0.2 million equivalence classes read
0.3 million equivalence classes read
0.4 million equivalence classes read
0.5 million equivalence classes read
0.6 million equivalence classes read
0.7 million equivalence classes read
0.8 million equivalence classes read
0.9 million equivalence classes read
1 million equivalence classes read
1.1 million equivalence classes read
1.2 million equivalence classes read
1.3 million equivalence classes read
1.4 million equivalence classes read
1.5 million equivalence classes read
1.6 million equivalence classes read
1.7 million equivalence classes read
1.8 million equivalence classes read
1.9 million equivalence classes read

Starting hierarchial clustering...
0 thousand clusters done
cluster with 158370 transcripts.. this might take a while

This has been running for 24 hours with no change. Do you think it will finish?

Nadia Davidson

Dec 5, 2019, 6:55:08 PM
to corset-project
Hi Patrick,

Sorry not to reply to your earlier message. I will send you another message with the result of corset on the bigger dataset.

For this smaller run, I think you should leave corset for a while. It may take a few days. What sort of system are you running it on?

Cheers,
Nadia.

patrickp...@gmail.com

Dec 6, 2019, 8:06:13 AM
to corset-project
I'm running it on a Linux Ubuntu computer with 128 GB of RAM and a 12-thread processor.
Is there any way to run corset on multiple cores?

patrickp...@gmail.com

Dec 23, 2019, 7:41:09 AM
to corset-project
Hello, 
After four weeks, Corset finished with an error. Take a look below:

$ corset -g 1,1,1,1,2,2,2,2 -n AE01,AE10,AE13,AE19,AP04,AP07,AP08,AP10 -i salmon_eq_classes ../EXP/*/aux_info/eq_classes.txt


down to 158200 clusters. dist=0
down to 158000 clusters. dist=0
down to 157800 clusters. dist=0
down to 157600 clusters. dist=0
down to 157400 clusters. dist=0
down to 157200 clusters. dist=0
down to 157000 clusters. dist=0
down to 156800 clusters. dist=0
down to 156600 clusters. dist=0
down to 156400 clusters. dist=0
down to 156200 clusters. dist=0
down to 156000 clusters. dist=0
down to 155800 clusters. dist=0
down to 155600 clusters. dist=0
down to 155400 clusters. dist=0
down to 155200 clusters. dist=0
down to 155000 clusters. dist=0
down to 154800 clusters. dist=0
down to 154600 clusters. dist=0
down to 154400 clusters. dist=0
down to 154200 clusters. dist=0
down to 154000 clusters. dist=0
down to 153800 clusters. dist=0
down to 153600 clusters. dist=0
down to 153400 clusters. dist=0
down to 153200 clusters. dist=0
down to 153000 clusters. dist=0
down to 152800 clusters. dist=0
down to 152600 clusters. dist=0
down to 152400 clusters. dist=0
down to 152200 clusters. dist=0
down to 152000 clusters. dist=0
down to 151800 clusters. dist=0
down to 151600 clusters. dist=0
down to 151400 clusters. dist=0
down to 151200 clusters. dist=0
down to 151000 clusters. dist=0
down to 150800 clusters. dist=0
down to 150600 clusters. dist=0
down to 150400 clusters. dist=0
down to 150200 clusters. dist=0
down to 150000 clusters. dist=0
down to 149800 clusters. dist=0
down to 149600 clusters. dist=0
down to 149400 clusters. dist=0
down to 149200 clusters. dist=0
down to 149000 clusters. dist=0
down to 148800 clusters. dist=0
down to 148600 clusters. dist=0
down to 148400 clusters. dist=0
down to 148200 clusters. dist=0
down to 148000 clusters. dist=0
down to 147800 clusters. dist=0
down to 147600 clusters. dist=0
down to 147400 clusters. dist=0
down to 147200 clusters. dist=0
down to 147000 clusters. dist=0
down to 146800 clusters. dist=0
down to 146600 clusters. dist=0
down to 146400 clusters. dist=0
down to 146200 clusters. dist=0
down to 146000 clusters. dist=0
down to 145800 clusters. dist=0
down to 145600 clusters. dist=0
down to 145400 clusters. dist=0
terminate called after throwing an instance of 'std::out_of_range'
  what():  vector::_M_range_check: __n (which is 0) >= this->size() (which is 0)
Aborted (core dumped)

Any advice to fix this?

Nadia Davidson

Dec 23, 2019, 3:12:13 PM
to corset-project
Hi Patrick,

If you send me a reproducible example, I'll have a look to see if this is a bug in corset.
If you add some filtering of the input data (-x and -l), you should be able to get this to run much faster.

Cheers,
Nadia.
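
To get a feel for how much the -x filter would remove before rerunning, one option is to summarise an eq_classes.txt file first. The sketch below is not part of Corset or salmon. It assumes the salmon --dumpEq layout of a transcript count, a class count, the transcript names, then one line per class that starts with the number of transcripts in the class and ends with the read count. The function name eq_summary and the max_aln argument are hypothetical, and whether Corset compares with > or >= is not stated in this thread.

```bash
# Summarise one salmon eq_classes.txt file (hypothetical helper, not part of Corset).
# Assumed layout, please check it against your own file:
#   line 1: number of transcripts N
#   line 2: number of equivalence classes M
#   next N lines: transcript names
#   next M lines: k, then k transcript indices, ..., read count as the last field
# max_aln mirrors the value you would give to corset -x (assumed here to mean "more than").
eq_summary() {
    local file=$1 max_aln=${2:-100}
    awk -v max="$max_aln" '
        NR == 1 { n = $1; next }      # remember the transcript count
        NR == 2 { next }              # class count, not needed for the summary
        NR <= n + 2 { next }          # skip the transcript name lines
        {
            classes++
            reads += $NF              # last field is the read count of the class
            if ($1 > max) { big_classes++; big_reads += $NF }
        }
        END {
            printf "classes=%d reads=%d classes_over_max=%d reads_over_max=%d\n",
                   classes, reads, big_classes, big_reads
        }
    ' "$file"
}
```

For example, `eq_summary 1-Sample1_cond1.out/aux_info/eq_classes.txt 100` prints a single line with the totals, and a large reads_over_max would suggest that -x 100 changes the input noticeably.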

patrickp...@gmail.com

Dec 24, 2019, 12:36:02 PM
to corset-project
Hello,
Take a look at the link below to my eq_classes files. The best scenario for me is not to use filtering. Thank you.

My command line: corset -g 1,1,1,1,2,2,2,2 -n AE01,AE10,AE13,AE19,AP04,AP07,AP08,AP10 -i salmon_eq_classes ../EXP/*/aux_info/eq_classes.txt


Merry Christmas,
Patrick

patrickp...@gmail.com

Dec 28, 2019, 9:06:33 AM
to corset-project
Hello, I get the same error even after adding filtering; take a look below.

me@master ~/tambaqui-rna-seq/corset/cosert_res $ corset -g 1,1,1,1,2,2,2,2 -n AE01,AE10,AE13,AE19,AP04,AP07,AP08,AP10 -i salmon_eq_classes ../EXP/*/aux_info/eq_classes.txt -x 100 -l 5



Running Corset Version 1.09
Setting sample groups:1,1,1,1,2,2,2,2, 2 groups in total
Setting sample names to:AE01,AE10,AE13,AE19,AP04,AP07,AP08,AP10
Setting maximum alignments for a read to 100 (only used if -i is set to corset or salmon_eq_classes )
Setting minimum reads for a link to 5 (only used if -i is set to corset or salmon_eq_classes)

Reading salmon eq_classes file : ../EXP/AE01/aux_info/eq_classes.txt
Reading data on 343091 transcripts in 283339 equivalence classes
2220076 reads counted, 47 reads filtered, 350797 reads redistributed.

Reading salmon eq_classes file : ../EXP/AE10/aux_info/eq_classes.txt
Reading data on 343091 transcripts in 146186 equivalence classes
611986 reads counted, 0 reads filtered, 187482 reads redistributed.

Reading salmon eq_classes file : ../EXP/AE13/aux_info/eq_classes.txt
Reading data on 343091 transcripts in 331838 equivalence classes
4420401 reads counted, 41 reads filtered, 399653 reads redistributed.

Reading salmon eq_classes file : ../EXP/AE19/aux_info/eq_classes.txt
Reading data on 343091 transcripts in 150085 equivalence classes
903913 reads counted, 0 reads filtered, 189207 reads redistributed.

Reading salmon eq_classes file : ../EXP/AP04/aux_info/eq_classes.txt
Reading data on 343091 transcripts in 193175 equivalence classes
3704418 reads counted, 272 reads filtered, 234985 reads redistributed.

Reading salmon eq_classes file : ../EXP/AP07/aux_info/eq_classes.txt
Reading data on 343091 transcripts in 233963 equivalence classes
2339957 reads counted, 27 reads filtered, 295361 reads redistributed.

Reading salmon eq_classes file : ../EXP/AP08/aux_info/eq_classes.txt
Reading data on 343091 transcripts in 269157 equivalence classes
2346469 reads counted, 216 reads filtered, 332144 reads redistributed.

Reading salmon eq_classes file : ../EXP/AP10/aux_info/eq_classes.txt
Reading data on 343091 transcripts in 316899 equivalence classes
5472190 reads counted, 96 reads filtered, 366731 reads redistributed.

Done reading all files.
Start to cluster the reads
0 million equivalence classes read
0.1 million equivalence classes read
0.2 million equivalence classes read
0.3 million equivalence classes read
0.4 million equivalence classes read
0.5 million equivalence classes read
0.6 million equivalence classes read
0.7 million equivalence classes read
0.8 million equivalence classes read
0.9 million equivalence classes read
1 million equivalence classes read
1.1 million equivalence classes read
1.2 million equivalence classes read
1.3 million equivalence classes read
1.4 million equivalence classes read
1.5 million equivalence classes read
1.6 million equivalence classes read
1.7 million equivalence classes read
1.8 million equivalence classes read
1.9 million equivalence classes read
2 million equivalence classes read
2.1 million equivalence classes read
2.2 million equivalence classes read
2.3 million equivalence classes read
2.4 million equivalence classes read
2.5 million equivalence classes read
2.6 million equivalence classes read
2.7 million equivalence classes read
2.8 million equivalence classes read
2.9 million equivalence classes read
3 million equivalence classes read
3.1 million equivalence classes read
3.2 million equivalence classes read
3.3 million equivalence classes read
3.4 million equivalence classes read
3.5 million equivalence classes read
3.6 million equivalence classes read
3.7 million equivalence classes read
3.8 million equivalence classes read
3.9 million equivalence classes read
4 million equivalence classes read
4.1 million equivalence classes read
4.2 million equivalence classes read
4.3 million equivalence classes read
4.4 million equivalence classes read
4.5 million equivalence classes read
4.6 million equivalence classes read
4.7 million equivalence classes read
4.8 million equivalence classes read
4.9 million equivalence classes read
5 million equivalence classes read
5.1 million equivalence classes read
5.2 million equivalence classes read
5.3 million equivalence classes read
5.4 million equivalence classes read
5.5 million equivalence classes read
5.6 million equivalence classes read
5.7 million equivalence classes read
5.8 million equivalence classes read
5.9 million equivalence classes read
6 million equivalence classes read
6.1 million equivalence classes read
6.2 million equivalence classes read
6.3 million equivalence classes read
6.4 million equivalence classes read

6.5 million equivalence classes read
6.6 million equivalence classes read
6.7 million equivalence classes read
6.8 million equivalence classes read
6.9 million equivalence classes read
7 million equivalence classes read
7.1 million equivalence classes read
7.2 million equivalence classes read
7.3 million equivalence classes read
7.4 million equivalence classes read
7.5 million equivalence classes read
7.6 million equivalence classes read
7.7 million equivalence classes read
7.8 million equivalence classes read
7.9 million equivalence classes read
8 million equivalence classes read
8.1 million equivalence classes read
8.2 million equivalence classes read
8.3 million equivalence classes read
8.4 million equivalence classes read
8.5 million equivalence classes read
8.6 million equivalence classes read
8.7 million equivalence classes read
8.8 million equivalence classes read
8.9 million equivalence classes read
9 million equivalence classes read
9.1 million equivalence classes read
9.2 million equivalence classes read
9.3 million equivalence classes read
9.4 million equivalence classes read
9.5 million equivalence classes read
9.6 million equivalence classes read
9.7 million equivalence classes read
9.8 million equivalence classes read
9.9 million equivalence classes read
10 million equivalence classes read
10.1 million equivalence classes read
10.2 million equivalence classes read
10.3 million equivalence classes read

Starting hierarchial clustering...
0 thousand clusters done
cluster with 119728 transcripts.. this might take a while
down to 119600 clusters. dist=0

down to 119400 clusters. dist=0
down to 119200 clusters. dist=0
down to 119000 clusters. dist=0
down to 118800 clusters. dist=0
down to 118600 clusters. dist=0

down to 118400 clusters. dist=0
down to 118200 clusters. dist=0
down to 118000 clusters. dist=0
down to 117800 clusters. dist=0
down to 117600 clusters. dist=0
down to 117400 clusters. dist=0
down to 117200 clusters. dist=0
down to 117000 clusters. dist=0
down to 116800 clusters. dist=0
down to 116600 clusters. dist=0
down to 116400 clusters. dist=0
down to 116200 clusters. dist=0
down to 116000 clusters. dist=0
down to 115800 clusters. dist=0
down to 115600 clusters. dist=0
down to 115400 clusters. dist=0
down to 115200 clusters. dist=0
down to 115000 clusters. dist=0
down to 114800 clusters. dist=0
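
On the original question of estimating the run time: the "down to N clusters" lines can be turned into a rough merge rate if each line is timestamped as it arrives. The sketch below is only an illustration. stamp_lines and merge_rate are hypothetical helper names, the rate is a plain average between the first and last "down to" line, and nothing in this thread says that the time per merge stays constant or at which cluster count Corset stops, so treat any extrapolation as a guess.

```bash
# stamp_lines: prefix every line read from stdin with the current Unix time.
stamp_lines() {
    local line
    while IFS= read -r line; do
        printf '%s %s\n' "$(date +%s)" "$line"
    done
}

# merge_rate: read a stamped log file and report how fast the cluster count falls.
# Expected line shape: "<epoch> down to <N> clusters. dist=<d>"; other lines are ignored.
merge_rate() {
    awk '
        $2 == "down" && $3 == "to" {
            if (!seen) { t0 = $1; c0 = $4; seen = 1 }   # first progress line
            t1 = $1; c1 = $4                            # most recent progress line
        }
        END {
            if (!seen || t1 == t0) { print "not enough data"; exit }
            printf "merged=%d seconds=%d per_hour=%.0f\n",
                   c0 - c1, t1 - t0, (c0 - c1) * 3600 / (t1 - t0)
        }
    ' "$1"
}

# Possible usage (the corset arguments are your own; 2>&1 is used because this
# thread does not say which stream the progress lines are written to):
#   corset ... 2>&1 | stamp_lines | tee corset.stamped.log
#   merge_rate corset.stamped.log
```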

Nadia Davidson

Jan 6, 2020, 10:21:58 PM
to corset-project
Hi,

Sorry I didn't get around to downloading your example because of the Christmas holidays. If you upload the files again I'll have a look. 

Have you tried running it on other systems? Do you still get the same problem?

Cheers,
Nadia.

pereira.d...@gmail.com

Jan 7, 2020, 9:16:20 AM
to corset-project
Hello,
I just sent it to your email, thank you.

Cheers,

Nadia Davidson

Feb 4, 2020, 11:12:44 PM
to corset-project
Hi Patrick,

As usual, thanks for the reproducible example! The equivalence class data from salmon looks a bit different from what I would expect. Can you post the full command you use for salmon? Are you using the --hardFilter and --validateMappings flags? If it's easy enough, can you realign your reads with salmon without the --validateMappings or --hardFilter flags and see how that goes? Feel free to forward me any eq_classes.txt files from the remap if the job still runs for very long periods. Which version of salmon are you currently using?

Cheers,
Nadia.
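
One way to answer the version and flag questions without digging up the original command is to look inside the salmon output directory. The sketch below assumes that salmon wrote a cmd_info.json file there containing a "salmon_version" entry and one entry per command-line option. That matches the salmon releases I have seen, but it is an assumption about your version, and salmon_run_info is a hypothetical helper name. Also note that in newer salmon releases selective alignment may be on by default, so "not recorded" does not prove it was off.

```bash
# salmon_run_info: print the recorded salmon version and whether the two flags
# asked about above appear in the recorded command line of one quant directory.
# Assumes <quant_dir>/cmd_info.json exists (an assumption about your salmon version).
salmon_run_info() {
    local info="$1/cmd_info.json" flag
    if [ ! -f "$info" ]; then
        echo "no cmd_info.json in $1"
        return 1
    fi
    # Print the version entry, e.g. "salmon_version": "x.y.z"
    grep -o '"salmon_version"[^,}]*' "$info"
    for flag in validateMappings hardFilter; do
        if grep -q "\"$flag\"" "$info"; then
            echo "$flag: recorded"
        else
            echo "$flag: not recorded"
        fi
    done
}
```

Calling it once per sample directory, for example `salmon_run_info quant/BCCP93_wintering.out`, gives three short lines that can be pasted into a reply.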

Patrick Pereira

Apr 1, 2020, 2:53:51 PM
to corset-project
Hello Nadia, how are you? Hope you are doing very well :D
I'm trying to run Corset again on a new data set (I sent it to you by email). Take a look below at my Salmon and Corset command lines:

#!/bin/bash -x
#Run salmon
FILES=`ls *.fastq | sed 's/.fqheads.trimmed.fastq//g'`
for F in $FILES ; do
        r=${F}.fqheads.trimmed.fastq
        salmon quant --index pusilla-wint-pre_mig-new_arriv --libType U --dumpEq -r $r --output quant/${F}.out -p 12
done
#run Corset
../bin/corset-1.09-linux64/corset -g 1,1,1,1,2,2,2,2,3,3,3,3 -n BCCP106_newly_arrived,BCCP109_newly_arrived,BCCP111_newly_arrived,BCCP114_newly_arrived,BCCP218_pre-migration,BCCP227_pre-migration,BCCP228_pre-migration,BCCP229_pre-migration,BCCP93_wintering,BCCP94_wintering,BCCP95_wintering,BCCP98_wintering -i salmon_eq_classes quant/BCCP*/aux_info/eq_classes.txt -f true -l 5 -x 100

I'm getting similar behavior, a seemingly endless process. Look below:

Running Corset Version 1.09
Setting sample groups:1,1,1,1,2,2,2,2,3,3,3,3, 3 groups in total
Setting sample names to:BCCP106_newly_arrived,BCCP109_newly_arrived,BCCP111_newly_arrived,BCCP114_newly_arrived,BCCP218_pre-migration,BCCP227_pre-migration,BCCP228_pre-migration,BCCP229_pre-migration,BCCP93_wintering,BCCP94_wintering,BCCP95_wintering,BCCP98_wintering
Setting output filename prefix to pusilla-wint-pre_mig-new_arriv_corset
Setting output files to be overridden
Setting minimum reads for a link to 5 (only used if -i is set to corset or salmon_eq_classes)
Setting maximum alignments for a read to 100 (only used if -i is set to corset or salmon_eq_classes )
Reading salmon eq_classes file : quant/BCCP106_newly_arrived.out/aux_info/eq_classes.txt
Reading data on 266414 transcripts in 692577 equivalence classes
5165212 reads counted, 1095112 reads filtered, 834780 reads redistributed.
Reading salmon eq_classes file : quant/BCCP109_newly_arrived.out/aux_info/eq_classes.txt
Reading data on 266414 transcripts in 831943 equivalence classes
6998061 reads counted, 1234515 reads filtered, 979560 reads redistributed.
Reading salmon eq_classes file : quant/BCCP111_newly_arrived.out/aux_info/eq_classes.txt
Reading data on 266414 transcripts in 828159 equivalence classes
7023203 reads counted, 1958073 reads filtered, 985136 reads redistributed.
Reading salmon eq_classes file : quant/BCCP114_newly_arrived.out/aux_info/eq_classes.txt
Reading data on 266414 transcripts in 889996 equivalence classes
10406689 reads counted, 1351616 reads filtered, 1048745 reads redistributed.
Reading salmon eq_classes file : quant/BCCP218_pre-migration.out/aux_info/eq_classes.txt
Reading data on 266414 transcripts in 477609 equivalence classes
6237837 reads counted, 369211 reads filtered, 528428 reads redistributed.
Reading salmon eq_classes file : quant/BCCP227_pre-migration.out/aux_info/eq_classes.txt
Reading data on 266414 transcripts in 434039 equivalence classes
4721033 reads counted, 369891 reads filtered, 483713 reads redistributed.
Reading salmon eq_classes file : quant/BCCP228_pre-migration.out/aux_info/eq_classes.txt
Reading data on 266414 transcripts in 742040 equivalence classes
4920538 reads counted, 1571452 reads filtered, 881135 reads redistributed.
Reading salmon eq_classes file : quant/BCCP229_pre-migration.out/aux_info/eq_classes.txt
Reading data on 266414 transcripts in 551995 equivalence classes
6790990 reads counted, 257916 reads filtered, 621896 reads redistributed.
Reading salmon eq_classes file : quant/BCCP93_wintering.out/aux_info/eq_classes.txt
Reading data on 266414 transcripts in 1035233 equivalence classes
6447074 reads counted, 2057996 reads filtered, 1219845 reads redistributed.
Reading salmon eq_classes file : quant/BCCP94_wintering.out/aux_info/eq_classes.txt
Reading data on 266414 transcripts in 1004152 equivalence classes
7914242 reads counted, 1934842 reads filtered, 1164910 reads redistributed.
Reading salmon eq_classes file : quant/BCCP95_wintering.out/aux_info/eq_classes.txt
Reading data on 266414 transcripts in 886285 equivalence classes
7305812 reads counted, 2172351 reads filtered, 1068808 reads redistributed.
Reading salmon eq_classes file : quant/BCCP98_wintering.out/aux_info/eq_classes.txt
Reading data on 266414 transcripts in 842636 equivalence classes
4846758 reads counted, 1565139 reads filtered, 1009683 reads redistributed.
...
403.1 million equivalence classes read
403.2 million equivalence classes read
403.3 million equivalence classes read
403.4 million equivalence classes read
403.5 million equivalence classes read
403.6 million equivalence classes read
403.7 million equivalence classes read
403.8 million equivalence classes read
403.9 million equivalence classes read
404 million equivalence classes read
404.1 million equivalence classes read
404.2 million equivalence classes read
404.3 million equivalence classes read
404.4 million equivalence classes read
404.5 million equivalence classes read
404.6 million equivalence classes read
404.7 million equivalence classes read
404.8 million equivalence classes read
Starting hierarchial clustering...
0 thousand clusters done
1 thousand clusters done
2 thousand clusters done
3 thousand clusters done
4 thousand clusters done
5 thousand clusters done
6 thousand clusters done
7 thousand clusters done
8 thousand clusters done
9 thousand clusters done
10 thousand clusters done
11 thousand clusters done
12 thousand clusters done
13 thousand clusters done
14 thousand clusters done
15 thousand clusters done
16 thousand clusters done
17 thousand clusters done
18 thousand clusters done
19 thousand clusters done
cluster with 103287 transcripts.. this might take a while


Look at my top output:

Screenshot_2020-04-01_15-39-38.png




My computer has 128 GB of RAM and a 4.0 GHz Intel Core i7 processor with 12 threads.

Patrick Pereira

Apr 4, 2020, 10:35:29 PM
to corset-project
UPDATE
Hello Nadia,
Corset finished successfully today, please ignore my message above haha.
Thank you
:D
Cheers
Patrick.