invalid insert distance

Sinclair Cooper

Feb 18, 2013, 7:26:39 AM

to hku-...@googlegroups.com

Hi,

I have been trying to use IDBA_UD to run some tests on simulated data that I generated from previously sequenced data.

I keep getting the error 'Invalid insert distance'. I've tried changing the insert distances for the reads that I'm generating as well as the standard deviation for the insert distances. What assumptions does IDBA make about insert distance?

This is the kind of output I'm getting:

number of threads 8

long reads 0

extra reads 0

read_length 100

kmer 20

kmers 9258 9290

merge bubble 0

contigs: 54 n50: 437 max: 941 mean: 182 total length: 9867 n80: 237

aligned 13734 reads

confirmed bases: 7325 correct reads: 10354 bases: 0

distance mean 254.027 sd 196.944

seed contigs 20 local contigs 108

kmer 40

kmers 9680 9698

merge bubble 0

contigs: 33 n50: 937 max: 1040 mean: 327 total length: 10795 n80: 815

aligned 18451 reads

confirmed bases: 8787 correct reads: 16856 bases: 0

distance mean 212.975 sd 288.682

seed contigs 11 local contigs 66

kmer 60

kmers 9930 9937

merge bubble 0

contigs: 20 n50: 1038 max: 1079 mean: 554 total length: 11093 n80: 1004

aligned 20999 reads

confirmed bases: 9719 correct reads: 19184 bases: 0

distance mean 153.616 sd 364.841

invalid insert distance

kmer 80

kmers 10003 10005

merge bubble 0

contigs: 11 n50: 1082 max: 1102 mean: 984 total length: 10834 n80: 1047

aligned 21586 reads

confirmed bases: 10025 correct reads: 19425 bases: 0

distance mean 113.821 sd 402.032

invalid insert distance

kmer 100

kmers 9853 9734

merge bubble 0

contigs: 10 n50: 1082 max: 1102 mean: 1072 total length: 10724 n80: 1060

aligned 21586 reads

distance mean 113.821 sd 402.032

invalid insert distance

Segmentation fault (core dumped)

Any help would be greatly appreciated.

Incidentally, when I run IDBA on a real data set (Illumina PE reads) it seems to work OK, but I'm trying to QC the assembly by using a simulated data set so that I can quantify copy number etc. more accurately.

Thanks

Sinclair

Yu PENG

Feb 19, 2013, 2:00:32 PM

to hku-...@googlegroups.com

Hi,

IDBA assumes that the paired-end library is in forward-reverse orientation (->, <-). Please make sure your library format is correct.

Thanks,

Yu Peng
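
For anyone who wants to test the orientation assumption on their own reads, here is a minimal Python sketch of reverse-complementing a mate. The helper name is illustrative and not part of IDBA. Whether a given simulator already writes mates in forward-reverse orientation is something to check against its own documentation.

```python
# Illustrative helper (not part of IDBA): reverse-complement a DNA read.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    mate = "AACGTN"
    print(mate, "->", reverse_complement(mate))
```

Applying the function twice returns the original read, which is a quick sanity check before rewriting a whole file.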

--
You received this message because you are subscribed to the Google Groups "hku-idba" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hku-idba+u...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Sinclair Cooper

Feb 21, 2013, 11:36:30 AM

to hku-...@googlegroups.com

Hi,

I'm quite sure my reads are in the correct orientation. I tried reverse-complementing them just to test, but I still get similar error messages.

Does IDBA have a lower limit on the size of the data set it can use? (I am using a very small data set to test it.)

Thanks

Sinclair

Yu PENG

Feb 21, 2013, 12:35:59 PM

to hku-...@googlegroups.com

Hi,

There is no such limitation. Did you try Velvet and SOAP? If they work, I think IDBA should also work.

Thanks,

Yu Peng

Sinclair Cooper

Feb 21, 2013, 12:37:32 PM

to hku-...@googlegroups.com

Hi,

I tried Velvet and got good results.

S

rohita sinha

Feb 21, 2013, 12:40:02 PM

to hku-...@googlegroups.com

Why don't you share a fraction of your reads, or one sample?

We can try it, and it will be a learning experience for all of us. I have used IDBA_UD and got good results.

Rohita

Rohita Sinha, Ph.D.
Core for Applied Genomics and Ecology (CAGE)

University of Nebraska-Lincoln
Dept. Food Science & Tech. 330 FIC
Lincoln, NE 68583-0919

Tel: 402- 472-5575

Fax: 402-472-1693

Email: rohita.sinha@gmail.com

Sinclair Cooper

Feb 21, 2013, 12:54:23 PM

to hku-...@googlegroups.com

That's a great idea. I have a small data set that would be good for that.

I appreciate your input. IDBA has also worked well for me with real Illumina output, but it does not seem to work for my simulated data...

I've attached some reads.

Thank you!

sim_reads.zip

Sinclair Cooper

Feb 21, 2013, 1:07:32 PM

to hku-...@googlegroups.com

Sorry for sending separate fq files, but I suspected that the way I'm pre-processing them could be the problem.

Kevin Chen

Feb 22, 2013, 12:09:28 AM

to hku-...@googlegroups.com

Actually, I have the same problem. I always turn on pre_correction with idba_ud. For one of the many samples I have assembled with idba_ud, it prints "invalid insert distance" at k = 40, 60, 80 and 100 and then dumps core, while the same sample assembles fine with other assemblers such as Velvet, Ray and ABySS.

idba_ud -r read.fa -o output --num_threads 16 --pre_correction

number of threads 16

long reads 0

extra reads 0

read_length 101

kmer 60

kmers 9439265 9392915

merge bubble 487

contigs: 13600 n50: 495 max: 129925 mean: 293 total length: 3988167 n80: 191

aligned 1547939 reads

confirmed bases: 2773649 correct reads: 1229186 bases: 167168

kmer 20

kmers 8391898 8704445

merge bubble 1542

contigs: 129740 n50: 294 max: 12132 mean: 56 total length: 7278867 n80: 22

aligned 1164980 reads

confirmed bases: 2664809 correct reads: 1014972 bases: 1294

distance mean 238.293 sd 60.7358

seed contigs 5599 local contigs 259480

kmer 40

kmers 12257698 12403264

merge bubble 1579

contigs: 42557 n50: 483 max: 56663 mean: 172 total length: 7350052 n80: 79

aligned 1404858 reads

confirmed bases: 3627855 correct reads: 1182843 bases: 3945

distance mean 238.007 sd 511.259

invalid insert distance

kmer 60

kmers 11490515 11522912

merge bubble 478

contigs: 20127 n50: 683 max: 130857 mean: 337 total length: 6789597 n80: 271

aligned 1690005 reads

confirmed bases: 4021375 correct reads: 1259011 bases: 29693

distance mean 220.6 sd 5615.2

invalid insert distance

kmer 80

kmers 8123863 8083455

merge bubble 109

contigs: 8858 n50: 984 max: 130897 mean: 657 total length: 5821418 n80: 389

aligned 1685593 reads

confirmed bases: 4037684 correct reads: 1265135 bases: 20058

distance mean 208.114 sd 8162.33

invalid insert distance

kmer 100

kmers 5297804 5242226

merge bubble 45

contigs: 5840 n50: 1191 max: 130897 mean: 912 total length: 5327754 n80: 461

aligned 1741678 reads

distance mean 208.566 sd 7872.14

invalid insert distance

Kevin Chen

Feb 22, 2013, 10:18:50 AM

to hku-...@googlegroups.com

Oh, it probably has nothing to do with pre_correction.

I reran the sample without --pre_correction and still got a segmentation fault.

24257 Segmentation fault (core dumped) idba_ud -r read.fa -o output2 --num_threads 16

number of threads 16

long reads 0

extra reads 0

read_length 101

kmer 20

kmers 8487703 8806327

merge bubble 2642

contigs: 131829 n50: 289 max: 12132 mean: 55 total length: 7327282 n80: 22

aligned 1158662 reads

confirmed bases: 2661707 correct reads: 1008503 bases: 114609

distance mean 238.338 sd 60.0754

seed contigs 5617 local contigs 263658

kmer 40

kmers 12387008 12535507

merge bubble 1788

contigs: 43050 n50: 480 max: 56663 mean: 171 total length: 7376898 n80: 79

aligned 1406097 reads

confirmed bases: 3625088 correct reads: 1182231 bases: 29921

distance mean 237.921 sd 465.821

seed contigs 7248 local contigs 86100

kmer 60

kmers 11771870 11808943

merge bubble 502

contigs: 21020 n50: 671 max: 57410 mean: 327 total length: 6886687 n80: 266

aligned 1649679 reads

confirmed bases: 4021802 correct reads: 1233773 bases: 30336

distance mean 234.072 sd 746.447

invalid insert distance

kmer 80

kmers 8268691 8227832

merge bubble 114

contigs: 9160 n50: 974 max: 130897 mean: 642 total length: 5885783 n80: 387

aligned 1690081 reads

confirmed bases: 4053068 correct reads: 1267961 bases: 25744

distance mean 203.673 sd 8129.93

invalid insert distance

kmer 100

kmers 5350595 5292609

merge bubble 36

contigs: 5876 n50: 1180 max: 130897 mean: 912 total length: 5363417 n80: 463

aligned 1734549 reads

distance mean 204.888 sd 7930.91

invalid insert distance

Sinclair Cooper

Mar 5, 2013, 9:29:23 AM

to hku-...@googlegroups.com

Hi all,

Has anyone had any luck with the sample data I sent?

Thanks

Sinclair

Jonathan Ligo

Mar 6, 2013, 1:09:20 PM

to hku-idba

How did you generate this data? (I'm getting a core dump as well, but
only reads generated from the included read simulator seem to work for
me - I've tried generating data from metasim with the empirical model
and setting paired end probability to 1, but it still will core
dump).

Did it produce any useful contig/etc. files?

Yu PENG

Mar 6, 2013, 2:50:47 PM

to hku-...@googlegroups.com

Hi all,

If the insert distance is invalid, the scaffolding step has a high chance of hitting a segmentation fault. Contig generation should still work, though: you can find contig.fa or contig-maxk.fa as the result. We estimate the insert distance by aligning the reads to the contigs and computing the distance between the two reads of each pair. I am not sure why it doesn't work on this data. Can you also send the reference sequence for testing? How is the assembly quality? Can the contigs be mapped back to the reference well? If you align the reads to the reference, can you detect the insert distance well?

Thanks,

Yu Peng
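
To make the estimation step described above concrete, here is a rough Python sketch that computes a mean and standard deviation from per-pair distances, plus a median-based alternative for comparison. This is not IDBA's code: the function names are made up, the input values are invented for illustration, and how IDBA filters pairs before computing its statistics is not stated in this thread.

```python
import statistics


def insert_stats(distances):
    """Plain mean and (population) standard deviation of pair distances."""
    return statistics.fmean(distances), statistics.pstdev(distances)


def robust_insert_stats(distances):
    """Median and scaled MAD, which are far less sensitive to outliers."""
    med = statistics.median(distances)
    mad = statistics.median(abs(d - med) for d in distances)
    # 1.4826 scales the MAD to match the sd of a normal distribution.
    return med, 1.4826 * mad


if __name__ == "__main__":
    # Hypothetical data: 98 well-behaved pairs and 2 pairs whose mates
    # aligned far apart (e.g. in different copies of a repeat).
    distances = [300] * 98 + [9000] * 2
    print("mean/sd:    ", insert_stats(distances))
    print("median/MAD: ", robust_insert_stats(distances))
```

A handful of far-apart pairs is enough to push the plain sd well above the mean, which is the pattern visible in several of the logs posted here (for example "distance mean 208.114 sd 8162.33").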

Sinclair Cooper

Mar 7, 2013, 5:38:22 AM

to hku-...@googlegroups.com

Hi everyone,

contig-100.fa is actually quite a good assembly for the test data, and the read counts match up with the 'reference'. (I passed contig-100 and the 'reference' sequences through dotter and they seemed to be a good match.)

I've attached the reference sequences. Some of the sequences have been multiplied to try and reflect the copy number variation in the system I'm attempting to sequence. In this small (relatively low complexity) data set the copy numbers are as follows:

copy#   length
10      984
5       1005
1       1015
1       1013
1       992
1       1018
1       1004
1       982
1       969
1       1024
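
As a quick arithmetic check on the table above, this short Python sketch computes the amount of unique sequence and the total length once copy numbers are applied. The variable names are illustrative. The unique length is the figure to compare against the "total length" values in the IDBA log.

```python
# (copy number, length) pairs copied from the table in this post.
references = [
    (10, 984), (5, 1005), (1, 1015), (1, 1013), (1, 992),
    (1, 1018), (1, 1004), (1, 982), (1, 969), (1, 1024),
]

# Each distinct sequence counted once.
unique_length = sum(length for _, length in references)
# Each sequence counted once per copy.
total_length = sum(copies * length for copies, length in references)

print("unique sequence length:", unique_length)    # 10006
print("total length with copies:", total_length)   # 22882
```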

The script I used to generate the reads from this data set was from here: http://socwiki.wordpress.com/2011/05/11/simulateseq-pl-perl-script-for-simulating-ngss-sequence/

The command I used was: simulateSeq.pl --PE 100X --ISPE 300 --ISSD 10 --RL 100-100 --ReadsSD 0 --type stdfq --circle test_mini.fasta

I've attached a CLC report for a larger data set I generated in the same way (it gave the same results in IDBA).

Many thanks for all of your input with this.

Sinclair

test_mini.fasta

CLC report (2).pdf

Jérémy Tournayre

Mar 14, 2013, 6:33:19 AM

to hku-...@googlegroups.com

Hi,

Same problem for me.

Some additional information:

- With version 1.1.0 of idba_ud I get this error. I think it is correlated with the sd: when the sd is greater than 400, I always get "Invalid insert distance".

- On the other hand, version 1.0.9 of idba_ud runs like clockwork.

Thanks,

Jérémy Tournayre
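
To compare runs like the ones posted in this thread, the log can be scanned for each "distance mean ... sd ..." line and whether "invalid insert distance" follows it within the same k-mer stage. The sketch below is a small Python parser written against the log format shown in these posts. The function name is made up, and it only reports what the log says; it does not reproduce whatever rule IDBA applies internally.

```python
import re

DIST_RE = re.compile(r"distance mean (\S+) sd (\S+)")


def insert_distance_report(log_text):
    """Return (kmer, mean, sd, invalid) tuples, one per 'distance mean' line."""
    rows = []
    kmer = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith("kmer "):          # "kmers ..." lines do not match
            kmer = int(line.split()[1])
            continue
        match = DIST_RE.fullmatch(line)
        if match:
            mean, sd = float(match.group(1)), float(match.group(2))
            rows.append([kmer, mean, sd, False])
        elif line == "invalid insert distance" and rows and rows[-1][0] == kmer:
            rows[-1][3] = True                # flag the stage just reported
    return [tuple(row) for row in rows]


if __name__ == "__main__":
    sample = """
    kmer 40
    distance mean 212.975 sd 288.682
    seed contigs 11 local contigs 66
    kmer 60
    distance mean 153.616 sd 364.841
    invalid insert distance
    """
    for row in insert_distance_report(sample):
        print(row)
```

Collecting these tuples from several runs makes it easier to see whether the failures line up with a fixed sd value or with something else, such as the sd relative to the mean.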

daniel aguirre

Jun 14, 2013, 3:52:47 AM

to hku-...@googlegroups.com

Hi all,

I tried idba_ud with several simulated data sets and it worked fine in most instances. In one instance I got the core dump several times until I got it right!

Now with real data sets I'm getting the same problem as you guys, and it is clearly related to a maximum allowed sd of the insert distance.

I tried the new version and modifying the script, but still got the same error.

As Yu Peng says, we can still use the generated contigs, which are quite fine.

Does anyone know if we can 'build' the scaffolds from the contigs with any available software? For example, we have our contigs and the initial paired reads; if a read pair aligns against the ends of two different contigs, we make a scaffold out of them and put X Ns in the middle.

Does this make sense? How can we do it?

thanks
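
The joining step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a replacement for a real scaffolder: the function names are made up, the insert size is assumed to be the outer distance between the two fragment ends, both contigs are assumed to be on the same strand, and a real tool would require many consistent pairs per link rather than one.

```python
def estimate_gap(insert_size, left_len, left_start, right_end):
    """Estimate the gap between two contigs from one forward-reverse pair.

    left_start: 0-based start of the forward mate on the left contig.
    right_end:  0-based exclusive end of the reverse mate on the right contig.
    The fragment covers (left_len - left_start) bases of the left contig,
    the gap, and right_end bases of the right contig.
    """
    return insert_size - (left_len - left_start) - right_end


def join_contigs(left, right, gap, min_gap=1):
    """Concatenate two contigs with a run of Ns (at least min_gap long)."""
    return left + "N" * max(gap, min_gap) + right


if __name__ == "__main__":
    gap = estimate_gap(insert_size=300, left_len=1000, left_start=900, right_end=150)
    print("estimated gap:", gap)
    print(join_contigs("ACGT", "TTGA", 3))
```

A negative estimate means the contigs probably overlap, which is why the sketch clamps the gap to a minimum number of Ns instead of trusting a single pair.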

Zhuofei Xu

Aug 29, 2013, 9:57:16 AM

to hku-...@googlegroups.com

Hello All,

I ran into the same problem with two read data sets. One run appears normal; the other is abnormal and has no scaffold file (scaffold.fa). I noticed that both runs generate the contig file contig.fa, but there is a small difference between them: the contig.fa from the normal run ends with a newline character, whereas the other does not, so it might be incomplete.

So do you think such a contig.fa file is complete or not?

Thanks a lot in advance for your suggestions!

Zhuofei
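
One way to look at this is to check the last byte of the file and count the FASTA records, then compare the count with the "contigs:" figure in the log. The Python sketch below does both. The function names are illustrative, and a missing trailing newline only suggests, rather than proves, that the file was cut short.

```python
import os


def ends_with_newline(path):
    """True if the file is non-empty and its last byte is a newline."""
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        if handle.tell() == 0:
            return False
        handle.seek(-1, os.SEEK_END)
        return handle.read(1) == b"\n"


def count_fasta_records(path):
    """Count header lines (lines starting with '>') in a FASTA file."""
    with open(path, "rb") as handle:
        return sum(1 for line in handle if line.startswith(b">"))


# Example usage (the path is a placeholder for your own output directory):
#   print(ends_with_newline("output/contig.fa"))
#   print(count_fasta_records("output/contig.fa"))
```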

Ben Temperton

Sep 13, 2013, 12:19:08 PM

to hku-...@googlegroups.com

I can confirm this with my data sets.

Running IDBA-UD v.1.1.0 and 1.1.1 results in the following and a core dump (and no scaffold.fa)

number of threads 12

long reads 0

extra reads 0

read_length 101

kmer 60

kmers 387875467 383443362

merge bubble 5826

contigs: 2947859 n50: 61 max: 22905 mean: 66 total length: 196442066 n80: 60

aligned 1252490 reads

confirmed bases: 4574771 correct reads: 454019 bases: 178643

kmer 20

kmers 1384235686 1413690000

merge bubble 505452

contigs: 29757001 n50: 25 max: 4640 mean: 30 total length: 897762003 n80: 20

aligned 379930 reads

confirmed bases: 3178588 correct reads: 127877 bases: 12507

distance mean -nan sd -nan

invalid insert distance

kmer 40

kmers 1981905064 2040150779

merge bubble 43548

contigs: 22163387 n50: 41 max: 51167 mean: 44 total length: 985913133 n80: 40

aligned 2949052 reads

confirmed bases: 18777173 correct reads: 679201 bases: 94180

distance mean 7287 sd 0

seed contigs 20653 local contigs 44326774

kmer 60

kmers 1247105612 1231462707

merge bubble 5606

contigs: 3468326 n50: 62 max: 127764 mean: 72 total length: 251963043 n80: 60

aligned 3102081 reads

confirmed bases: 19290186 correct reads: 702455 bases: 11289

distance mean 7287 sd 0

seed contigs 18623 local contigs 6936652

kmer 80

kmers 178091417 171168814

merge bubble 1091

contigs: 343387 n50: 113 max: 136176 mean: 151 total length: 52131602 n80: 80

aligned 3235230 reads

confirmed bases: 18979983 correct reads: 707838 bases: 4459

distance mean 7287 sd 0

seed contigs 17003 local contigs 686774

kmer 100

kmers 25446130 24372291

merge bubble 337

contigs: 20961 n50: 1611 max: 136176 mean: 1126 total length: 23618799 n80: 787

aligned 0 reads

distance mean -nan sd -nan

invalid insert distance

Whereas, running v. 1.0.9 on the same data works fine:

number of threads 12

long reads 0

read_length 101

kmer 60

kmers 297307891 289329626

merge bubble 5826

contigs: 102120 n50: 143 max: 22905 mean: 129 total length: 13262928

aligned 1091346 reads

confirmed bases: 4245315 correct reads: 428113 bases: 164712

kmer 20

kmers 1357876996 1369398738

merge bubble 507973

contigs: 17664727 n50: 43 max: 4819 mean: 37 total length: 660146933

aligned 370774 reads

confirmed bases: 3112494 correct reads: 127264 bases: 12129

distance mean 406.22 sd 232.893

seed contigs 6579 local contigs 13256

kmer 40

kmers 507006951 497715340

merge bubble 45438

contigs: 705710 n50: 109 max: 51167 mean: 105 total length: 74667222

aligned 2057022 reads

confirmed bases: 15214356 correct reads: 605465 bases: 70798

seed contigs 19696 local contigs 44049

kmer 60

kmers 72071102 71471755

merge bubble 9758

contigs: 119204 n50: 828 max: 97175 mean: 294 total length: 35063797

aligned 2562735 reads

confirmed bases: 18329394 correct reads: 659078 bases: 24008

seed contigs 19334 local contigs 43896

kmer 80

kmers 34215782 34084594

merge bubble 5187

contigs: 37084 n50: 1477 max: 97957 mean: 751 total length: 27863642

aligned 2903968 reads

confirmed bases: 19569921 correct reads: 696816 bases: 19404

seed contigs 17429 local contigs 39533

kmer 100

kmers 27151707 27093734

merge bubble 3509

contigs: 23267 n50: 1728 max: 98103 mean: 1149 total length: 26749209

aligned 2969925 reads

expected coverage 0.126465

num edges 8773

contigs: 17469 n50: 3584 max: 226190 mean: 1403 total length: 24510954

Stephen Turner

Dec 5, 2013, 10:09:33 AM

to hku-...@googlegroups.com, yp...@cs.hku.hk

I'd like to re-open this discussion. I am having the same invalid insert distance / segfault issues on a single-end fasta file that runs just fine in velvet. I attached the file (only 1000 reads). I'm using 1.1.0, but have also tried 1.1.1 and neither works.

Using the file attached:

idba_ud -r tmp.fa -o idba

output:

number of threads 12

long reads 0

extra reads 0

read_length 151

kmer 20

kmers 5401 5400

merge bubble 0

contigs: 1 n50: 5405 max: 5405 mean: 5405 total length: 5405 n80: 5405

aligned 937 reads

confirmed bases: 5273 correct reads: 903 bases: 62

distance mean 173.028 sd 2177.7

invalid insert distance

kmer 40

kmers 5366 5365

merge bubble 0

contigs: 1 n50: 5405 max: 5405 mean: 5405 total length: 5405 n80: 5405

aligned 937 reads

confirmed bases: 5273 correct reads: 903 bases: 0

distance mean 173.028 sd 2177.7

invalid insert distance

kmer 60

kmers 5346 5345

merge bubble 0

contigs: 1 n50: 5405 max: 5405 mean: 5405 total length: 5405 n80: 5405

aligned 937 reads

confirmed bases: 5273 correct reads: 903 bases: 0

distance mean 173.028 sd 2177.7

invalid insert distance

kmer 80

kmers 5326 5325

merge bubble 0

contigs: 1 n50: 5405 max: 5405 mean: 5405 total length: 5405 n80: 5405

aligned 937 reads

confirmed bases: 5273 correct reads: 903 bases: 0

distance mean 173.028 sd 2177.7

invalid insert distance

kmer 100

kmers 5306 5305

merge bubble 0

contigs: 1 n50: 5405 max: 5405 mean: 5405 total length: 5405 n80: 5405

aligned 937 reads

distance mean 173.028 sd 2177.7

invalid insert distance

Segmentation fault

Anyone have any thoughts? Many thanks.

Stephen

tmp.fa

Joe Anderson

Dec 5, 2013, 1:40:33 PM

to hku-...@googlegroups.com, yp...@cs.hku.hk

I've received the exact same fault with Stephen's input file, using IDBA 1.1.0 on Ubuntu 13.10.

-Joe

Stephen Turner

Dec 5, 2013, 3:59:43 PM

to hku-...@googlegroups.com, yp...@cs.hku.hk

I've just discovered that running the exact same commands on the same data but specifying --num_threads 1 eliminates the problem. So, segfault with multithreaded job, runs fine on a single thread. Any thoughts Yu?

Thanks,

Stephen

Stephen Turner

Dec 5, 2013, 4:03:04 PM

to hku-...@googlegroups.com, yp...@cs.hku.hk

Even more strangely, I get a segfault with 12 cores, no segfault with 16 cores, segfault again at 24 cores, no segfault at 3 cores. Not sure what's going on here.

Thanks,

Stephen
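
To map out which thread counts crash, the same command can be run in a loop and the exit status recorded. Below is a small Python sketch of that. It uses only the idba_ud options already shown in this thread (-r, -o, --num_threads); the function names and the output directory naming are made up. On POSIX systems, subprocess reports a process killed by a signal as a negative return code, so a segmentation fault typically shows up as -11.

```python
import subprocess


def build_command(reads, out_dir, threads):
    """Build the idba_ud command line used earlier in this thread."""
    return ["idba_ud", "-r", reads, "-o", out_dir, "--num_threads", str(threads)]


def sweep(reads, thread_counts, run=subprocess.run):
    """Run idba_ud once per thread count and return {threads: return code}."""
    results = {}
    for n in thread_counts:
        # One output directory per run so results do not overwrite each other.
        proc = run(build_command(reads, f"idba_t{n}", n),
                   capture_output=True, text=True)
        results[n] = proc.returncode
    return results


# Example usage (requires idba_ud on PATH):
#   print(sweep("tmp.fa", [1, 3, 12, 16, 24]))
```

Repeating each thread count a few times would also show whether the crash is deterministic for a given setting or varies from run to run.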

Ben Temperton

Dec 11, 2013, 4:55:44 PM

to hku-...@googlegroups.com, yp...@cs.hku.hk

I can also confirm this on real datasets and that it does not segfault at 40 cores.

Em Seven

Jul 10, 2014, 8:52:53 AM

to hku-...@googlegroups.com, yp...@cs.hku.hk

Bet this has been resolved since, but am surfing the group for a problem of my own, and thought I might mention these lines from the IDBA README file:
"IDBA-UD IDBA-Hybrid and IDBA-Tran require paired-end reads stored in single FastA file and a pair of reads is in consecutive two lines."

I am thus more surprised that it actually works for your SE data with some values of --num_threads...
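
For reads that come as two separate FASTQ files, the layout the README describes can be produced by interleaving the mates into one FASTA file. Here is a minimal Python sketch under some assumptions: plain four-line FASTQ records, both files listing the pairs in the same order, and each read written as a two-line FASTA record so that the two mates of a pair are adjacent. The function names are illustrative.

```python
from itertools import islice


def read_fastq(handle):
    """Yield (name, sequence) from a four-line-per-record FASTQ stream."""
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            return
        yield record[0][1:].strip(), record[1].strip()


def interleave_to_fasta(fq1, fq2, out):
    """Write mates alternately to one FASTA stream; return the pair count."""
    pairs = 0
    for (name1, seq1), (name2, seq2) in zip(read_fastq(fq1), read_fastq(fq2)):
        out.write(f">{name1}\n{seq1}\n>{name2}\n{seq2}\n")
        pairs += 1
    return pairs


# Example usage (file names are placeholders):
#   with open("reads_1.fq") as a, open("reads_2.fq") as b, open("read.fa", "w") as o:
#       print(interleave_to_fasta(a, b, o), "pairs written")
```

Because zip stops at the shorter input, comparing the returned pair count with the number of records in each input file is a cheap way to spot files that are out of sync.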

Matthew

Dec 3, 2014, 5:44:41 PM

to hku-...@googlegroups.com

IDBA_UD is still crashing for me and saying "invalid insert distance". Has this problem not been fixed yet?

Peter King

Dec 31, 2014, 2:29:25 PM

to hku-...@googlegroups.com

I don't believe so. I'm using the "Current Release" listed on the IDBA website (v1.1.1), and I'm having the same problems. Also, the scripts in this release don't seem to have been modified since July 2013, which suggests to me that the current release is still buggy.
