Re: Error running dDocent: function gensub never defined

Jon Puritz

Jul 16, 2015, 3:49:51 PM
to Sophie D, ddo...@googlegroups.com
Hi Sophie,

It seems like there is a problem with part of the assembly.  Is there a file called “rainbow.fasta” in your working directory, and is there anything in it?  Also is there an rbasm.out file and is it empty?

Thanks,

Jon
On Jul 16, 2015, at 2:38 AM, Sophie D <sophie.del...@gmail.com> wrote:

Hi Jon,

We use the latest dDocent script 2.05.
We are running Ubuntu 14.04.2 LTS (Trusty Tahr), 64-bit. We have Illumina HiSeq 2500 PE sequencing. We tested the pipeline with 144 samples and are now using 3 so that it runs faster.
Do you need any more information?

Thanks for your time,

Sophie

--
Sophie Delerue-Ricard, PhD student in Marine Molecular Evolutionary Ecology, Belgium
Leuven University & Institute for Agricultural and Fisheries Research


2015-07-15 17:07 GMT+02:00 Jon Puritz <jpu...@gmail.com>:
Hi Sophie,

I think that you might be using an older version of the pipeline.  Could you try using 2.05?  Also could you tell me more about your analysis and system?  Are you using SE or PE reads?  How many samples?  What type of sequencing technology?  What version of Linux?

Thanks,

Jon
> On Jul 15, 2015, at 7:57 AM, sophie.del...@gmail.com wrote:
>
> Dear Jonathan,
>
> I am running dDocent on my data and getting this error:
> "awk: line 2: function gensub never defined
> awk: line 2: function gensub never defined
> /usr/local/bin/dDocent: line 580: * 5 / 4: syntax error: operand expected (error token is "* 5 / 4")"
>
> Could you help me solve this issue?
>
> Best,
> Sophie



Jon Puritz, PhD
Postdoctoral Research Associate
Harte Research Institute
Texas A&M Corpus Christi
6300 Ocean Drive
Corpus Christi, TX 78412-5869

Webpage: http://staff.tamucc.edu/jpuritz

Email: 
jpu...@gmail.com 
jonatha...@tamucc.edu

Work: 361-825-3343
Cell: 401-338-8739

"The most valuable of all talents is that of never using two words when one will do."
-Thomas Jefferson

Sophie D

Jul 17, 2015, 7:50:38 AM
to Jon Puritz, ddo...@googlegroups.com

Hi Jon,

I ran the process_radtags and dDocent scripts once more from scratch.

I still get an error for gensub function:

awk: line 2: function gensub never defined
TrimmomaticSE: Started with arguments: -threads 10 -phred33 uniq.fq uniq.fq1 ILLUMINACLIP:/usr/local/bin/TruSeq2-PE.fa:2:30:10 MINLEN:186
Using PrefixPair: 'AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT'
Using Long Clipping Sequence: 'TTTTTTTTTTAATGATACGGCGACCACCGAGATCTACAC'
Using Long Clipping Sequence: 'TTTTTTTTTTCAAGCAGAAGACGGCATACGA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT'
Using Long Clipping Sequence: 'AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG'
Using Long Clipping Sequence: 'CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT'
Using Long Clipping Sequence: 'AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 6 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Reads: 0 Surviving: 0 (�%) Dropped: 0 (�%)
TrimmomaticSE: Completed successfully
================================================================
Program: CD-HIT, V4.6 (+OpenMP), Jul 06 2015, 16:44:37
Command: cd-hit-est -i uniq.F.fasta -o xxx -c 0.8 -T 0 -M 0 -g 1
Started: Fri Jul 17 13:41:17 2015
================================================================
                            Output
----------------------------------------------------------------
total number of CPUs in the system is 16
Actual number of CPUs to be used: 16
total seq: 0
longest and shortest : 0 and 18446744073709551615
Total letters: 0
Sequences have been sorted
Approximated minimal memory consumption:
Sequence        : 0M
Buffer          : 16 X 12M = 192M
Table           : 2 X 16M = 33M
Miscellaneous   : 4M
Total           : 230M
Table limit with the given memory limit:
Max number of representatives: 625000
Max number of word counting entries: 7812500
        0  finished          0  clusters
Apprixmated maximum memory consumption: 230M
writing new database
writing clustering information
program completed !

I will check the files you are mentioning.

Regards,

Sophie

Sophie D

Jul 17, 2015, 10:45:49 AM
to Jon Puritz, ddo...@googlegroups.com
Hi again Jon,

I have checked rainbow.fasta.gz and rbasm.out.gz.
Both were gzipped and 30-34 bytes in size; once gunzipped, both are empty.

Trimmomatic seems to work now. What should I do next about the gensub error?

Regards,

Sophie


--

Sophie Delerue-Ricard, PhD student in Marine Molecular Evolutionary Ecology, Belgium

Leuven University & Institute for Agricultural and Fisheries Research



Jon Puritz

Jul 17, 2015, 12:50:25 PM
to Sophie D, ddo...@googlegroups.com
Hi Sophie,

I’m not sure what’s happening here.  Is the file uniq.full.fasta empty as well?  During the assembly, do the graphs printed for choosing the cutoff values have data in them?

Jon

Sophie D

Jul 17, 2015, 1:11:35 PM
to Jon Puritz, ddo...@googlegroups.com, Jasmien Hillien Lab Leuven
Hi Jon,

I'm still in the lab on the other side of the Atlantic, so I can answer your email right away, hooray!

1). uniq.full.fasta is NOT EMPTY! It contains Contig_1 to Contig_21625

2). During the assembly, the graphs HAVE data in them. I chose cut-off value of 3 for minimum depth coverage and 10 for the minimum number of individuals with unique reads.

So, as far as I can tell, everything runs smoothly up to the prompts at the assembly graphs.


NB: I have to admit there is one thing I changed: I modified the Rename_for_dDocent script into the Rename_for_dDocent_nauty version (attached to this email), because otherwise dDocent does not run; it did not recognize the structure of the file names. I removed the 'sample_' prefix in the mv commands on lines 19 and 20. My files are named like Population_individual; the barcode file I used is also attached.
Could you explain in more detail what the renaming script does?
If the script only changes the file names, the end product seems fine after the renaming step, but it might change something within the files that I am unaware of.


Does that help?

Thank you,

Sophie

PS: I have added my colleague Jasmien in cc; she will also follow the conversation, as I will be away from the lab (but still working and still answering your emails).


--

Sophie Delerue-Ricard, PhD student in Marine Molecular Evolutionary Ecology, Belgium

Leuven University & Institute for Agricultural and Fisheries Research



Rename_for_dDocent_input_file.txt
Rename_for_dDocent_nauty.sh

Jon Puritz

Jul 17, 2015, 1:24:03 PM
to Sophie D, ddo...@googlegroups.com, Jasmien Hillien Lab Leuven
Hi everyone,

1.)  Are there also sequences in the uniq.full.fasta file? There should not just be names…

2.) That’s a good sign…

3.)  The renaming script does simply that: it just changes the names of the files.  That shouldn’t be a problem.

Ok.  If the uniq.full.fasta file does have sequences in it, then try this command for me:

mawk '!/>/' uniq.full.fasta  | mawk '(NR==1||length<shortest){shortest=length} END {print shortest}'

Cheers,

Jon



Sophie D

Jul 17, 2015, 1:41:23 PM
to Jon Puritz, ddo...@googlegroups.com, Jasmien Hillien Lab Leuven
1). Yes yes, there are sequences in the uniq.full.fasta file.

mawk '!/>/' uniq.full.fasta  | mawk '(NR==1||length<shortest){shortest=length} END {print shortest}'

answers: 248

Cheers,

Sophie


--

Sophie Delerue-Ricard, PhD student in Marine Molecular Evolutionary Ecology, Belgium

Leuven University & Institute for Agricultural and Fisheries Research



Jon Puritz

Jul 17, 2015, 1:44:32 PM
to Sophie D, ddo...@googlegroups.com, Jasmien Hillien Lab Leuven
That’s correct.  What is the exact command you are using to run dDocent?

Sophie D

Jul 17, 2015, 1:47:41 PM
to Jon Puritz, ddo...@googlegroups.com, Jasmien Hillien Lab Leuven
To run dDocent I do 'dDocent' ...

Sophie

--

Sophie Delerue-Ricard, PhD student in Marine Molecular Evolutionary Ecology, Belgium

Leuven University & Institute for Agricultural and Fisheries Research



Jon Puritz

Jul 17, 2015, 1:50:12 PM
to Sophie D, ddo...@googlegroups.com, Jasmien Hillien Lab Leuven
Try:

bash dDocent

It seems like your default BASH shell isn’t properly interpreting the script.  The previous command that you ran is the one that is failing when the pipeline is running, but it works just fine from your terminal.
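
For readers who hit the same `* 5 / 4: syntax error: operand expected` message: bash prints exactly this when a variable used in an arithmetic expansion is empty, which would fit an upstream command having produced no output. The sketch below is an illustration only; the variable and function names are hypothetical and are not taken from the dDocent script.

```shell
# Hypothetical illustration; names are NOT from the dDocent script.
shortest=""    # what a failed upstream command would leave behind

# In bash, the next line would abort with:
#   * 5 / 4: syntax error: operand expected (error token is "* 5 / 4")
# echo $(( $shortest * 5 / 4 ))

# Guarded version: refuse to do arithmetic on an empty or non-numeric value.
safe_scale() {
    case "$1" in
        ''|*[!0-9]*)
            echo "not a number: '$1'" >&2
            return 1
            ;;
    esac
    echo $(( $1 * 5 / 4 ))
}

safe_scale 248                                             # prints 310
safe_scale "$shortest" || echo "upstream step produced no length"
```

A guard like this makes an empty intermediate value fail with a readable message instead of a syntax error further down the script.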

Sophie D

Jul 17, 2015, 1:52:31 PM
to Jon Puritz, ddo...@googlegroups.com, Jasmien Hillien Lab Leuven
I will try with three individuals then. It should run for a few minutes. I will let you know after the assembly graphs.

Thanks,

Sophie

--

Sophie Delerue-Ricard, PhD student in Marine Molecular Evolutionary Ecology, Belgium

Leuven University & Institute for Agricultural and Fisheries Research



Sophie D

Jul 17, 2015, 2:07:08 PM
to Jon Puritz, ddo...@googlegroups.com, Jasmien Hillien Lab Leuven
Now, the second graph does not work:

cat: ufile: No such file or directory
rm: cannot remove ‘ufile’: No such file or directory
         line 0: warning: Skipping data file with no valid points

gnuplot> plot 'uniqseq.peri.data' with lines notitle
                                                    ^
         line 0: x range is invalid


but it used to run for my 144 individuals.




--

Sophie Delerue-Ricard, PhD student in Marine Molecular Evolutionary Ecology, Belgium

Leuven University & Institute for Agricultural and Fisheries Research



Jon Puritz

Jul 17, 2015, 2:09:40 PM
to Sophie D, ddo...@googlegroups.com, Jasmien Hillien Lab Leuven
There’s a small problem with this version of the code and extremely small data sets.  Could you try on something like 10 individuals?

Sorry, I forgot I need to fix that…

Jon

Sophie D

Jul 17, 2015, 2:19:04 PM
to Jon Puritz, ddo...@googlegroups.com, Jasmien Hillien Lab Leuven
Ok, I started again with 10 ind.

Could you tell me a good command for checking how far along a run is, as a percentage? Should I execute it after the command I want to run?

Do you know how long running dDocent on 10 ind should take?

Thanks,

Sophie

--

Sophie Delerue-Ricard, PhD student in Marine Molecular Evolutionary Ecology, Belgium

Leuven University & Institute for Agricultural and Fisheries Research



Sophie D

Jul 17, 2015, 2:24:22 PM
to Jon Puritz, ddo...@googlegroups.com, Jasmien Hillien Lab Leuven
All right. I still have the same error with "bash dDocent":


"awk: line 2: function gensub never defined
TrimmomaticSE: Started with arguments: -threads 16 -phred33 uniq.fq uniq.fq1 ILLUMINACLIP:/usr/local/bin/TruSeq2-PE.fa:2:30:10 MINLEN:186

Using PrefixPair: 'AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT'
Using Long Clipping Sequence: 'TTTTTTTTTTAATGATACGGCGACCACCGAGATCTACAC'
Using Long Clipping Sequence: 'TTTTTTTTTTCAAGCAGAAGACGGCATACGA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT'
Using Long Clipping Sequence: 'AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG'
Using Long Clipping Sequence: 'CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT'
Using Long Clipping Sequence: 'AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 6 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Reads: 0 Surviving: 0 (�%) Dropped: 0 (�%)
TrimmomaticSE: Completed successfully
(...)"

I am wondering why it is running TrimmomaticSE instead of PE...?


Regards,

Sophie



--

Sophie Delerue-Ricard, PhD student in Marine Molecular Evolutionary Ecology, Belgium

Leuven University & Institute for Agricultural and Fisheries Research



Jon Puritz

Jul 17, 2015, 2:54:51 PM
to ddo...@googlegroups.com, sophie.del...@gmail.com, jpu...@gmail.com, Jasmien...@bio.kuleuven.be
Turns out the problem here is that Ubuntu does not have GNU awk installed by default, and that this was causing the error.

The command:

sudo apt-get -f install gawk

Should fix the problem.
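
A quick way to check whether a machine is affected, sketched under the assumption that the `awk` found on the PATH is the one the pipeline calls. `awk_has_gensub` is a hypothetical helper name, not part of dDocent.

```shell
# Return success if the given awk binary (default: "awk" on the PATH)
# provides gensub(), which is a GNU awk extension.
awk_has_gensub() {
    awk_bin="${1:-awk}"
    # An awk without gensub() aborts with an error and prints nothing,
    # so grep finds no match and the function returns non-zero.
    "$awk_bin" 'BEGIN { print gensub(/a/, "b", "g", "aaa") }' 2>/dev/null \
        | grep -qx 'bbb'
}

if awk_has_gensub; then
    echo "default awk supports gensub"
else
    echo "default awk lacks gensub; installing gawk should help"
fi
```

On Debian and Ubuntu, `update-alternatives --display awk` shows which implementation the `awk` name currently points to.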

Jon Puritz

Jul 17, 2015, 3:01:43 PM
to Sophie D, ddo...@googlegroups.com
Not really.  The easiest way is to track file changes.  For trimming, look for the creation of *.R1.fq.gz files.  For read mapping, look for the creation of *-RG.bam files.  For SNP calling, there will be about 1000 intermediate raw vcf files, so you can count how many have been created already by using:  ls raw.*.vcf | wc -l
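
The file-counting idea can be wrapped in a small helper, sketched here on the assumption that it is run from the dDocent working directory. `count_files` is a hypothetical name; the three patterns are the ones mentioned in this message.

```shell
# Count regular files in the current directory whose names match a glob.
count_files() {
    # -name takes the pattern quoted, so the shell does not expand it first;
    # tr strips the padding that some wc implementations add.
    find . -maxdepth 1 -type f -name "$1" | wc -l | tr -d ' '
}

echo "trimmed read files: $(count_files '*.R1.fq.gz')"
echo "mapped BAM files:   $(count_files '*-RG.bam')"
echo "raw VCF chunks:     $(count_files 'raw.*.vcf')"
```

Unlike `ls raw.*.vcf | wc -l`, this prints 0 rather than an error when no matching file exists yet. Re-running it from time to time gives a rough sense of progress, not a true percentage.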

Hope that helps,

Jon
On Jul 17, 2015, at 1:58 PM, Sophie D <sophie.del...@gmail.com> wrote:

I come back to a previous "tricks and tools" question: do you know a command to see the percentage of progress of a run?

Thanks

Sophie

--
Sophie Delerue-Ricard, PhD student in Marine Molecular Evolutionary Ecology, Belgium
Leuven University & Institute for Agricultural and Fisheries Research


2015-07-17 20:53 GMT+02:00 Jon Puritz <jpu...@gmail.com>:
That should do it.  We lost the dDocent group in this email chain, so I will post a reply there.  Just disregard that one.  

Cheers,

Jon

On Jul 17, 2015, at 1:52 PM, Sophie D <sophie.del...@gmail.com> wrote:

 I did:


sudo apt-get -f install gawk
sudo apt-get autoremove

AND NOW IT WORKS !!!! Infinite ASCII file in front of me with all contig names + sequences !

I'll start dDocent again, tell me if that's good for you :)

Sophie


--
Sophie Delerue-Ricard, PhD student in Marine Molecular Evolutionary Ecology, Belgium
Leuven University & Institute for Agricultural and Fisheries Research


2015-07-17 20:48 GMT+02:00 Jon Puritz <jpu...@gmail.com>:
Try

sudo apt-get -f install gawk

Then yes autoremove.
On Jul 17, 2015, at 1:45 PM, Sophie D <sophie.del...@gmail.com> wrote:

sudo apt-get -f install

returns:
Reading package lists... Done
Building dependency tree      
Reading state information... Done
Correcting dependencies... Done
The following packages were automatically installed and are no longer required:
  libtcl8.5 libtk8.5 linux-headers-3.13.0-32 linux-headers-3.13.0-32-generic
  linux-headers-3.13.0-48 linux-headers-3.13.0-48-generic
  linux-image-3.13.0-32-generic linux-image-3.13.0-48-generic
  linux-image-extra-3.13.0-32-generic linux-image-extra-3.13.0-48-generic
  tcl8.5 tk8.5
Use 'apt-get autoremove' to remove them.
The following packages will be REMOVED:
  blast2
0 upgraded, 0 newly installed, 1 to remove and 56 not upgraded.
1 not fully installed or removed.
After this operation, 1.311 kB disk space will be freed.
Do you want to continue? [Y/n]

...yes?
Should I use autoremove too?

I'll try the awk on uniq.full.fasta again.


Thanks a lot really!

Sophie



--
Sophie Delerue-Ricard, PhD student in Marine Molecular Evolutionary Ecology, Belgium
Leuven University & Institute for Agricultural and Fisheries Research


2015-07-17 20:43 GMT+02:00 Jon Puritz <jpu...@gmail.com>:
Ok, try that.

On Jul 17, 2015, at 1:42 PM, Sophie D <sophie.del...@gmail.com> wrote:

I might need to run apt-get -f install (see below)

Reading package lists... Done
Building dependency tree     
Reading state information... Done
You might want to run 'apt-get -f install' to correct these:
The following packages have unmet dependencies:
 blast2 : Depends: libncbi6 (< 6.1.20120620.1) but it is not installable
          Depends: libncbi6 (>= 6.1.20120620) but it is not installable
E: Unmet dependencies. Try 'apt-get -f install' with no packages (or specify a solution).



--
Sophie Delerue-Ricard, PhD student in Marine Molecular Evolutionary Ecology, Belgium
Leuven University & Institute for Agricultural and Fisheries Research


2015-07-17 20:40 GMT+02:00 Sophie D <sophie.del...@gmail.com>:
ok done. and now? I rerun dDocent?

Dear Jon, I'm starting to get hungry... 20:40 here :)

--
Sophie Delerue-Ricard, PhD student in Marine Molecular Evolutionary Ecology, Belgium
Leuven University & Institute for Agricultural and Fisheries Research


2015-07-17 20:38 GMT+02:00 Jon Puritz <jpu...@gmail.com>:
Actually,

sudo apt-get install gawk

On Jul 17, 2015, at 1:36 PM, Jon Puritz <jpu...@gmail.com> wrote:

Ok.  It seems like that awk statement only works if gawk is your default awk program.  Do you have admin privileges on your system?  If so,

sudo apt-get gawk


On Jul 17, 2015, at 1:28 PM, Sophie D <sophie.del...@gmail.com> wrote:

command: awk 'BEGIN {RS = ">" ; FS = "\n"} NR > 1 {print "@"$1"\n"$2"\n+""\n"gensub(/./, "I", "g", $2)}' uniq.full.fasta

answer: awk: line 2: function gensub never defined


--
Sophie Delerue-Ricard, PhD student in Marine Molecular Evolutionary Ecology, Belgium
Leuven University & Institute for Agricultural and Fisheries Research


2015-07-17 20:26 GMT+02:00 Jon Puritz <jpu...@gmail.com>:
Try running this:

awk 'BEGIN {RS = ">" ; FS = "\n"} NR > 1 {print "@"$1"\n"$2"\n+""\n"gensub(/./, "I", "g", $2)}' uniq.full.fasta
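
For context, this one-liner turns each FASTA record into a FASTQ record with a dummy quality string of `I` characters, and `gensub()` is a GNU awk extension that mawk does not provide. A possible gawk-free variant is sketched below. It is not what dDocent itself does, `fasta_to_fastq` is a hypothetical name, and like the original it assumes one sequence line per record. It applies the POSIX `gsub()` to a copy of the sequence instead:

```shell
# Convert single-line FASTA records to FASTQ with an all-"I" quality string,
# using only POSIX awk features so that mawk can run it too.
fasta_to_fastq() {
    awk 'BEGIN { RS = ">"; FS = "\n" }
         NR > 1 {
             qual = $2
             gsub(/./, "I", qual)    # gsub() modifies the copy in place
             print "@" $1 "\n" $2 "\n+\n" qual
         }' "$1"
}

# Usage (file name taken from this thread):
#   fasta_to_fastq uniq.full.fasta > uniq.fq
```

Installing gawk remains the simpler fix when the pipeline script itself should stay untouched.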

Sophie D

Jul 17, 2015, 3:09:06 PM
to Jon Puritz, ddo...@googlegroups.com
Do you have an idea why my sequences are of length 'longest and shortest : 119 and 117' ?

I'm still waiting for the final output.

Many thanks for gawk.

Sophie


--

Sophie Delerue-Ricard, PhD student in Marine Molecular Evolutionary Ecology, Belgium

Leuven University & Institute for Agricultural and Fisheries Research


