Genotype_Edit.R

Pedro Andrade

Oct 19, 2015, 10:43:33 AM
to AftrRAD
Hi everyone,

I have been experimenting with AftrRAD over the last week on a GBS dataset we generated. I have just run into a problem with the Genotype.pl script that I can't find a solution for. When I run it, I get the following error:

---------------------------------------
$ perl Genotype.pl

Arguments entered are...
pvalThresh      100
pvalLow 0.0001
MinReads        10
Help    0
pvalHigh        1e-05

Running Genotype.pl...

No such file or directory at Genotype.pl line 107.
---------------------------------------

Line 107 of the script is: open GENOTYPERSCRIPTEDIT, ">RScripts/Genotype_Edit.R" or die$!;

Where can I find this R script? I haven't found it in any of my directories or on the AftrRAD GitHub page.

Thanks!
Pedro

Mike Sovic

Oct 19, 2015, 12:11:21 PM
to AftrRAD
Hi Pedro,

That line (107) should actually be creating the Genotype_Edit.R file, so this file doesn't need to exist at the beginning of the run.  Therefore, you're correct that you won't find it in any of the downloads of the program.  What you should have at the beginning of the run (from the download) is the file "RScripts/Genotype.R" - this is used as a template for creating "Genotype_Edit.R".

Having said that, I actually have no idea at the moment why you'd be getting that error.  The only thing I would expect to cause that message is if you didn't have an RScripts folder in your working directory with the Genotype.R file, but in this case, I would have expected the error to come from line 106 as opposed to 107.  
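For anyone hitting the same message, a quick pre-flight check can be run from the directory Genotype.pl is launched in. This is a minimal shell sketch, not part of AftrRAD: the function name check_rscripts is made up, and it assumes (as lines 106 and 107 suggest) that Genotype.pl reads RScripts/Genotype.R and writes RScripts/Genotype_Edit.R by relative path.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of AftrRAD): verify that the working directory
# has what Genotype.pl needs before it writes RScripts/Genotype_Edit.R.
check_rscripts() {
  local dir="${1:-.}"
  if [ ! -d "$dir/RScripts" ]; then
    echo "missing folder: $dir/RScripts"
    return 1
  fi
  # Genotype.R comes with the download and is used as the template.
  if [ ! -f "$dir/RScripts/Genotype.R" ]; then
    echo "missing template: $dir/RScripts/Genotype.R"
    return 1
  fi
  # Genotype_Edit.R is created in this folder, so it must be writable.
  if [ ! -w "$dir/RScripts" ]; then
    echo "cannot write to: $dir/RScripts"
    return 1
  fi
  echo "ok: $dir/RScripts/Genotype.R found"
}

# Usage, from the directory that contains Genotype.pl:
# check_rscripts . && perl Genotype.pl
```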

Is there any chance you have moved any files/folders around since starting the run?  Maybe one way forward is to try to download the example dataset and do a run from the beginning (it should run in just a few minutes) to make sure it runs all the way through.  If you have a problem with this, then maybe a fresh download of the program would be the next thing to try on this example data.  If you're still getting this error, let us know.

Has anyone else run across this error message?

              Mike

Pedro Andrade

Oct 19, 2015, 12:21:02 PM
to AftrRAD
Hi Mike,

Thanks for the input. During installation my RScripts folder ended up somewhere else, so before this error I actually ran into the same problem at line 106. However, I edited the Perl script with the correct path and it worked (well, at least until it hit the error on the following line).

To try to solve this, I created a file in the RScripts folder with the command "$ cat > Genotype_Edit.R", which created a new blank file. I then edited all the paths in Genotype.pl and ran it again. Although it no longer reported an error at line 107, some new errors surfaced, so I guess that was not the solution:

This is what happened in this new run:

------------------------------
$ perl Genotype.pl


Arguments entered are...
pvalThresh      100
pvalLow 0.0001
MinReads        10
Help    0
pvalHigh        1e-05


Running Genotype.pl...

Filehandle GENOTYPERSCRIPT opened only for output at Genotype.pl line 110.
Scoring genotypes at all nonparalogous loci.

Currently genotyping sample...Error in read.table("TempFiles/Genotypes.txt", header = TRUE, sep = "\t") :
  no lines available in input
Execution halted


Checking for individuals with large amounts of missing data (have >2 StDev more missing data than the average)
Illegal division by zero at Genotype.pl line 366.
------------------------------

Mike Sovic

Oct 20, 2015, 8:13:26 AM
to AftrRAD
Hi Pedro,

Yeah, adding a blank file named "Genotype_Edit.R" is probably not going to solve your problem.  Is there a reason you're not just using the RScripts folder as it comes in the download?  I'm not sure I completely understand what you mean by getting the RScripts folder elsewhere - do you mean that you downloaded it from somewhere else, or that it is located somewhere outside of your working directory?  If the latter, I'm sure this is the issue.  In that case, you'd have to update the paths everywhere the script calls R (probably more than just lines 106 and 107), and then it might work, though I'd have to look closely at this - I'm not really sure whether that alone would be enough to make it work.

My suggestion at this point is to get a fresh copy of the RScripts folder and paste it into your working directory (the same directory that contains Genotype.pl); I think Genotype.pl should then run.  If for some reason this approach won't work for you (i.e. you need to store the RScripts folder somewhere outside of the working directory), let us know, and we can talk about options at that point.
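One way to put a fresh RScripts folder in place without overwriting anything is sketched below. The function name install_rscripts and the download path are placeholders, not part of AftrRAD; the sketch assumes the fresh download has RScripts/Genotype.R at its top level and that Genotype.pl sits in the working directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a fresh RScripts folder next to Genotype.pl.
# $1 = top level of a fresh AftrRAD download (placeholder path)
# $2 = working directory (defaults to the current one)
install_rscripts() {
  local src="$1" workdir="${2:-.}"
  if [ ! -f "$src/RScripts/Genotype.R" ]; then
    echo "no RScripts/Genotype.R under $src" >&2
    return 1
  fi
  if [ ! -f "$workdir/Genotype.pl" ]; then
    echo "no Genotype.pl in $workdir; is this the working directory?" >&2
    return 1
  fi
  # Move any existing RScripts aside instead of mixing old and new files.
  if [ -e "$workdir/RScripts" ]; then
    mv "$workdir/RScripts" "$workdir/RScripts.old.$$"
  fi
  cp -R "$src/RScripts" "$workdir/RScripts"
}

# Usage (the download path is an example only):
# install_rscripts ~/Downloads/AftrRAD .
```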

             Mike

Pedro Andrade

Oct 21, 2015, 6:24:07 AM
to AftrRAD
Hi Mike,

I don't remember exactly why my RScripts folder ended up somewhere else; it probably happened during installation. Before your reply I was updating the paths in the Perl script, but that didn't really solve the Genotype_Edit.R problem. I have now copied the RScripts directory into the main AftrRAD folder and the line 107 error no longer occurs, so I guess that was indeed the issue!

In any case, another error just popped up, at line 366:

Scoring genotypes at all nonparalogous loci.

Currently genotyping sample...Error in read.table("TempFiles/Genotypes.txt", header = TRUE, sep = "\t") :
  no lines available in input
Execution halted


Checking for individuals with large amounts of missing data (have >2 StDev more missing data than the average)
Illegal division by zero at Genotype.pl line 366.

------------
366 > my $MeanCount = $TotalNACount/$NumOfIndividuals;


Mike Sovic

Oct 22, 2015, 8:34:37 AM
to AftrRAD
Hi Pedro,

Yeah, make sure that when you start the run, your working directory contains all of the files/directories shown in Fig. 3 of the manual (e.g. the RScripts folder).  Having any of these anywhere else will almost certainly cause problems.

As for this specific error, I guess the first thing to check is whether the TempFiles folder contains a series of files named "ForBinomialTestX.txt", where X represents each of your sample names, and whether each of these files contains data (alleles and read counts).  If these files don't exist, or don't contain data, then the problem traces back to your run of the AftrRAD.pl script, not the Genotype.pl script.  Let me know what these files look like and we'll go from there if you can't figure it out.
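The check described above can be scripted roughly as follows. This is a sketch with a hypothetical function name; it only tests that TempFiles/ForBinomialTest*.txt files exist and are non-empty, not that the alleles and read counts inside them are valid.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm TempFiles/ForBinomialTest*.txt exist and are non-empty.
check_binomial_inputs() {
  local tmpdir="${1:-TempFiles}" found=0 empty=0 f
  for f in "$tmpdir"/ForBinomialTest*.txt; do
    [ -e "$f" ] || continue   # the pattern matched nothing
    found=$((found + 1))
    if [ ! -s "$f" ]; then
      echo "empty: $f"
      empty=$((empty + 1))
    fi
  done
  if [ "$found" -eq 0 ]; then
    echo "no ForBinomialTest files in $tmpdir; look at the AftrRAD.pl run"
    return 1
  fi
  echo "$found file(s), $empty empty"
  [ "$empty" -eq 0 ]
}

# Usage, from the working directory:
# check_binomial_inputs TempFiles
```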

             Mike

Pedro Andrade

Oct 22, 2015, 12:02:15 PM
to AftrRAD
Indeed, those files are missing from the TempFiles folder. I only have "IndividualX.txt" and "SortedIndividualX.txt" files, with the following structure (example for one specimen):

$ head IndividualPAA081^M.txt
AAAAAAAAAAAGTAACAATTGTATCACTAAGTTTATCTGTAACTTTGAGTTGACCTAGCCATCAAACAACTTTCATGAATACAGACAT
AAAAAAAAACAGAGTTAGGCACCTAGAGCATATCCACGGTAGCCCAGATCTCTCTTCTGTTGAGCTATTTATCTACTGTGAGTTTAGG
AAAAAAAAATCACAAAATATTAACTGAATAGAAATGTCAAGCAGAAATAATGTTACCATGTACATTCTTAAAAATGACCTTCTGCTAG
AAAAAAAAATCACAAAATATTAACTGAATAGAAATGTCAAGCAGAAATAATGTTACCATGTACATTCTTAAAAATGATCTTCTGCTAG
AAAAAAAAATGTTTCTTAAACTATGGAAAAATACACATTTATCCATTGGTCCATCACAATGTAGATTTTGAGTCAGAATGCAAGATCG
AAAAAAAAATGTTTCTTAAACTATGGAAAAATACACATTTATCCATTGGTCCATCACAATGTAGATTTTGAGTCAGAATGCAAGATCG
AAAAAAAAATGTTTCTTAAACTATGGAAAAATACACATTTATCCATTGGTCCATCACAATGTAGATTTTGAGTCAGAATGCAAGATCG
AAAAAAAAATGTTTCTTAAACTATGGAAAAATACACATTTATCCATTGGTCCATCACAATGTAGATTTTGAGTCAGAATGCAAGATCG
AAAAAAAAATGTTTCTTAAACTATGGAAAAATACACATTTATCCATTGGTCCATCACAATGTAGATTTTGAGTCAGAATGCAAGATCG
AAAAAAAAATGTTTCTTAAACTATGGAAAAATACACATTTATCCATTGGTCCATCACAATGTAGATTTTGAGTCAGAATGCAAGATCG
$ head SortedIndividualPAA081^M.txt
AAAAAAAAAAAGTAACAATTGTATCACTAAGTTTATCTGTAACTTTGAGTTGACCTAGCCATCAAACAACTTTCATGAATACAGACAT
AAAAAAAAACAGAGTTAGGCACCTAGAGCATATCCACGGTAGCCCAGATCTCTCTTCTGTTGAGCTATTTATCTACTGTGAGTTTAGG
AAAAAAAAATCACAAAATATTAACTGAATAGAAATGTCAAGCAGAAATAATGTTACCATGTACATTCTTAAAAATGACCTTCTGCTAG
AAAAAAAAATCACAAAATATTAACTGAATAGAAATGTCAAGCAGAAATAATGTTACCATGTACATTCTTAAAAATGATCTTCTGCTAG
AAAAAAAAATGTTTCTTAAACTATGGAAAAATACACATTTATCCATTGGTCCATCACAATGTAGATTTTGAGTCAGAATGCAAGATCG
AAAAAAAAATGTTTCTTAAACTATGGAAAAATACACATTTATCCATTGGTCCATCACAATGTAGATTTTGAGTCAGAATGCAAGATCG
AAAAAAAAATGTTTCTTAAACTATGGAAAAATACACATTTATCCATTGGTCCATCACAATGTAGATTTTGAGTCAGAATGCAAGATCG
AAAAAAAAATGTTTCTTAAACTATGGAAAAATACACATTTATCCATTGGTCCATCACAATGTAGATTTTGAGTCAGAATGCAAGATCG

Mike Sovic

Oct 22, 2015, 2:09:14 PM
to AftrRAD
Pedro,

OK, so the problem is actually associated with the run of AftrRAD.pl.  Did you notice any errors/warnings during that run?  Genotype.pl will definitely not run correctly until this is fixed.

The next file I'd check for is TempFiles/ErrorReadTest/ErrorTestOut.txt; make sure it has data in it (a list of sequences), since these are the sequences that are used in assembling loci.  If it exists with data, next check for TempFiles/FinalAlignments.txt, and again make sure there is data in it.  Let me know about these and we'll go from there.
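The two checks above, in the order given, can be sketched like this. The function name is hypothetical and the sketch only looks at whether each file exists and has content.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check, in order, the two AftrRAD.pl intermediates named above.
check_assembly_files() {
  local tmpdir="${1:-TempFiles}" f
  for f in "$tmpdir/ErrorReadTest/ErrorTestOut.txt" "$tmpdir/FinalAlignments.txt"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f"
      return 1   # stop at the first gap, following the order suggested above
    fi
    echo "ok: $f ($(wc -l < "$f" | tr -d ' ') lines)"
  done
}

# Usage, from the working directory:
# check_assembly_files TempFiles
```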

            Mike

Pedro Andrade

Oct 23, 2015, 6:27:01 AM
to AftrRAD
I'll have to run the first script again, then. I don't have either of the two files you mention. FinalAlignments.txt doesn't exist in the TempFiles folder or anywhere else I've looked, and the only file in the ErrorReadTest directory is AllUniquesForErrorTest.txt, which contains the following:

[pandrade@illuminaserver ErrorReadTest]$ ls
AllUniquesForErrorTest.txt
[pandrade@illuminaserver ErrorReadTest]$ head AllUniquesForErrorTest.txt
AAAAAAAAAAAGTAACAATTGTATCACTAAGTTTATCTGTAACTTTGAGTTGACCTAGCCATCAAACAACTTTCATGAATACAGACAT
AAAAAAAAAACAGAGGCACCTAGAGCATATCCATGGTAGCCCAGATCTCTCTTCTGTTGAGCTATTTATCTACTGTGAGTTTAGGCAA
AAAAAAAAAACAGAGTTAGGCACCTAGAGCATATCCATGGTAGCCCAGATCACTTTTCTGTTGAGCTATTTATCTACTGTGAGTTTAG
AAAAAAAAAACAGAGTTAGGCACCTAGAGCATATCCATGGTAGCCCAGATCTCTCTTCTGTTGAGCTATTTATCTACTGTGAGTTTAG
AAAAAAAAAACAGAGTTAGGCACCTAGAGCATATCCATGGTAGCCCAGATCTCTTTTCTGTTGAGCTATTTATCTACTGTGAGTTTAG
AAAAAAAAAACAGAGTTAGGCACCTAGAGCATATCCATGGTAGTCCAGATCTCTTTTCTGTTGAGCTATTTATCTACTGTGAGTTTAG
AAAAAAAAAACGTGATCCTGCCATGTGGAGAAAAGAAGGGAACAATCTTAGGAAATGAACACAAAAGAAATTTGAATGCTGTTGGAAT
AAAAAAAAAAGGGAAAAAGGGCTTTCATGGCAGTCTAACTAGTCAAGACATTATTCCTATTCCTTAAAAACATGCAAGATCGGAAGAG
AAAAAAAAAATGAAACTTTTTCTGTGCTCAGGTTACAGGAACACAAAAGTTAAACTACATTAACCTGTACTTTTGGAAACAATGCAAG
AAAAAAAAAATGTTTCTTAAACTATGGAAAAATACACATTTATCCATTGGTCCATCACAACGTAGATTTTGAGTCAGAATGCAAGATC

Mike Sovic

Oct 23, 2015, 6:46:51 AM
to AftrRAD
Hi Pedro,

Before re-running…

1.)  make sure your working directory is set up as described in my previous post

2.)  make sure that you have plenty of free disk space on your hard drive.  What you have for the AllUniquesForErrorTest.txt file looks good.  One of the next files the script creates is a large file named "TempFiles/AllReadsAndDepths.txt".  If you were to run out of space on your hard drive, this would be one likely place for that to happen.  See my posts under the topic "Perl errors in AftrRAD.pl" on April 9 and 10 for some details on disk space.
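Free disk space (as opposed to RAM) can be checked before a run with standard df. The sketch below is illustrative only: the function names are hypothetical, and the default 50 GB threshold is an arbitrary placeholder, not an AftrRAD requirement.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking disk space before a run.

# Free space, in KB, on the filesystem that holds directory $1.
# -P forces one line per filesystem, so column 4 is "Available".
free_kb() {
  df -Pk "${1:-.}" | awk 'NR == 2 { print $4 }'
}

# Succeed only if at least $2 GB are free under $1. The default of 50 GB is a
# placeholder, not an AftrRAD requirement; size it to your own data.
check_space() {
  local dir="${1:-.}" need_gb="${2:-50}" avail
  avail=$(free_kb "$dir")
  if [ "$avail" -lt $((need_gb * 1024 * 1024)) ]; then
    echo "only $((avail / 1048576)) GB free under $dir"
    return 1
  fi
  echo "$((avail / 1048576)) GB free under $dir"
}

# Usage, from the working directory:
# check_space . 100
# ls -lh TempFiles/AllReadsAndDepths.txt   # watch this file grow during the run
```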


         Mike

Pedro Andrade

Oct 26, 2015, 6:51:55 AM
to AftrRAD
Hi Mike,

Thanks for your patience. I re-ran the first script over the weekend on the server (there's plenty of space, so that is not an issue), and indeed there was a problem. As last time, temporary files were still created for each individual, so maybe this is why I missed it before. The error on this run of AftrRAD.pl was at line 1408:

1408: open VARIANCEWRITE, "TempFiles/ErrorReadTest/ErrorUpdate$Number.txt" or die$!;

Mike Sovic

Oct 27, 2015, 6:33:01 AM
to AftrRAD
Hi Pedro,

No worries - hopefully we're getting closer to the solution.  A few more files to check for now (again, make sure they exist and have data)…

1.)  TempFiles/UniqueWithCountsIndividualX.txt - one for each sample, just like the SortedIndividualX.txt files you mentioned in your previous message.
2.)  TempFiles/AllUniquesSorted.txt
3.)  Go to TempFiles/ErrorReadTest, and just send the names of all of the files that are in there (guessing there should just be 1 or 2 at this point).
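The three checks in the list above can be run in one go with a sketch like this. The function name is hypothetical; it pairs each SortedIndividualX.txt with the UniqueWithCountsIndividualX.txt of the same sample name and only tests for existence and content.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each SortedIndividualX.txt, look for the matching
# UniqueWithCountsIndividualX.txt, check AllUniquesSorted.txt, and list ErrorReadTest.
check_unique_files() {
  local tmpdir="${1:-TempFiles}" problems=0 sorted name f
  for sorted in "$tmpdir"/SortedIndividual*.txt; do
    [ -e "$sorted" ] || continue
    name=${sorted##*/SortedIndividual}   # sample part, e.g. "PAA081.txt"
    f="$tmpdir/UniqueWithCountsIndividual$name"
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f"
      problems=$((problems + 1))
    fi
  done
  if [ ! -s "$tmpdir/AllUniquesSorted.txt" ]; then
    echo "missing or empty: $tmpdir/AllUniquesSorted.txt"
    problems=$((problems + 1))
  fi
  echo "files in $tmpdir/ErrorReadTest:"
  ls -1 "$tmpdir/ErrorReadTest" 2>/dev/null
  [ "$problems" -eq 0 ]
}
```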

Also, it might be helpful if you could attach any barcode files you're using, and the files named "ReportX", which are in Output/RunInfo.

            Mike

Pedro Andrade

Oct 27, 2015, 7:37:26 AM
to AftrRAD
Hi Mike,

Of the files you asked about, the UniqueWithCountsIndividualX.txt files do not exist in the TempFiles folder; there are only IndividualX.txt and SortedIndividualX.txt. Of the other two, I have AllUniquesSorted.txt and AllUniquesForErrorTest.txt (the latter is the only file in the ErrorReadTest folder). Their structure is as follows:

[pandrade@illuminaserver TempFiles]$ head AllUniquesSorted.txt

A
AA
AAA
AAAA
AAAAA
AAAAAA
AAAAAAAA
AAAAAAAAA
AAAAAAAAAAAGTAACAATTGTATCACTAAGTTTATCTGTAACTTTGAGTTGACCTAGCCATCAAACAACTTTCATGAATACAGACAT



[pandrade@illuminaserver ErrorReadTest]$ head AllUniquesForErrorTest.txt

A
AA
AAA
AAAA
AAAAA
AAAAAA
AAAAAAAA
AAAAAAAAA
AAAAAAAAAAAGTAACAATTGTATCACTAAGTTTATCTGTAACTTTGAGTTGACCTAGCCATCAAACAACTTTCATGAATACAGACAT

Mike Sovic

Oct 27, 2015, 8:31:14 AM
to AftrRAD
Hmm… Having the SortedIndividualX.txt files but not the UniqueWithCountsIndividualX.txt files is kind of odd.  Once the program creates the SortedIndividualX.txt files (which it appears to have done correctly), the next step is to create the UniqueWithCountsIndividualX.txt files, so it seems like at least one problem would have to be at this step.  The code for this is basically…

uniq -c SortedIndividualX.txt UniqueWithCountsIndividualX.txt

The 'uniq' command prints only one copy of each unique sequence in the input file (SortedIndividualX.txt; uniq only collapses identical lines that are adjacent, which is why it is run on the sorted file), and '-c' prefixes each line with the number of times that read occurred in the input file.  Try moving to the TempFiles folder and running the above command for one or two of your SortedIndividualX.txt files (of course, you'll have to replace the 'X' with the appropriate sample name).  See if you can get any of these 'UniqueWithCountsIndividual' files.
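To try the uniq step on every sample at once rather than one at a time, a loop like the following could be used from the working directory. It is a sketch with a hypothetical function name; it runs the same "uniq -c input output" form shown above and quotes file names so unusual characters in sample names are passed through intact.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rebuild UniqueWithCountsIndividualX.txt from every
# SortedIndividualX.txt, using the same uniq -c step described above.
make_unique_counts() {
  local tmpdir="${1:-TempFiles}" sorted name out
  for sorted in "$tmpdir"/SortedIndividual*.txt; do
    [ -e "$sorted" ] || continue
    name=${sorted##*/SortedIndividual}
    out="$tmpdir/UniqueWithCountsIndividual$name"
    # uniq collapses only adjacent duplicates, so the input must already be sorted.
    if uniq -c "$sorted" "$out"; then
      echo "$out: $(wc -l < "$out" | tr -d ' ') unique reads"
    else
      echo "uniq failed on $sorted" >&2
    fi
  done
}

# Usage:
# make_unique_counts TempFiles
```

Each output line is a count followed by the read, so a file whose reads are AAAC, AAAC, AAAC, CCCG yields two lines, the first being the count 3 and the read AAAC.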

Also, I don't think this is the cause of the problem above, but you seem to have sequences of varying lengths in the dataset ('A', 'AA', 'AAA', etc. are each individual sequences).  I suspect at least one of your fastq data files contains these short reads.  This will cause problems: all sequences need to be the same length, at least in the current version of the program.  Did you maybe trim reads based on quality scores before running AftrRAD?  If so, that is not a good idea.
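To find which fastq file holds the short reads, a read-length tally per file is usually enough. The sketch below uses hypothetical function names and assumes uncompressed fastq with the usual four lines per record (sequence on the second line); gzipped files would need to go through gzip -cd first.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for spotting variable read lengths in a fastq file.
# Assumes 4-line records, with the sequence on line 2 of each record.

# Print "count length" pairs: how many reads have each length.
read_length_table() {
  awk 'NR % 4 == 2 { print length($0) }' "$1" | sort -n | uniq -c
}

# Succeed only if every read in the file has the same length.
all_same_length() {
  local n
  n=$(awk 'NR % 4 == 2 { print length($0) }' "$1" | sort -u | wc -l)
  [ "$n" -eq 1 ]
}

# Usage (file names are examples):
# read_length_table lane1.fastq
# for fq in *.fastq; do all_same_length "$fq" || echo "mixed lengths: $fq"; done
```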

Let me know whether you have any luck with the uniq function in creating a UniqueWithCountsIndividualX.txt file(s).

                Mike  