We have released a new version of Goby. This new release mostly fixes a few problems identified since the last release.
New Features:
- Add an option to the fasta-to-compact mode that will convert a set of files and concatenate the result to a single compact-reads file (see new --concat option).
- Add a mode to test that the connection from Goby to R is working (requires JRI and R built with shared library support). The mode is called test-r-connection (tcr).
Bug fixes:
- Fix a bug that caused some slices to occur within annotations, despite the --annotation option being given on the command line. The problem was that the chromosome index was not obtained from the genome and was always set to zero. In rare cases, this would cause one annotation to be omitted from the output. Thanks go to Laurent Mesnard for reporting this problem.
- Restore STRICT_SOMATIC filter.
- Close files opened when loading Goby Alignment header and index files. This fixes a "too many open files" error that could occur when loading hundreds of alignments simultaneously.
- Allow lenient import mode for TSV files. This makes it possible to convert TSV files to lucene.index when they were created by an earlier version of Goby with a \t character as the last character of the column line.
I just tried Goby-2.3.4.1 to remove redundant sequences of a multi-fasta file (~1000 entries, some are 60kb in length). The program did the job in a flash, which is amazing!
There is one problem with my analysis: the original sequence IDs are replaced with ascending numbers. This breaks my analysis, because from this point on the sequence IDs are lost.
So my question:
Is there an option to keep the sequence IDs untouched, so that I can keep track of which sequences were filtered out and which were kept?
Thanks a lot!
On Thursday, October 23, 2014 12:46:17 PM UTC-6, Fabien Campagne wrote:
> Hello,
>
>
> It may have been a bit too fast in this case. Be advised that this tool is designed to remove exact duplicate reads, which may not make much sense with long read lengths (because the probability of a base with error increases).
>
>
> Regarding your question: yes, it should be possible to preserve read names. You can do this when you convert from fasta to compact-reads with the -x option (see http://campagnelab.org/software/goby/reference-documentation/modes/fasta-to-compact/, you can also access this doc from the main project page http://goby.campagnelab.org). Names should be preserved by filtering; let us know if this is not the case.
>
Thanks for your prompt reply.
I gave it a try, but it did not work.
The webpage says the option is -x|--include-identifiers, but the help printed on the command line with -H seems to say it is -y, whereas [-x <dynamic-options>] sets a dynamic option, in the format ....
When I used --include-identifiers, I got an error saying Unknown flag 'include-identifiers'.
I badly need this option to work!
goby 1g compact-to-fasta -i 1.compact-reads -o out.fasta --identifier-to-header
cat out.fasta
>!
ACTG
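
In case it helps while the identifier option is being sorted out, here is a rough sketch of exact-duplicate removal done outside Goby, in plain Python, that keeps the original sequence IDs. This is not how Goby does it. The function names read_fasta and dedupe_exact are made up for illustration, and treating sequences as duplicates regardless of upper/lower case is an assumption you may want to change.

```python
import io


def read_fasta(handle):
    """Yield (identifier, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dedupe_exact(records):
    """Keep the first record of each distinct sequence.

    Returns (kept, dropped): kept is a list of (id, sequence) pairs,
    dropped is a list of (dropped_id, id_of_the_copy_that_was_kept).
    Comparison is case-insensitive (an assumption, not Goby behaviour).
    """
    first_seen = {}
    kept, dropped = [], []
    for ident, seq in records:
        key = seq.upper()
        if key in first_seen:
            dropped.append((ident, first_seen[key]))
        else:
            first_seen[key] = ident
            kept.append((ident, seq))
    return kept, dropped


if __name__ == "__main__":
    # Tiny in-memory example; replace with open("your.fasta") for a real file.
    example = io.StringIO(">seq1\nACTG\n>seq2\nGGCC\nTT\n>seq3\nactg\n")
    kept, dropped = dedupe_exact(read_fasta(example))
    for ident, seq in kept:
        print(">" + ident)
        print(seq)
    for ident, original in dropped:
        print("# dropped", ident, "(same sequence as", original + ")")
```

The dropped list records which ID each removed sequence duplicated, so you can see which sequences were filtered and which were kept. For ~1000 entries of up to 60kb this keeps everything in memory, which should be fine at that size.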