I am quite new to Bioinformatics, but I found the STAR aligner to be one of the fastest tools to use in this field.
When I use the SRA tools to download paired samples with the --split-3 option (./fastq-dump --split-3 SRR1164866), I get three fastq files. For example, sample 'SRR1164866' is a paired sample (http://www.ebi.ac.uk/ena/data/view/SRR1164866). It is downloaded into three different fastq files: SRR1164866_1.fastq, SRR1164866_2.fastq and SRR1164866.fastq.
'We distribute our fastq files for our paired end sequencing in 2 files, mate1 is found in a file labelled _1 and mate2 is found in the file labelled _2. The files which do not have a number in their name are singled ended reads, this can be for two reasons, some sequencing early in the project was singled ended also, as we filter our fastq files as described in our README if one of a pair of reads gets rejected the other read gets placed in the single file.'
I am getting great alignment results with the _1.fastq and _2.fastq files, but what about the third .fastq file, which is sometimes huge (2-3 gigabytes, almost the same size as each of the _1.fastq and _2.fastq files)?
Is there a way for STAR to align all the data (_1.fastq, _2.fastq and .fastq) of my samples?
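
One approach I am considering, as an untested sketch: run STAR twice, once in paired-end mode on the _1/_2 files and once in single-end mode on the unnumbered file, and then combine the two outputs afterwards (for example with samtools merge). The helper name star_commands, the index path, the thread count and the output prefixes below are placeholders of my own; --runThreadN, --genomeDir, --readFilesIn and --outFileNamePrefix are standard STAR options. The script only builds and prints the two command lines, it does not run STAR.

```python
import shlex
from pathlib import Path


def star_commands(accession, genome_dir, fastq_dir=".", threads=8):
    """Build one paired-end and one single-end STAR command for a --split-3 download.

    Assumes the file naming produced by fastq-dump --split-3:
    <accession>_1.fastq, <accession>_2.fastq and <accession>.fastq.
    """
    fastq_dir = Path(fastq_dir)
    mate1 = fastq_dir / f"{accession}_1.fastq"
    mate2 = fastq_dir / f"{accession}_2.fastq"
    unpaired = fastq_dir / f"{accession}.fastq"

    # Options shared by both runs (placeholder thread count and index path).
    common = ["STAR", "--runThreadN", str(threads), "--genomeDir", str(genome_dir)]

    # Paired-end run: both mates are passed to --readFilesIn.
    paired = common + [
        "--readFilesIn", str(mate1), str(mate2),
        "--outFileNamePrefix", f"{accession}_paired_",
    ]
    # Single-end run: only the unnumbered file is passed.
    single = common + [
        "--readFilesIn", str(unpaired),
        "--outFileNamePrefix", f"{accession}_single_",
    ]
    return paired, single


if __name__ == "__main__":
    # Print the commands so they can be inspected before running them.
    for cmd in star_commands("SRR1164866", "/path/to/star_index"):
        print(shlex.join(cmd))
```

Using separate output prefixes keeps the two runs from overwriting each other's files. Whether aligning the two sets separately and merging is the recommended way is exactly what I am unsure about.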