Our collaborators have sent us a fastq file that contains the de-mulitiplexed, trimmed, cleaned assembled reads & reads_1 that didn't assemble. Currently, the fastq files for the barcodes are not attainable, so I made use of Tony's 'parse_bc_reads_labels.py' script with some modification (my header format was slightly different):
@MISEQ03:74:000000000-A2TYR:1:1101:14024:2827#TCCACAGGAGT/1
TACAGAGGGTGCGAGCGTTAATCGGATTTACTGGGCGTAAAGCGTGCGTAGGCGGCTTTTTAAGTCGGATGTGAAATCCCTGAGCTTAACTTAGGAATTGCATTCGATACTGGGAAGCTAGAGTATGGGAGAGGATGGTAGAATTCCAGGTGTAGCGGTGAAATGC
+
<???<BB?B5?<BBBBCCEEFFHFH?EHFFGHFHHHHHHHHHHHCCEC>EEEGHHACCFHFDHHCCDECDCDBBFFFBFFFFBDEDDDEBDEEEEEEEEE;BEE?ECBEEEC?C;;BEEEEEEEEEEAEEEEEEEEAECCEEEECEEEEE:AEEEE??E8A:CEAE
I successfully ran the following command :
split_libraries_fastq.py -i /illumina/JGI-PMI/iTAGs_2013Jan7_run1/standard.fastq.txt,/illumina/JGI-PMI/iTAGs_2013Jan23_run2/standard.fastq.txt -o /illumina/JGI-PMI/DRAFT_01/sl_out -b /illumina/JGI-PMI/iTAGs_2013Jan7_run1/standard.barcode.fastq.txt,/illumina/JGI-PMI/iTAGs_2013Jan23_run2/standard.barcode.fastq.txt -m /illumina/JGI-PMI/iTAGs_2013Jan7_run1/Mapping_run1.txt,/illumina/JGI-PMI/iTAGs_2013Jan23_run2/Mapping_run2.txt --barcode_type 11
But got troubling results:
Length Count
80.0 10
90.0 53
100.0 38
110.0 10
120.0 10
130.0 0
140.0 1
150.0 0
--
Length Count
126.0 26
136.0 77
146.0 427
156.0 15
166.0 1
176.0 0
186.0 0
196.0 0
206.0 1
216.0 2
226.0 1
236.0 3
246.0 0
--
and...
Input file paths
Mapping filepath: /illumina/JGI-PMI/iTAGs_2013Jan7_run1/Mapping_run1.txt (md5: 4e1049e0e796e99ec117cc8aed38d43a)
Sequence read filepath: /illumina/JGI-PMI/iTAGs_2013Jan7_run1/standard.fastq.txt (md5: 898d19dfe5eb7ad20eb3f3ae2b9b8587)
Barcode read filepath: /illumina/JGI-PMI/iTAGs_2013Jan7_run1/standard.barcode.fastq.txt (md5: 85fdec50cac6ca0c7d4b1b3b51
fbcb24)
Quality filter results
Total number of input sequences: 5484634
Barcode not in mapping file: 0
Read too short after quality truncation: 5484512
Count of N characters exceeds limit: 0
Illumina quality digit = 0: 0
Barcode errors exceed max: 0
Result summary (after quality filtering)
Median sequence length: 99.00
RHIZ.O.2.BESC.29 5
RHIZ.O.3.BESC.103 4
RHIZ.O.1.SQMC.25.5 4
.
.
Total number seqs written 122
---
.
.
.
Input file paths
Mapping filepath: /illumina/JGI-PMI/iTAGs_2013Jan23_run2/Mapping_run2.txt (md5: f0962541227653ba9200175d220f4540)
Sequence read filepath: /illumina/JGI-PMI/iTAGs_2013Jan23_run2/standard.fastq.txt (md5: 49f5833abe767369f51fb8bf2c50f271
)
Barcode read filepath: /illumina/JGI-PMI/iTAGs_2013Jan23_run2/standard.barcode.fastq.txt (md5: 5762abd8797820e39153f9199
d286a16)
Quality filter results
Total number of input sequences: 5903511
Barcode not in mapping file: 0
Read too short after quality truncation: 5902958
Count of N characters exceeds limit: 0
Illumina quality digit = 0: 0
Barcode errors exceed max: 0
Result summary (after quality filtering)
Median sequence length: 147.00
ENDO.C.2.HOMC.21.3 21
ENDO.C.2.BESC.29 21
ENDO.C.1.BESC.890 14
.
.
.
Total number seqs written 553
---
Not sure why all the data is being culled. Does this have to do with the potentially bad quality of the overlapping sections of the paired end reads? Any suggestions on what quality filtering settings I should change? Something I am missing? Another issue perhaps?
-Thanks!
-M