I have some experience with 454, but I’m new to processing Illumina data. I’m trying to analyse MiSeq data starting from R1.fastq and R2.fastq files. I also have the i7 and i5 index sequences, which I put in the map file under the column BarcodeSequence, stitched together as a single barcode.
I used the following scripts, but the last step reports this error: /macqiime/anaconda/lib/python2.7/site-packages/numpy/core/_methods.py:59: RuntimeWarning: Mean of empty slice.
warnings.warn("Mean of empty slice.", RuntimeWarning)
extract_barcodes.py -f celiaca_R1.fastq -r celiaca_R2.fastq -c barcode_paired_stitched --bc1_len 8 --bc2_len 8 -o 1_extract_barcodes
python merge_bcs_reads.py 1_extract_barcodes/barcodes.fastq 1_extract_barcodes/reads.fastq reads_with_barcodes_at_head.fastq
convert_fastaqual_fastq.py -c fastq_to_fastaqual -f reads_with_barcodes_at_head.fastq -o 2_fastaqual_converted
split_libraries.py -m mapEC.txt -f 2_fastaqual_converted/reads_with_barcodes_at_head.fna -q 2_fastaqual_converted/reads_with_barcodes_at_head.qual --barcode_type 16 -o 3_split_libraries
The output for split_library_log.txt says:
Number raw input seqs 441694
Length outside bounds of 200 and 1000 11351
Num ambiguous bases exceeds limit of 6 0
Missing Qual Score 0
Mean qual score below minimum of 25 33666
Max homopolymer run exceeds limit of 6 7783
Num mismatches in primer exceeds limit of 0: 388894
Sequence length details for all sequences passing quality filters:
No sequences passed quality filters for writing.
Barcodes corrected/not 0/0
Uncorrected barcodes will not be written to the output fasta file.
Corrected barcodes will be written with the appropriate barcode category.
Corrected but unassigned sequences will not be written unless --retain_unassigned_reads is enabled.
Total valid barcodes that are not in mapping file 0
Sequences associated with valid barcodes that are not in the mapping file will not be written.
Barcodes in mapping file
Sample Sequence Count Barcode
KSMG 0 TGCAGCTACGTCTAAT
RPV 0 TCGACGTCCGTCTAAT
S59 0 TCGACGTCAAGGAGTA
FLC 0 TAGCGCTCCGTCTAAT
JSA 0 GGAGCTACCGTCTAAT
VMM 0 GCGTAGTACGTCTAAT
AISS 0 CGGAGCCTCGTCTAAT
S299 0 CGATCAGTCTAAGCCT
S40 0 CGATCAGTAAGGAGTA
SPP 0 CCTAAGACCGTCTAAT
RFG 0 ACTGAGCGCGTCTAAT
S73 0 ACTCGCTACTAAGCCT
Total number seqs written 0
I also tried another approach (commands below), but I got:
“Total number of input sequences: 33661
Barcode not in mapping file: 33661”
so I don’t know what the problem with my barcodes is.
extract_barcodes.py -f celiaca_R1.fastq -r celiaca_R2.fastq -c barcode_paired_end --bc1_len 8 --bc2_len 8 -o 1_extract_barcodes
join_paired_ends.py -f celiaca_R1.fastq -r celiaca_R1.fastq -b 1_extract_barcodes/barcodes.fastq -o 2_fastq-join_joined
split_libraries_fastq.py -i 2_fastq-join_joined/fastqjoin.join.fastq -b 2_fastq-join_joined/fastqjoin.join_barcodes.fastq -o 3_split_libraries_joined -m mapEC.txt -q 19 --barcode_type 16
Could you help me?
THANKS IN ADVANCE!
1. extract_barcodes.py -f celiaca_R1.fastq -r celiaca_R2.fastq -c barcode_paired_stitched --bc1_len 8 --bc2_len 8 -o 1_extract_barcodes
2. split_libraries_fastq.py -m mapEC.txt -b 1_extract_barcodes/barcodes.fastq -i 1_extract_barcodes/reads.fastq --barcode_type 16 -o split_lib_fastq
There may still be issues with getting the barcodes matched, e.g., they could be out of order/orientation relative to the mapping file barcodes, but we'll be able to tell if there is a barcode matching issue from the log file of split_libraries_fastq.py.
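If the log does show that nothing matches, one way to check the order/orientation question directly is to compare the barcodes in 1_extract_barcodes/barcodes.fastq against every order/orientation variant of the mapping file barcodes. The following is only a rough sketch, not part of QIIME: the function names are made up for illustration, it assumes Python 3 (so run it outside the MacQIIME Python 2.7 environment), and it assumes a plain four-line-per-record FASTQ with 8 + 8 nt stitched barcodes.

```python
from collections import Counter

# Complement table for DNA bases; N is kept as N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def barcode_variants(barcode, bc1_len=8):
    """Return {label: sequence} for the 8 order/orientation variants.

    Labels read "<order>:<first emitted half>+<second emitted half>", where
    each half is "fwd" (as written) or "rc" (reverse-complemented).
    """
    first, second = barcode[:bc1_len], barcode[bc1_len:]
    variants = {}
    for order, (a, b) in (("as_is", (first, second)), ("swapped", (second, first))):
        for rc_a in (False, True):
            for rc_b in (False, True):
                left = reverse_complement(a) if rc_a else a
                right = reverse_complement(b) if rc_b else b
                label = f"{order}:{'rc' if rc_a else 'fwd'}+{'rc' if rc_b else 'fwd'}"
                variants[label] = left + right
    return variants


def read_fastq_sequences(path):
    """Yield the sequence line of each record in a 4-line-per-record FASTQ."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:
                yield line.strip()


def tally_orientations(observed, mapping_barcodes, bc1_len=8):
    """Count observed reads that exactly match each variant label."""
    tally = Counter()
    for barcode in mapping_barcodes:
        for label, seq in barcode_variants(barcode, bc1_len).items():
            tally[label] += observed.get(seq, 0)
    return tally


def report(fastq_path, mapping_barcodes, bc1_len=8, top=10):
    """Print the most common observed barcodes and the per-variant match counts."""
    observed = Counter(read_fastq_sequences(fastq_path))
    print("Most common observed barcodes:")
    for seq, n in observed.most_common(top):
        print(f"  {seq}\t{n}")
    print("Reads matching each variant of the mapping file barcodes:")
    for label, n in tally_orientations(observed, mapping_barcodes, bc1_len).most_common():
        print(f"  {label}\t{n}")
```

For example, report("1_extract_barcodes/barcodes.fastq", ["TGCAGCTACGTCTAAT", "TCGACGTCCGTCTAAT"]) would print the ten most frequent extracted barcodes and how many reads match each variant. If one variant clearly dominates, that suggests which half needs reverse-complementing or swapping; if none do, the barcodes are probably not in the reads at all (for instance, if the facility already removed them).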
I did what you said, and none of the sequences matched the barcodes in my map file (I have attached the sequences).
I have both kinds of files: first they sent me a paired R1/R2 file for each sample, and then a single R1 and R2 file for all my data (celiaca_R1.fastq and celiaca_R2.fastq).
Would it be a better option to work from the individual R1 and R2 files of each sample?
What would the general pipeline be for working with individual files? Is there a script to merge all the individual R1 and R2 files?