Split libraries: "Mean of empty slice.", RuntimeWarning

334 views
Skip to first unread message

Esther Mejia

unread,
Apr 3, 2017, 7:37:34 PM4/3/17
to Qiime 1 Forum

Hi, 


I have some experience with 454, but I’m new processing Illumina data. I’m trying to analyse MiSeq data starting from R1.fastq and R2.fastq files. I also have the i7 and i5 index sequences which I put in the map file under the column BarcodeSequence, stitched as a single barcode.

I used the following scripts, but at the end I have problem because it reports this error: /macqiime/anaconda/lib/python2.7/site-packages/numpy/core/_methods.py:59: RuntimeWarning: Mean of empty slice.

  warnings.warn("Mean of empty slice.", RuntimeWarning)



extract_barcodes.py -f celiaca_R1.fastq -r celiaca_R2.fastq -c barcode_paired_stitched --bc1_len 8 --bc2_len 8 -o 1_extract_barcodes


python merge_bcs_reads.py 1_extract_barcodes/barcodes.fastq 1_extract_barcodes/reads.fastq reads_with_barcodes_at_head.fastq


convert_fastaqual_fastq.py -c fastq_to_fastaqual -f reads_with_barcodes_at_head.fastq -o 2_fastaqual_converted


split_libraries.py -m mapEC.txt -f 2_fastaqual_converted/reads_with_barcodes_at_head.fna -q 2_fastaqual_converted/reads_with_barcodes_at_head.qual --barcode_type 16 -o 3_split_libraries




The output for split_library_log.txt says:


Number raw input seqs 441694


Length outside bounds of 200 and 1000 11351

Num ambiguous bases exceeds limit of 6 0

Missing Qual Score 0

Mean qual score below minimum of 25 33666

Max homopolymer run exceeds limit of 6 7783

Num mismatches in primer exceeds limit of 0: 388894


Sequence length details for all sequences passing quality filters:

No sequences passed quality filters for writing.


Barcodes corrected/not 0/0

Uncorrected barcodes will not be written to the output fasta file.

Corrected barcodes will be written with the appropriate barcode category.

Corrected but unassigned sequences will not be written unless --retain_unassigned_reads is enabled.


Total valid barcodes that are not in mapping file 0

Sequences associated with valid barcodes that are not in the mapping file will not be written.


Barcodes in mapping file

Sample Sequence Count Barcode

KSMG 0 TGCAGCTACGTCTAAT

RPV 0 TCGACGTCCGTCTAAT

S59 0 TCGACGTCAAGGAGTA

FLC 0 TAGCGCTCCGTCTAAT

JSA 0 GGAGCTACCGTCTAAT

VMM 0 GCGTAGTACGTCTAAT

AISS 0 CGGAGCCTCGTCTAAT

S299 0 CGATCAGTCTAAGCCT

S40 0 CGATCAGTAAGGAGTA

SPP 0 CCTAAGACCGTCTAAT

RFG 0 ACTGAGCGCGTCTAAT

S73 0 ACTCGCTACTAAGCCT


Total number seqs written 0



___

I also tried this other way, but I got: 

“Total number of input sequences: 33661

Barcode not in mapping file: 33661”


so I don’t know what is the problem with my barcodes?



extract_barcodes.py -f celiaca_R1.fastq -r celiaca_R2.fastq -c barcode_paired_end --bc1_len 8 --bc2_len 8 -o 1_extract_barcodes


join_paired_ends.py -f celiaca_R1.fastq -r celiaca_R1.fastq -b 1_extract_barcodes/barcodes.fastq -o 2_fastq-join_joined


split_libraries_fastq.py -i 2_fastq-join_joined/fastqjoin.join.fastq -b 2_fastq-join_joined/fastqjoin.join_barcodes.fastq -o 3_split_libraries_joined -m mapEC.txt -q 19 --barcode_type 16




Could you help me?

THANKS IN ADVANCE!


Esther.

TonyWalters

unread,
Apr 5, 2017, 3:44:14 AM4/5/17
to Qiime 1 Forum
Hello Esther,

Since these are Illumina data, I'd avoid using the 454-based script (split_libraries.py) in favor of the Illumina-based script (split_libraries_fastq.py).

Can you try this approach instead (the filepaths are based upon your prior listed commands, you may have to modify these)?

1. extract_barcodes.py -f celiaca_R1.fastq -r celiaca_R2.fastq -c barcode_paired_stitched --bc1_len 8 --bc2_len 8 -o 1_extract_barcodes

2. split_libraries_fastq.py -m mapEC.txt -b 1_extract_barcodes/barcodes.fastq -i 1_extract_barcodes/reads.fastq --barcode_type 16 -o split_lib_fastq

There may still be issues with getting the barcodes matched, e.g., they could be out of order/orientation relative to the mapping file barcodes, but we'll be able to tell if there is a barcode matching issue from the log file of split_libraries_fastq.py.


-Tony

Esther Mejia

unread,
Apr 5, 2017, 2:19:53 PM4/5/17
to Qiime 1 Forum
Hi Tony, 
Thank you for answer my question, I tried the approach that you suggested above obtaining:

Error in split_libraries_fastq.py: No filepaths match pattern/name '1_extract_barcodes/barcodes.fastq -i 1_extract_barcodes/reads.fastq --barcode_type'. All patterns must be matched at least once. 

I guess it's because, as we extracted the barcodes from the sequences, the resulting reads do not match with the barcodes file. That is why before doing the spliting I use the script join_paired_ends.py -f celiaca_R1.fastq -r celiaca_R2.fastq -b 1_extract_barcodes/barcodes.fastq -o 2_fastq-join_joined resulting in:

Input file paths
Mapping filepath: mapEC.txt (md5: 5e66911cb706a75ce32aaf71d23e6ed4)
Sequence read filepath: 2_fastq-join_joined/fastqjoin.join.fastq (md5: 5c75511cc23076c90d55f4af4960d6c1)
Barcode read filepath: 2_fastq-join_joined/fastqjoin.join_barcodes.fastq (md5: 3bb6266c0153d39f9630cb9ed5a9b79e)

Quality filter results
Total number of input sequences: 306075
Barcode not in mapping file: 306075
Read too short after quality truncation: 0
Count of N characters exceeds limit: 0
Illumina quality digit = 0: 0
Barcode errors exceed max: 0

Result summary (after quality filtering)
Median sequence length: nan
VMM 0
SPP 0
S73 0
S59 0
S40 0
S299 0
RPV 0
RFG 0
KSMG 0
JSA 0
FLC 0
AISS 0
 
I send you attached the file with the indexes sequences (i7 & i5) that were sent me, and an example of how I placed then on my my map file.

Thanks again!
Esther.


i7 & i5 index sequences.txt
map_file_example.txt

TonyWalters

unread,
Apr 6, 2017, 4:05:13 AM4/6/17
to Qiime 1 Forum
Hello Esther, 

Now we have to determine why the barcodes do not match those that are given in your mapping file. This can be somewhat difficult, because the barcodes can be out of order/orientation. There is a custom script that might help, as it gets the most frequently occurring sequences from your barcodes fastq file. The most frequent reads from these data should match your barcodes, so you should be able to take the first 8 bases of some of the most frequently occurring sequences and search your mapping file for matches (and reverse complement the same bases, here's a handy tool: https://www.bioinformatics.org/sms/rev_comp.html, and see if those match to tell us if the reads are out of orientation).

Here is the custom script:
https://gist.github.com/walterst/7a1e8608b82c5ce580c9 
Here is the usage string:
Usage: python get_barcode_freqs.py bc_fastq_fp output_text_fp barcode_length
In your case, it would be a command like this (the paths may change depending upon where you run the file):
python get_barcode_freqs.py 1_extract_barcodes/barcodes.fastq barcodes_frequencies.txt 16

Just to confirm, did you get a single R1 and R2 fastq file for all of your data, or did you get a paired R1/R2 file for each of your samples?

-Tony

Esther Mejia

unread,
Apr 7, 2017, 2:33:02 PM4/7/17
to Qiime 1 Forum

Hello Tony,


I did what you said, and none of the sequences matched with the barcodes in my map file (I send you attached the sequences).


I have both files, first they sent me a paired R1/R2 file for each sample and then, a single R1 and R2 file for all my data (celiaca_R1.fastq and celiaca_R2.fastq). 


Would it be a better option to work from the individual R1 and R2 files of each sample?

How would the general pipeline be to work with individual files? Is there a script to merge all the individual R1 and R2 files?


Thank you!

Esther.

frequent barcodes.txt

Esther Mejia

unread,
Apr 7, 2017, 2:34:30 PM4/7/17
to Qiime 1 Forum
frequent barcodes.txt

TonyWalters

unread,
Apr 7, 2017, 2:37:28 PM4/7/17
to Qiime 1 Forum
Hello Esther,

Yes, I would work with the per-sample paired fastq files (the barcodes are seemingly missing from your concatenated data, since they don't match the barcodes you used).

If you want to stitch the paired reads, I would use this script: http://qiime.org/scripts/multiple_join_paired_ends.html


Then you can quality filter and demultiplex the data with http://qiime.org/scripts/multiple_split_libraries_fastq.html

Sometimes it takes some tweaking of parameters and clean-up of intermediate files to make these work, but I think it's more viable than trying to figure out what happened to the barcodes.

-Tony

Esther Mejia

unread,
Apr 7, 2017, 2:40:22 PM4/7/17
to Qiime 1 Forum
Ok Tony, 
Thank you very much, I'll try it and let you know! :)

Esther.
Reply all
Reply to author
Forward
0 new messages