Problem with split_libraries.py on Illumina data


Leandro de Mattos

Apr 15, 2016, 6:43:36 PM
to qiime...@googlegroups.com
Dear all,
I received two FASTQ files (forward and reverse Illumina reads) that contain several libraries with many different samples. I would like to extract only the samples from my study from these files.
I performed the following steps:
1) fastq-join (through the QIIME script) to merge the forward and reverse reads.
2) Converted the joined FASTQ to .fna (FASTA) and .qual files.
3) split_libraries.py
In step 3 I got the error message below, even though I had already checked my mapping file with QIIME's validate_mapping_file.py script and it reported no errors.


Step 3) split_libraries.py:

administrator@l1618479[step_2_fastaqual] /usr/lib/qiime/bin/split_libraries.py -m ./validate_mapp/mapp_corrected_corrected.txt -f fastqjoin.join.fna -q fastqjoin.join.qual -o split_library_output -b 12
Traceback (most recent call last):
  File "/usr/lib/qiime/bin/split_libraries.py", line 411, in <module>
    main()
  File "/usr/lib/qiime/bin/split_libraries.py", line 408, in main
    truncate_ambi_bases=opts.truncate_ambi_bases)
  File "/usr/lib/python2.7/dist-packages/qiime/split_libraries.py", line 1289, in preprocess
    barcode_type, added_demultiplex_field)
  File "/usr/lib/python2.7/dist-packages/qiime/split_libraries.py", line 310, in check_map
    'identify problems.')
ValueError: Errors were found with mapping file, please run validate_mapping_file.py to identify problems.

Going back to step 2) (converting the FASTQ to .fna (FASTA) and .qual files):

Alternatively, I thought about using the filter_fasta.py script to recover my sequences by ID from the .fna (FASTA) file, because each record in my FASTQ carries the sample ID (for example, @ABC_1 ### in the Illumina FASTQ header).

However, I would then also have to write a script that pulls the quality information for those same IDs (the samples I am interested in) out of the general .qual file, which contains the qualities for all samples, so that I keep only the qualities of my samples.
Is there a script in QIIME that extracts the quality information for only my samples, by ID?

Does anyone have any suggestions or alternatives?
Thank you for your attention.

TonyWalters

Apr 15, 2016, 6:49:09 PM
to Qiime 1 Forum
Leandro,

Instead of doing #2 and #3, why not specify the SampleID names with the --sample_ids option of split_libraries_fastq.py? See the last example on this page: http://qiime.org/scripts/split_libraries_fastq.html
split_libraries_fastq.py is designed for use with Illumina data, whereas split_libraries.py is designed for 454 data.

You might also look at http://qiime.org/scripts/multiple_split_libraries_fastq.html if you have a large number of samples. I'd suggest using the -w option first to make sure the --sample_ids values look like what you want. Remember that QIIME expects SampleIDs to contain only alphanumeric characters and periods, so you might need to modify that part of the command in a text editor before cutting and pasting the entire split_libraries_fastq.py command and running it.
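
If the labels in your files contain other characters, something like the following could be used to clean them up before pasting them into --sample_ids. This is only a sketch: sanitize_sample_id is a made-up helper name, not a QIIME script, and replacing each disallowed character with a period is just one possible convention.

```shell
#!/bin/sh
# Hypothetical helper (not part of QIIME): replace every character that is
# not a letter, a digit, or a period with a period.
sanitize_sample_id() {
    printf '%s\n' "$1" | sed 's/[^A-Za-z0-9.]/./g'
}

# Example labels (the first one follows the @ABC_1 pattern from the question).
sanitize_sample_id "ABC_1"
sanitize_sample_id "soil-rep 2"
```

Check the result for collisions afterwards, since two different labels (for example ABC_1 and ABC-1) would map to the same SampleID.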

Leandro de Mattos

Apr 15, 2016, 7:42:39 PM
to Qiime 1 Forum
Dear Tony, I'm trying what you suggested; it is running for now.
One question: I do not want to apply quality filtering in this case. Is it enough to simply leave out the -q argument?

/usr/lib/qiime/bin/split_libraries_fastq.py -i ./fastqjoin.join.fastq --sample_id APA -o de_multiplexed_APA/ -m mapp_corrected.txt --barcode_type 'not-barcoded' 

I will report back soon.
Thanks for your help.
Best,
Leandro

TonyWalters

Apr 15, 2016, 7:52:38 PM
to Qiime 1 Forum
If you don't use -q, it will use the default value. There isn't a way to completely turn off quality filtering, but you could probably get around most of it by increasing the values of -n and -r and reducing the value of -p.

Also, you shouldn't need to pass the -m mapp_corrected.txt file above (it's going to name the reads under the SampleID APA).
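
As a rough sketch of what that could look like, reusing the file names and SampleID from the command above: the numeric values are illustrative guesses at "permissive enough", not recommended settings, and the sample ID option is spelled --sample_ids as on the script's documentation page. The snippet only prints the command unless RUN=1 is set (RUN is a variable made up for this example), so it can be reviewed before anything runs.

```shell
#!/bin/sh
# Illustrative values only, chosen to make each filter as permissive as possible:
#   -q 0       lowest Phred threshold
#   -r 500     allow long runs of low-quality base calls
#   -n 500     allow many N characters per read
#   -p 0.0001  require only a tiny fraction of the read to be high quality
set -- split_libraries_fastq.py \
    -i fastqjoin.join.fastq \
    --sample_ids APA \
    -o de_multiplexed_APA/ \
    --barcode_type not-barcoded \
    -q 0 -r 500 -n 500 -p 0.0001

# Print the command for review.
echo "$@"

# Run it only when explicitly requested and when QIIME is on the PATH.
if [ "${RUN:-0}" = "1" ] && command -v split_libraries_fastq.py >/dev/null 2>&1; then
    "$@"
fi
```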

TonyWalters

Apr 18, 2016, 5:25:32 PM
to Qiime 1 Forum
I'm not seeing your latest response, but was the data concatenated before running join_paired_ends.py? If you have separate files from before that point, I would run them separately (multiple_join_paired_ends.py) and then run multiple_split_libraries_fastq.py 

Let me know if that's the situation you're working with, as there may be an added step to get rid of the unjoined data before running multiple_split_libraries_fastq.py, listed at the end.

If the data are already combined when you got them, but they have valid QIIME identifiers at the beginning of the fastq labels, you could try this script for splitting the fastq file: http://qiime.org/scripts/split_sequence_file_on_sample_ids.html
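
If you only need the reads for one sample and want to check the idea outside QIIME first, a plain awk filter is one option. The sketch below is not a QIIME script: extract_sample_fastq is a made-up name, the demo reads are invented, and it assumes strict 4-line FASTQ records whose header begins with the sample ID (as in the @ABC_1 example from the question).

```shell
#!/bin/sh
# Hypothetical helper: print the 4-line FASTQ records whose header line
# starts with "@<prefix>". Assumes sequence and quality lines are not wrapped.
extract_sample_fastq() {
    prefix="$1"
    fastq="$2"
    awk -v p="@$prefix" '
        NR % 4 == 1 { keep = (index($0, p) == 1) }  # header line decides for the whole record
        keep                                         # print each line of a kept record
    ' "$fastq"
}

# Tiny demonstration with made-up reads.
demo=$(mktemp)
printf '%s\n' '@ABC_1 read1' 'ACGT' '+' 'IIII' \
              '@XYZ_7 read2' 'TTGA' '+' 'IIII' > "$demo"
extract_sample_fastq "ABC_" "$demo"
rm -f "$demo"
```

Including the trailing delimiter in the prefix ("ABC_" rather than "ABC") avoids also matching a sample named, say, ABCD.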



You could dump the unjoined data to another folder, and run the multiple_split_libraries_fastq.py command, which should only pick up the single file after this.
Here's an example Linux command to do so:
mkdir unjoined_file_dump
find input_dir/ -name "fastqjoin.un*" -print -exec mv {} unjoined_file_dump/ \;
where input_dir is the folder containing all of the subfolders with your joined reads.
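
One thing to keep in mind with the command above is that the unjoined files have the same names in every subfolder, so moving them all into one flat folder leaves only the last copy of each. That is fine if you just want them out of the way. If you would rather keep them all, a variant along these lines could work. It is only a sketch: move_unjoined is a made-up helper, and prefixing each file with its subfolder name is an assumption about how you want them labelled.

```shell
#!/bin/sh
# Hypothetical helper: move unjoined reads out of input_dir, prefixing each
# file with the name of its subfolder so that files from different samples
# do not overwrite each other in the dump folder.
move_unjoined() {
    input_dir="$1"
    dump_dir="$2"
    mkdir -p "$dump_dir"
    find "$input_dir" -type f -name "fastqjoin.un*" -exec sh -c '
        dump="$0"
        for f in "$@"; do
            sub=$(basename "$(dirname "$f")")
            mv "$f" "$dump/${sub}_$(basename "$f")"
        done
    ' "$dump_dir" {} +
}

# Demonstration on a throwaway directory tree with empty placeholder files.
work=$(mktemp -d)
for s in sampleA sampleB; do
    mkdir -p "$work/input_dir/$s"
    touch "$work/input_dir/$s/fastqjoin.join.fastq" \
          "$work/input_dir/$s/fastqjoin.un1.fastq" \
          "$work/input_dir/$s/fastqjoin.un2.fastq"
done

# Dry run: list what would be moved, without moving anything.
find "$work/input_dir" -type f -name "fastqjoin.un*" -print

move_unjoined "$work/input_dir" "$work/unjoined_file_dump"

# Only the joined files should remain under input_dir.
find "$work/input_dir" -type f -print
rm -rf "$work"
```

The dump folder should sit outside input_dir, otherwise find may descend into it.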