Hello,
It seems that when I move from a small test sample to the real data, problems arise. Although I can see what the problem is, I don't know how to adapt the mapping file and the FASTA file so that split_libraries.py and the rest of the workflow run. The problem is that I have repeated barcodes in the mapping file. I'll describe all the tips the QIIME community has given me so far, in case any of them can be replaced or complemented by something else.
So the mapping file construction starts with sequences that look like this:
>MISEQ_0005_FC:1:1101:13831:1939#ACACCTC
TGGTCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCGAGCGTTACCCGGATTTACTGGGTGTAAAGGGCGTGTAGGCGGTTTCTCAAGTCCGATGCTAAAG
What follows "#" is my sample barcode, and of course, in the FASTA file many sequences share one sample barcode.
Following Tony's advice, I have rebuilt the FASTA file so that the barcode sits at the beginning of the sequence, like this (the space in the sequence is only there for clarity):
>seq1
ACACCTC TGGTCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCGAGCGTTACCCGGATTTACTGGGTGTAAAGGGCGTGTAGGCGGTTTCTCAAGTCCGATGCTAAAG
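For illustration, the rebuilding step could be sketched roughly as below. This is only a sketch under assumptions: the function names parse_fasta and prepend_barcodes are hypothetical, the barcode is taken to be everything after the last "#" in the header, and records are renamed seq1, seq2, ... in file order.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs; sequences may span several lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def prepend_barcodes(lines):
    """Return FASTA lines with the header barcode moved to the sequence start.

    Assumption: the barcode is the text after the last '#' in the header.
    """
    out = []
    for i, (name, seq) in enumerate(parse_fasta(lines), start=1):
        read_id, sep, barcode = name.rpartition("#")
        if not sep:
            raise ValueError("no '#barcode' suffix in header: {}".format(name))
        out.append(">seq{}".format(i))
        out.append(barcode + seq)  # no space in the real file
    return out
```

Applied to the record above, this gives a ">seq1" header followed by the sequence with ACACCTC prepended.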
With this sequence layout, the MAP file looks like this:
#SampleID BarcodeSequence LinkerPrimerSequence Description
seq1 ACACCTC TGGTCGTGCCAGCMGCCGCGGTAA --
However, when I use a dataset with repeated barcodes, the MAP file reflects this: the BarcodeSequence field contains repeated strings, as in
#SampleID BarcodeSequence LinkerPrimerSequence Description
seq1 ACACCTC TGGTCGTGCCAGCMGCCGCGGTAA --
...
seq12 ACACCTC TGGTCGTGCCAGCMGCCGCGGTAA --
...
So far, check_id_map.py warns me about these repetitions, but as far as I can see the suggested corrected MAP file is exactly the same, repetitions included. split_libraries.py does not output any results because of these repetitions.
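To make the repetition concrete, here is a small sketch that lists which barcodes occur in more than one row of the MAP file, and which SampleIDs share them. It is not part of QIIME; repeated_barcodes is a hypothetical helper, and it assumes the first two whitespace-separated columns are SampleID and BarcodeSequence, as in the layout above.

```python
from collections import defaultdict


def repeated_barcodes(map_lines):
    """Map each barcode used by more than one row to its list of SampleIDs.

    Assumption: columns are whitespace-separated, with SampleID first
    and BarcodeSequence second; lines starting with '#' are headers.
    """
    by_barcode = defaultdict(list)
    for line in map_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # skip malformed or placeholder rows
        sample_id, barcode = fields[0], fields[1]
        by_barcode[barcode].append(sample_id)
    return {bc: ids for bc, ids in by_barcode.items() if len(ids) > 1}


if __name__ == "__main__":
    import sys
    # Usage: python this_script.py mapping_file.txt
    with open(sys.argv[1]) as handle:
        for barcode, ids in sorted(repeated_barcodes(handle).items()):
            print("{}\t{}".format(barcode, ",".join(ids)))
```

For the two rows shown above, this would report ACACCTC as shared by seq1 and seq12.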
The reason to use split_libraries is two-fold:
1) I want a MAP file to be constructed to be used in the downstream workflow
2) I want to have my FASTA sequences trimmed of barcode + Linker
I could do this trimming manually and feed the result directly into pick_reference_otus_through_otu_table.py, but then what would the layout of the MAP file be?
I hope there is a solution for this barcode repetition problem.
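A minimal sketch of what such manual trimming could look like is given below. It assumes the barcode and the linker primer sit at the very start of the read and must match exactly, with IUPAC ambiguity codes in the primer (such as M = A or C) expanded; trim_read is a hypothetical helper and does not reproduce what split_libraries.py does internally.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def trim_read(seq, barcode, primer):
    """Strip a leading barcode + linker primer from seq.

    Returns the remaining sequence, or None if the read does not start
    with the barcode followed by a primer match (no mismatches allowed).
    """
    primer_pattern = "".join(IUPAC[base] for base in primer.upper())
    pattern = re.compile(re.escape(barcode.upper()) + primer_pattern)
    match = pattern.match(seq.upper())
    if match is None:
        return None
    return seq[match.end():]


read = "ACACCTC" + "TGGTCGTGCCAGCAGCCGCGGTAATACGGAGGG"
print(trim_read(read, "ACACCTC", "TGGTCGTGCCAGCMGCCGCGGTAA"))
```

Reads that return None here would need separate handling, for example allowing mismatches, which this sketch does not attempt.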
Thanks
-Pau